Re: Review Request 72379: "get_splits" udf ignores limit constraint when creating splits

Adesh Rao Mon, 20 Apr 2020 11:19:26 -0700


> On April 20, 2020, 2:19 a.m., Sankar Hariappan wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/AbstractTestJdbcGenericUDTFGetSplits.java
> > Lines 181 (patched)
> > <https://reviews.apache.org/r/72379/diff/1/?file=2223577#file2223577line181>
> >
> >     Add tests for LIMIT with predicates and GROUP BY queries as well. Also, make
> > sure we have a partitioned table too.


Added queries with predicates and GROUP BY. The existing test suite does not
have a partitioned table, so I still need to go through other tests to find the
standard way to create one.


> On April 20, 2020, 2:19 a.m., Sankar Hariappan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
> > Line 240 (original), 241 (patched)
> > <https://reviews.apache.org/r/72379/diff/1/?file=2223578#file2223578line241>
> >
> >     Do we need this computation? Can't we just use forceSingleSplit instead 
> > of generateSingleSplit?

Yes, this computation is not needed for limit queries. Removed.


> On April 20, 2020, 2:19 a.m., Sankar Hariappan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
> > Line 259 (original), 260 (patched)
> > <https://reviews.apache.org/r/72379/diff/1/?file=2223578#file2223578line260>
> >
> >     The error message should be updated for limit queries too.

Not required, since we no longer generate a single split for limit queries.


> On April 20, 2020, 2:19 a.m., Sankar Hariappan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
> > Lines 344 (patched)
> > <https://reviews.apache.org/r/72379/diff/1/?file=2223578#file2223578line344>
> >
> >     This issue should exist for "order by" queries as well.
> >     How does it work there?

By "table", I meant the original table. The materialized table will have only
a single file.


> On April 20, 2020, 2:19 a.m., Sankar Hariappan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
> > Lines 347 (patched)
> > <https://reviews.apache.org/r/72379/diff/1/?file=2223578#file2223578line347>
> >
> >     Since we always materialize the results for limit queries, I think the
> > forceSingleSplit logic is not needed. Even if we generate multiple splits
> > on this temp table, the eventual results will have only the expected number
> > of rows.

Ack. Modified the approach so that it no longer generates a single split.
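A minimal Java sketch of the revised approach, assuming (as the reviewer comment above states) that the result of a LIMIT query is materialized into a temporary table before splits are generated. The class and method names are hypothetical and do not correspond to anything in GenericUDTFGetSplits; in-memory lists stand in for the source table, the temp table, and the splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: apply LIMIT once while materializing into a temp table,
// then divide that temp table into any number of splits.
class MaterializeThenSplitDemo {

    // Stands in for running the LIMIT query once and writing its result to a temp table.
    static List<Integer> materializeWithLimit(List<Integer> sourceRows, int limit) {
        return new ArrayList<>(sourceRows.subList(0, Math.min(limit, sourceRows.size())));
    }

    // Divides the materialized rows into at most numSplits contiguous chunks (numSplits > 0).
    static List<List<Integer>> split(List<Integer> tempTable, int numSplits) {
        List<List<Integer>> splits = new ArrayList<>();
        int chunk = (tempTable.size() + numSplits - 1) / numSplits; // ceiling division
        for (int start = 0; start < tempTable.size(); start += chunk) {
            splits.add(tempTable.subList(start, Math.min(start + chunk, tempTable.size())));
        }
        return splits;
    }

    // Total rows returned when every split is read, e.g. by different executors.
    static int totalRows(List<List<Integer>> splits) {
        int total = 0;
        for (List<Integer> s : splits) {
            total += s.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);
        List<Integer> tempTable = materializeWithLimit(source, 5);
        List<List<Integer>> splits = split(tempTable, 3);
        System.out.println(splits.size() + " splits, " + totalRows(splits) + " rows");
    }
}
```

In this model `main` prints `3 splits, 5 rows`. Because the limit is enforced once, during materialization, the number of splits only affects parallelism and not the row count, which is why a forced single split is unnecessary here.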


- Adesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72379/#review220363
-----------------------------------------------------------


On April 20, 2020, 6:10 p.m., Adesh Rao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72379/
> -----------------------------------------------------------
> 
> (Updated April 20, 2020, 6:10 p.m.)
> 
> 
> Review request for hive and Sankar Hariappan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The limit constraint was ignored while creating splits. If multiple LLAP
> daemons execute the splits, the output contains more rows than specified by
> the limit constraint.
> 
> 
> Diffs
> -----
> 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/AbstractTestJdbcGenericUDTFGetSplits.java 8cbca69737
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcGenericUDTFGetSplits.java defbe78802
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcGenericUDTFGetSplits2.java 330174513c
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 00a6c89b1e
> 
> 
> Diff: https://reviews.apache.org/r/72379/diff/2/
> 
> 
> Testing
> -------
> 
> Waiting for Hive QA run.
> 
> 
> Thanks,
> 
> Adesh Rao
> 
>
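The failure mode in the description can be illustrated with a simplified, hypothetical Java model: each split is handled by a separate executor (standing in for an LLAP daemon) that can apply the LIMIT only to the rows it reads itself. The class and method names are illustrative only and are not taken from the Hive code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the bug: LIMIT is applied per split, never globally.
class PerSplitLimitDemo {

    // Each inner list stands for the rows one split reads from the source table.
    static List<Integer> runWithPerSplitLimit(List<List<Integer>> splits, int limit) {
        List<Integer> output = new ArrayList<>();
        for (List<Integer> split : splits) {
            // The executor only sees its own split, so the LIMIT is enforced locally.
            output.addAll(split.subList(0, Math.min(limit, split.size())));
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = List.of(
                List.of(1, 2, 3, 4),
                List.of(5, 6, 7, 8),
                List.of(9, 10, 11, 12));
        List<Integer> rows = runWithPerSplitLimit(splits, 2);
        System.out.println(rows.size() + " rows: " + rows);
    }
}
```

With three splits and a limit of 2, `main` prints `6 rows: [1, 2, 5, 6, 9, 10]`: in this model the combined output can grow to the number of splits times the limit.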
