> On May 23, 2017, 5 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
> > Lines 61 (patched)
> > <https://reviews.apache.org/r/59468/diff/1/?file=1727326#file1727326line61>
> >
> >     Comment: Queries of the form: select max(c), count(distinct c) from T; 
> > generate a plan of the form TS->mGBy->RS->rGBy->FS. 
> >     The problem with this plan is that the vertex containing rGBy->FS 
> > must have exactly 1 task. This limitation results in slow execution 
> > because that task gets all the data. 
> >     This optimization, if successful, rewrites the above plan to 
> > TS->mGBy->RS->mGBy2->RS->rGBy->FS. This introduces an extra vertex, 
> > mGBy2->RS. Note that this vertex can have multiple tasks, and since it 
> > performs aggregation, its output must necessarily be smaller than its 
> > input, which results in much less data going into the rGBy->FS vertex, 
> > which continues to have a single task.
> >     Also note that on the Calcite tree we have the 
> > HiveExpandDistinctAggregatesRule rule, which does a similar plan 
> > transformation but has different conditions that need to be satisfied.
> >     Additionally, we don't do any costing here, but it is possible that 
> > this transformation may slow down a query a bit: if the data is small 
> > enough to fit in a single task of the last reducer, injecting an 
> > additional vertex into the pipeline may make the query slower.

Thanks for the detailed comments.
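
To make the rewrite described above concrete, here is a minimal sketch in 
plain Python rather than Hive code. The function names, the num_tasks 
parameter and the use of Python's hash() to stand in for the shuffle are all 
assumptions made for illustration; the real operators work quite differently 
internally. It only shows why the final single task receives far fewer rows 
after the rewrite.

```python
def single_stage(rows):
    """Baseline shape (TS->mGBy->RS->rGBy->FS): one final task sees every value."""
    values = [c for c in rows if c is not None]
    return (max(values) if values else None, len(set(values)))


def two_stage(rows, num_tasks=4):
    """Rewritten shape (TS->mGBy->RS->mGBy2->RS->rGBy->FS), simulated in memory.

    num_tasks is a hypothetical parallelism for the extra vertex.
    """
    # mGBy -> RS: group by c and shuffle on c, so every copy of a value
    # lands in the same task and each task holds distinct keys only.
    buckets = [set() for _ in range(num_tasks)]
    for c in rows:
        if c is not None:
            buckets[hash(c) % num_tasks].add(c)

    # mGBy2: each task reduces its keys to one row (partial max, key count).
    partials = [(max(b), len(b)) for b in buckets if b]

    # rGBy: the single final task merges at most num_tasks small rows.
    if not partials:
        return (None, 0)
    return (max(m for m, _ in partials), sum(n for _, n in partials))


rows = [3, 1, 3, None, 7, 1, 7, 7]
print(single_stage(rows), two_stage(rows))
```

Both functions return the same (max, distinct count) pair; the difference is 
that the last step of two_stage merges one row per task instead of scanning 
every input value.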


> On May 23, 2017, 5 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java
> > Lines 313 (patched)
> > <https://reviews.apache.org/r/59468/diff/1/?file=1727326#file1727326line313>
> >
> >     This should be PARTIAL2 mode as well, since the GBy operator is 
> > running in PARTIAL2 mode.

PARTIAL2 expects an integer as input. Here, however, we are counting key_col0, 
which is a string, so hash mode is more appropriate.
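
As a rough illustration of that distinction (a sketch in Python, not the 
actual Hive evaluator code; count_iterate, count_merge and count_rows are 
hypothetical names), counting raw values accepts any type, while merging 
only makes sense when the input is already a partial count:

```python
def count_iterate(state, value):
    """Raw-row path: add 1 for every non-null value, whatever its type."""
    return state + (0 if value is None else 1)


def count_merge(state, partial):
    """Merge path: the input must already be a partial count (an integer)."""
    if not isinstance(partial, int):
        raise TypeError("merge expects a partial count, got %r" % (partial,))
    return state + partial


def count_rows(values):
    """Apply the raw-row path to a column of values such as key_col0."""
    state = 0
    for v in values:
        state = count_iterate(state, v)
    return state


# String keys can be counted row by row ...
print(count_rows(["a", "b", None]))
# ... but feeding a string key to the merge path is a type error.
try:
    count_merge(0, "a")
except TypeError as err:
    print("rejected:", err)
```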


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59468/#review175801
-----------------------------------------------------------


On May 25, 2017, 4:03 a.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59468/
> -----------------------------------------------------------
> 
> (Updated May 25, 2017, 4:03 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16654
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 2dfc8b6f89 
>   itests/src/test/resources/testconfiguration.properties 47a13c93b9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 8b04cd44fa 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/CountDistinctRewriteProc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 7dace9076f 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 38a9ef2af1 
>   ql/src/test/queries/clientpositive/count_dist_rewrite.q PRE-CREATION 
>   ql/src/test/results/clientpositive/groupby_sort_11.q.out 2b3bf4a07a 
>   ql/src/test/results/clientpositive/groupby_sort_8.q.out 4faa0757cc 
>   ql/src/test/results/clientpositive/llap/count_dist_rewrite.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/nullgroup4.q.out e5a8eeee14 
>   ql/src/test/results/clientpositive/perf/query16.q.out cf90c0c162 
>   ql/src/test/results/clientpositive/perf/query28.q.out 78129cf68b 
>   ql/src/test/results/clientpositive/perf/query94.q.out 836b16bf9f 
>   ql/src/test/results/clientpositive/perf/query95.q.out fa94d0842b 
>   ql/src/test/results/clientpositive/udf_count.q.out f60ad0485e 
>   ql/src/test/results/clientpositive/vector_empty_where.q.out b2dec6d7f6 
> 
> 
> Diff: https://reviews.apache.org/r/59468/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>
