-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65174/#review206027
-----------------------------------------------------------


ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java
Lines 37 (patched)
<https://reviews.apache.org/r/65174/#comment288987>

    I was talking to Gopal. We were thinking that, for the first implementation, we could limit the processor to introducing a new TopNKeyOp only below RS-GBy (HASH mode). So basically:
    1) match RS with GBy below,
    2) check whether RS contains TopN,
    3) check whether GBy is in hash mode, and
    4) check whether RS keys are the same as GBy keys.
    Then, if all four conditions are met, introduce TopN below GBy.

    When we work on pushdown in a follow-up, we can generalize and extend this process.


- Jesús Camacho Rodríguez


On July 11, 2018, 12:30 p.m., Teddy Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65174/
> -----------------------------------------------------------
> 
> (Updated July 11, 2018, 12:30 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-17896
>     https://issues.apache.org/jira/browse/HIVE-17896
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the
> group-by operator buffers up all the rows before discarding 99% of the
> rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the
> filtering on the shuffle keys, but it is better to do this before breaking
> the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following
> to happen.
> GBY->RS(Top=1)
> can become
> TNK(1)->GBY->RS(Top=1)
> so that the TopNKey operator can remove rows before they are buffered into
> the GBY and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the
> GBY is on keys "a,b,c" and the TopNKey is on just "a".
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6ea68c3500
>   itests/src/test/resources/testconfiguration.properties 9e012ce2f8
>   ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java a002348013
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 71ee25d9e0
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 7bb6590d5e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/TopNKeyProcessor.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 7afbf04797
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java dfd790853b
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/plan/VectorTopNKeyDesc.java PRE-CREATION
>   ql/src/test/queries/clientpositive/topnkey.q PRE-CREATION
>   ql/src/test/queries/clientpositive/vector_topnkey.q PRE-CREATION
>   ql/src/test/results/clientpositive/llap/topnkey.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/llap/vector_topnkey.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/tez/topnkey.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/tez/vector_topnkey.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/topnkey.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/vector_topnkey.q.out PRE-CREATION
>   serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 9393fb853f
> 
> 
> Diff: https://reviews.apache.org/r/65174/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Teddy Choi
> 
>
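
For illustration, the four checks proposed in the review comment could be sketched roughly as follows. This is only a minimal sketch, not the patch's code: TopNKeySketch, Mode, GroupBy, ReduceSink, and shouldInsertTopNKey are hypothetical stand-ins rather than Hive classes, a negative topN is assumed to mean "no TopN", and "below" is assumed to mean upstream of the RS, as in the TNK(1)->GBY->RS(Top=1) plan from the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for Hive's operators; NOT the real Hive classes.
final class TopNKeySketch {

    // Only HASH matters for the check; OTHER stands for every other mode.
    enum Mode { HASH, OTHER }

    /** Group-by model: aggregation mode plus key column names. */
    static final class GroupBy {
        final Mode mode;
        final List<String> keys;

        GroupBy(Mode mode, List<String> keys) {
            this.mode = mode;
            this.keys = keys;
        }
    }

    /** Reduce-sink model: keys, TopN limit (negative = none), upstream group-by (may be null). */
    static final class ReduceSink {
        final List<String> keys;
        final int topN;
        final GroupBy input;

        ReduceSink(List<String> keys, int topN, GroupBy input) {
            this.keys = keys;
            this.topN = topN;
            this.input = input;
        }
    }

    /** Returns true only when all four conditions from the review comment hold. */
    static boolean shouldInsertTopNKey(ReduceSink rs) {
        if (rs.input == null) {
            return false;                      // 1) RS must be fed directly by a GBy
        }
        if (rs.topN < 0) {
            return false;                      // 2) RS must carry a TopN
        }
        if (rs.input.mode != Mode.HASH) {
            return false;                      // 3) GBy must be in hash mode
        }
        return rs.keys.equals(rs.input.keys);  // 4) same keys, in the same order
    }

    public static void main(String[] args) {
        GroupBy gby = new GroupBy(Mode.HASH, Arrays.asList("a", "b"));
        ReduceSink rs = new ReduceSink(Arrays.asList("a", "b"), 1, gby);
        System.out.println(shouldInsertTopNKey(rs)); // prints "true"
    }
}
```

Comparing the key lists with equals is deliberately strict (same columns, same order). It also rejects the case mentioned in the description where the GBY is on "a,b,c" and the TopNKey is on just "a", which matches the suggestion to leave generalization to the pushdown follow-up.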