> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 31
> > <https://reviews.apache.org/r/55776/diff/1/?file=1610799#file1610799line31>
> >
> > Is it possible that `numPartitions` equals to 0?

No. If the partition number is zero, there are no partitions, so we will not even get here. Nevertheless, if it's set to 0, we take 1 instead.


> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 34
> > <https://reviews.apache.org/r/55776/diff/1/?file=1610799#file1610799line34>
> >
> > I wonder whether this also has some extra cost comparing to the
> > original `groupByKey`, since it needs to sort all records by key in a
> > single partition, right?

Well, we don't know which one performs better yet. repartitionAndSortWithinPartitions() brings extra sorting, but it eliminates the grouping done in groupByKey(). Also, groupByKey() has unbounded memory usage, which is the problem we are trying to solve, as described in the JIRA description. We will follow up with performance testing, and may provide an option to use either groupByKey(), which might perform better but has unbounded memory usage, or the new way, where memory usage is bounded.

- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------


On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
>     https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
>   ql/src/test/queries/clientpositive/union_top_level.q d93fe38
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
>   ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
>   ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
>   ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
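
For anyone following the thread, here is a minimal sketch (plain Java, no Spark dependency) of the idea behind the reply above. Once records arrive sorted by key within a partition, as they would after repartitionAndSortWithinPartitions(), each key's values can be consumed as a stream with only one record buffered at a time, whereas groupByKey() materializes all values of a key together. The names SortedGroupingSketch, forEachGroup and effectivePartitions are hypothetical and are not taken from the patch; effectivePartitions only mirrors the statement that a partition count of 0 is replaced by 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.BiConsumer;

// Hypothetical illustration; not the code under review.
final class SortedGroupingSketch {

    // Mirrors the reply: a partition count of 0 is replaced by 1.
    static int effectivePartitions(int numPartitions) {
        return numPartitions == 0 ? 1 : numPartitions;
    }

    // Walks a key-sorted stream and hands each key's values to the consumer lazily.
    // Assumes the source is already sorted by key and contains no null entries.
    static <K, V> void forEachGroup(Iterator<Map.Entry<K, V>> sorted,
                                    BiConsumer<K, Iterator<V>> consumer) {
        Cursor<K, V> cursor = new Cursor<>(sorted);
        while (cursor.head != null) {
            K key = cursor.head.getKey();
            Iterator<V> values = cursor.valuesFor(key);
            consumer.accept(key, values);
            // Skip values the consumer did not read so the next group starts cleanly.
            while (values.hasNext()) {
                values.next();
            }
        }
    }

    // Holds the single look-ahead record; this is the only buffering performed.
    private static final class Cursor<K, V> {
        private final Iterator<Map.Entry<K, V>> source;
        Map.Entry<K, V> head; // null once the source is exhausted

        Cursor(Iterator<Map.Entry<K, V>> source) {
            this.source = source;
            advance();
        }

        void advance() {
            head = source.hasNext() ? source.next() : null;
        }

        // Iterator over the run of consecutive records that share the given key.
        Iterator<V> valuesFor(K key) {
            return new Iterator<V>() {
                @Override
                public boolean hasNext() {
                    return head != null && Objects.equals(head.getKey(), key);
                }

                @Override
                public V next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    V value = head.getValue();
                    advance();
                    return value;
                }
            };
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
        sorted.add(new SimpleEntry<>("a", 1));
        sorted.add(new SimpleEntry<>("a", 2));
        sorted.add(new SimpleEntry<>("b", 3));
        // Sum the values per key without ever holding a whole group in memory.
        forEachGroup(sorted.iterator(), (key, values) -> {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            System.out.println(key + " -> " + sum);
        });
    }
}
```

The trade-off raised in the review still applies: the sort has its own cost, so this only bounds memory and says nothing about which approach is faster.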