-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42508/
-----------------------------------------------------------
Review request for hive, Chaoyu Tang, Szehon Ho, and Xuefu Zhang.


Repository: hive-git


Description
-------

HIVE-12889: Support COUNT(DISTINCT) for partitioning query.


Diffs
-----

  data/files/windowing_distinct.txt PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlCountAggFunction.java 793704024158a895405db13aca310fc5e06015e2
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/functions/HiveSqlSumAggFunction.java 8f629707baa05d6d560aebc87ad0b97c4224ea61
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/PlanModifierForASTConv.java e2fbb4f48b9c52d43badb3e33d7ffc8c5834a962
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/SqlFunctionConverter.java 37249f9cb429ee69ef6217fbf08c97dd32216add
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 1c44ade230681eab40222995ab3d9133b9097548
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 15ca75489b1e56b09eb6d9005d8b32179299d22b
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFInvocationSpec.java 29b85105c487816e838918f51cf9c12c05806aa3
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a0251fb09c5b607d7263914aa5544d44f5309fc2
  ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java a181f7c1a7bc66f37277e94fd3ffbef91290dc6f
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java eaf112e6b7326e3e2ceab289279e434b6535f418
  ql/src/test/queries/clientpositive/windowing_distinct.q PRE-CREATION
  ql/src/test/results/clientpositive/windowing_distinct.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/42508/diff/


Testing
-------

Support count(distinct) over a partitioning window.
1. Enable the parser to properly parse queries such as "count(distinct) over (partition by c1)".
2. ORDER BY and windowing frames are not supported with DISTINCT functions, because of performance concerns and implementation requirements.
3. The distinct fields are inserted into the ORDER BY list, so that during counting we only need to compare the current row against the previously remembered row.


Thanks,

Aihua Xu
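
The counting idea in point 3 above can be illustrated with a minimal sketch. This is not the code in GenericUDAFCount.java; the class and method names are hypothetical, and it assumes the rows of one partition have already been sorted on the distinct field, so equal values are adjacent and only the previous value needs to be remembered.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the Hive patch.
final class DistinctCountSketch {

    // Counts distinct non-null values in a list that is already sorted
    // on the distinct field, so duplicates sit next to each other.
    static long countDistinctSorted(List<String> sortedValues) {
        long count = 0;
        String previous = null; // the previously remembered row
        for (String current : sortedValues) {
            if (current == null) {
                continue; // COUNT(DISTINCT x) ignores NULLs
            }
            // A new distinct value starts whenever the current row
            // differs from the remembered one.
            if (!current.equals(previous)) {
                count++;
            }
            previous = current;
        }
        return count;
    }

    public static void main(String[] args) {
        // One partition, sorted on the distinct field.
        List<String> partition = Arrays.asList("a", "a", "b", "c", "c");
        System.out.println(countDistinctSorted(partition)); // prints 3
    }
}
```

Because the input is sorted, the sketch needs constant extra memory per partition instead of a set of all values seen so far.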