Chao created HIVE-8859:
--------------------------

             Summary: ColumnStatsTask fails because of SparkMapJoinResolver
                 Key: HIVE-8859
                 URL: https://issues.apache.org/jira/browse/HIVE-8859
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
    Affects Versions: spark-branch
            Reporter: Chao
            Assignee: Chao

The following query fails:

{code}
ANALYZE TABLE src COMPUTE STATISTICS FOR COLUMNS key,value;
{code}

The plan looks like:

{noformat}
STAGE DEPENDENCIES:
  Stage-0 is a root stage
  Stage-2 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 1)
      DagName: chao_20141113105959_486b4bba-a2da-43c5-bf42-0ee69cd42576:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: string), value (type: string)
                    outputColumnNames: key, value
                    Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: compute_stats(key, 16), compute_stats(value, 16)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      Reduce Output Operator
                        sort order:
                        Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                        value expressions: _col0 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>), _col1 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: compute_stats(VALUE._col0), compute_stats(VALUE._col1)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: struct<columntype:string,maxlength:bigint,avglength:double,countnulls:bigint,numdistinctvalues:bigint>), _col1 (type: struct<columntype:string,maxlength:bigint,avglength:double,countnulls:bigint,numdistinctvalues:bigint>)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-2
    Column Stats Work
      Column Stats Desc:
          Columns: key, value
          Column Types: string, string
          Table: src
{noformat}

This query will fail because {{SparkMapJoinResolver#createSparkTask}} swaps the order of two tasks in the root task list.
But, this is rather interesting, since if they are both root tasks, then order shouldn't matter.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)