[
https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954570#comment-15954570
]
zhihai xu edited comment on HIVE-16368 at 4/4/17 5:33 AM:
----------------------------------------------------------
Without the patch, the Reduce Operator Tree in the query plan for the MR job
that contains the LateralViewJoinOperator is:
{code}
| Stage: Stage-3
| Map Reduce
| Map Operator Tree:
| ......
| Reduce Operator Tree:
| Join Operator
| condition map:
| Inner Join 0 to 1
| keys:
| 0 _col7 (type: string)
| 1 msg.chain_uuid (type: string)
| outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col8, _col9, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col31, _col32, _col33, _col34, _col35, _col36, _col44
| Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
| Select Operator
| expressions: _col0 (type: string), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col7 (type: string), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col27 (type: string), _col28 (type: array<string>), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col44.all_points (type: array<struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>>)
| outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col8, _col9, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col31, _col32, _col33, _col34, _col35, _col36, _col37
| Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
| Lateral View Forward
| Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
| Select Operator
| expressions: _col0 (type: string), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col27 (type: string), _col28 (type: array<string>), _col3 (type: string), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col4 (type: bigint), _col5 (type: bigint), _col7 (type: string), _col8 (type: string), _col9 (type: int)
| outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9
| Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
| Lateral View Join Operator
| outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9, _col38
| Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE
| File Output Operator
| compressed: false
| table:
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
| serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
| Select Operator
| expressions: _col37 (type: array<struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>>)
| outputColumnNames: _col0
| Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
| UDTF Operator
| Statistics: Num rows: 35439950 Data size: 1169518365 Basic stats: COMPLETE Column stats: NONE
| function name: explode
| Lateral View Join Operator
| outputColumnNames: _col0, _col10, _col11, _col12, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col26, _col27, _col28, _col3, _col31, _col32, _col33, _col34, _col35, _col36, _col4, _col5, _col7, _col8, _col9, _col38
| Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE
| File Output Operator
| compressed: false
| table:
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
| serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
{code}
The Map Operator Tree in the query plan for the following MR job, which
contains the TableScanOperator, is:
{code}
| Stage: Stage-4
| Map Reduce
| Map Operator Tree:
| TableScan
| Reduce Output Operator
| key expressions: _col27 (type: string), _col7 (type: string), _col38.ts (type: bigint)
| sort order: +++
| Map-reduce partition columns: _col27 (type: string), _col7 (type: string)
| Statistics: Num rows: 70879900 Data size: 2339036730 Basic stats: COMPLETE Column stats: NONE
| value expressions: _col0 (type: string), _col3 (type: string), _col4 (type: bigint), _col5 (type: bigint), _col8 (type: string), _col9 (type: int), _col10 (type: int), _col11 (type: string), _col12 (type: string), _col14 (type: double), _col15 (type: double), _col16 (type: double), _col17 (type: double), _col18 (type: double), _col19 (type: double), _col20 (type: double), _col21 (type: double), _col22 (type: double), _col26 (type: timestamp), _col28 (type: array<string>), _col31 (type: double), _col32 (type: double), _col33 (type: double), _col34 (type: double), _col35 (type: string), _col36 (type: bigint), _col38 (type: struct<ts:bigint,duration:bigint,lat:double,lng:double,on_uuids:array<string>,speed:double,assigned_uuids:array<string>>)
| Reduce Operator Tree:
{code}
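As a rough illustration of the reordering visible in the Stage-3 plan, the
sketch below compares the first eight output columns of the Select Operator
that feeds Lateral View Forward with the first eight output columns of the
Lateral View Join Operator, both copied from the plan. The class and method
names (ColumnOrderDiff, firstMismatch) are hypothetical and not part of Hive,
and treating the Select Operator order as the order a downstream reader
expects is an assumption made only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: finds where two column lists diverge.
final class ColumnOrderDiff {

    // Returns the first index at which the lists differ, or -1 if they are equal.
    static int firstMismatch(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) {
        // First eight outputColumnNames of the Select Operator before Lateral View Forward.
        List<String> selectOrder = Arrays.asList(
            "_col0", "_col3", "_col4", "_col5", "_col7", "_col8", "_col9", "_col10");
        // First eight outputColumnNames of the Lateral View Join Operator.
        List<String> lateralViewJoinOrder = Arrays.asList(
            "_col0", "_col10", "_col11", "_col12", "_col14", "_col15", "_col16", "_col17");

        int i = firstMismatch(selectOrder, lateralViewJoinOrder);
        System.out.println("first mismatch at index " + i + ": "
            + selectOrder.get(i) + " vs " + lateralViewJoinOrder.get(i));
    }
}
```

The two lists already diverge at the second column, so any reader that maps
positions to columns using the other order would pick up the wrong field.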
> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView
> Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16368
> URL: https://issues.apache.org/jira/browse/HIVE-16368
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: HIVE-16368.000.patch
>
>
> A query with a LateralView operation fails with an unexpected
> java.lang.ArrayIndexOutOfBoundsException when running Hive on MR. The cause
> is that column pruning changes the column order in the LateralView
> operation. For back-to-back ReduceSink operators on the MR engine, a
> FileSinkOperator and a TableScanOperator are added before the second
> ReduceSink operator. The column order that the FileSinkOperator uses to
> serialize rows with LazyBinarySerDe in the previous reducer differs from the
> column order, taken from the table desc, that the
> MapOperator/TableScanOperator uses to deserialize them with LazyBinarySerDe
> in the failing mapper.
> The serialization order is determined by the outputObjInspector of
> LateralViewJoinOperator:
> {code}
> ArrayList<String> fieldNames = conf.getOutputInternalColNames();
> outputObjInspector = ObjectInspectorFactory
> .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is determined by
> getOutputInternalColNames in LateralViewJoinOperator.
> The deserialization order is determined by the TableScanOperator, which is
> created in GenMapRedUtils.splitTasks:
> {code}
> TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
> .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
> // Create the temporary file, its corresponding FileSinkOperator, and
> // its corresponding TableScanOperator.
> TableScanOperator tableScanOp =
> createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> The column order for deserialization is determined by the rowSchema of
> LateralViewJoinOperator.
> However, ColumnPrunerLateralViewJoinProc changes the order of
> outputInternalColNames but keeps the original order of rowSchema, which
> causes the mismatch between serialization and deserialization across the two
> back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the column
> order of its child Select operator's colList but not its rowSchema.
> The exception is:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
> at
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
> at
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
> at
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> at
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> at
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
> at
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}
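The failure mode described in the issue can be sketched outside Hive. The
example below writes one row with fields in one order and reads it back
assuming a different order. It uses a toy row format (a 4-byte length prefix
plus UTF-8 bytes for strings, 8 raw bytes for doubles), which is an assumption
for illustration only and is not the actual LazyBinarySerDe wire format. All
names (SerdeOrderMismatch, Type, write, read) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Toy serializer, not Hive code: shows what happens when the reader assumes a
// different field order than the writer used.
final class SerdeOrderMismatch {

    enum Type { STRING, DOUBLE }

    // Writes values in the given order: STRING as 4-byte length + UTF-8 bytes,
    // DOUBLE as 8 bytes.
    static byte[] write(List<Type> order, List<?> values) {
        ByteBuffer buf = ByteBuffer.allocate(256);
        for (int i = 0; i < order.size(); i++) {
            if (order.get(i) == Type.STRING) {
                byte[] s = ((String) values.get(i)).getBytes(StandardCharsets.UTF_8);
                buf.putInt(s.length).put(s);
            } else {
                buf.putDouble((Double) values.get(i));
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Reads fields assuming the given order. With the wrong order, bytes of one
    // field are interpreted as the length of another and the read runs out of bounds.
    static Object[] read(List<Type> order, byte[] data) {
        Object[] out = new Object[order.size()];
        int offset = 0;
        for (int i = 0; i < order.size(); i++) {
            if (order.get(i) == Type.STRING) {
                int len = ByteBuffer.wrap(data, offset, 4).getInt();
                offset += 4;
                out[i] = new String(data, offset, len, StandardCharsets.UTF_8);
                offset += len;
            } else {
                out[i] = ByteBuffer.wrap(data, offset, 8).getDouble();
                offset += 8;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Type> writerOrder = Arrays.asList(Type.STRING, Type.DOUBLE);
        List<Type> readerOrder = Arrays.asList(Type.DOUBLE, Type.STRING);
        byte[] row = write(writerOrder, Arrays.asList("abcd", 1.5));

        System.out.println("same order:    " + Arrays.toString(read(writerOrder, row)));
        try {
            read(readerOrder, row);
        } catch (IndexOutOfBoundsException e) {
            // The bytes of the double were read as a string length.
            System.out.println("swapped order: " + e);
        }
    }
}
```

In this sketch the swapped reader interprets the leading bytes of the double
as a string length and runs past the end of the buffer, which is analogous to
(though not the same code path as) the out-of-range index in the stack trace
above.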
