[
https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhihai xu updated HIVE-16368:
-----------------------------
Attachment: HIVE-16368.001.patch
> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LateralView
> Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>
> Key: HIVE-16368
> URL: https://issues.apache.org/jira/browse/HIVE-16368
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: zhihai xu
> Assignee: zhihai xu
> Attachments: HIVE-16368.000.patch, HIVE-16368.001.patch
>
>
> A query with a lateral view fails with an unexpected
> java.lang.ArrayIndexOutOfBoundsException when running Hive on MR. The cause
> is that column pruning changes the column order in the LateralView
> operators. With the MR engine, back-to-back ReduceSink operators are split
> into two jobs: a FileSinkOperator and a TableScanOperator are added before
> the second ReduceSink operator. The column order that FileSinkOperator uses
> for serialization in LazyBinarySerDe in the previous reducer differs from
> the column order, taken from the table desc, that
> MapOperator/TableScanOperator uses for deserialization in LazyBinarySerDe
> in the failing mapper.
> The serialization order is decided by the outputObjInspector from
> LateralViewJoinOperator:
> {code}
> ArrayList<String> fieldNames = conf.getOutputInternalColNames();
> outputObjInspector = ObjectInspectorFactory
>     .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is decided by getOutputInternalColNames
> in LateralViewJoinOperator.
> The deserialization order is decided by the TableScanOperator, which is
> created in GenMapRedUtils.splitTasks:
> {code}
> TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>     .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
> // Create the temporary file, its corresponding FileSinkOperator, and
> // its corresponding TableScanOperator.
> TableScanOperator tableScanOp =
>     createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> The column order for deserialization is therefore decided by the rowSchema
> of LateralViewJoinOperator.
> However, ColumnPrunerLateralViewJoinProc changes the order of
> outputInternalColNames but keeps the original order of rowSchema,
> which causes the mismatch between serialization and deserialization for two
> back-to-back MR jobs.
> ColumnPrunerLateralViewForwardProc has a similar issue: it changes the
> column order of its child select operator's colList but not the rowSchema.
> The exception is:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
> at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
> at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
> at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
> at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
> at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
> at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
> at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
> at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
> at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
> at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}
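
The serialization/deserialization order mismatch described in the report can be reproduced in miniature without Hive. The sketch below is only an illustration under simplifying assumptions: it uses plain java.io streams rather than LazyBinarySerDe, and the class ColumnOrderMismatch, the enum ColType, and the sample values are hypothetical names introduced here, not part of Hive or of the attached patch. It writes a two-column row in one column order and reads it back in another, which is roughly what happens when outputInternalColNames and rowSchema disagree.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

public class ColumnOrderMismatch {

    // Hypothetical column types, for this illustration only.
    enum ColType { DOUBLE, STRING }

    // Writes the values back-to-back in the given column order. No per-field
    // type tags are written, so the reader must already know the order.
    static byte[] serialize(List<ColType> order, double d, String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (ColType t : order) {
                if (t == ColType.DOUBLE) {
                    out.writeDouble(d);
                } else {
                    out.writeUTF(s);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the row assuming the given column order and returns the value
    // decoded for the DOUBLE column.
    static double readDouble(List<ColType> order, byte[] data) {
        double result = Double.NaN;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (ColType t : order) {
                if (t == ColType.DOUBLE) {
                    result = in.readDouble();
                } else {
                    in.readUTF();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Order used by the writer (stands in for the pruned output column names).
        List<ColType> writerOrder = Arrays.asList(ColType.STRING, ColType.DOUBLE);
        // Order used by the reader (stands in for the unchanged row schema).
        List<ColType> readerOrder = Arrays.asList(ColType.DOUBLE, ColType.STRING);

        byte[] row = serialize(writerOrder, 1.5, "466");

        System.out.println("same order:     " + readDouble(writerOrder, row));
        System.out.println("mismatched:     " + readDouble(readerOrder, row));
    }
}
```

With matching orders the double column round-trips unchanged. With mismatched orders the reader decodes the string's length prefix and characters as part of a double, so it returns an unrelated value; with other inputs the same mistake can surface as an exception instead, which is consistent with the out-of-range index in the trace above.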