[ 
https://issues.apache.org/jira/browse/HIVE-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-16368:
-----------------------------
    Description: 
Unexpected java.lang.ArrayIndexOutOfBoundsException from query. It happened in 
LaterView Operation. It happened for hive-on-mr. The reason is because the 
column prune change the column order in LaterView operation, for back-back 
reducesink operators using MR engine, FileSinkOperator and TableScanOperator 
are added before the second ReduceSink operator, The serialization column order 
used by FileSinkOperator in LazyBinarySerDe of previous reducer is different 
from deserialization column order from table desc used by 
MapOperator/TableScanOperator in LazyBinarySerDe of current failed mapper.
The serialization is decided by the outputObjInspector from 
LateralViewJoinOperator,
{code}
    ArrayList<String> fieldNames = conf.getOutputInternalColNames();
    outputObjInspector = ObjectInspectorFactory
        .getStandardStructObjectInspector(fieldNames, ois);
{code}
So the column order for serialization is decided by getOutputInternalColNames 
in LateralViewJoinOperator.

The deserialization is decided by TableScanOperator which is created at  
GenMapRedUtils.splitTasks. 
{code}
    TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
        .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
    // Create the temporary file, its corresponding FileSinkOperaotr, and
    // its corresponding TableScanOperator.
    TableScanOperator tableScanOp =
        createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
{code}
The column order for deserialization is decided by rowSchema of 
LateralViewJoinOperator.
But ColumnPrunerLateralViewJoinProc changed the order of outputInternalColNames 
but still keep the original order of rowSchema,
Which cause the mismatch between serialization and deserialization for two 
back-to-back MR jobs.
Similar issue for ColumnPrunerLateralViewForwardProc which change the column 
order of its child selector colList but not rowSchema.
The exception is 
{code}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
        at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
        at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
        at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
        at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
        at 
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
        at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
        at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
        at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
        at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
        at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
{code}

  was:
Unexpected java.lang.ArrayIndexOutOfBoundsException from query. It happened in 
LaterView Operation. It happened for hive-on-mr. The reason is because the 
column prune change the column order in LaterView operation, for back-back 
reducesink operators using MR engine, FileSinkOperator and TableScanOperator 
are added before the second ReduceSink operator, The serialization column order 
used by FileSinkOperator in LazyBinarySerDe of previous reducer is different 
from deserialization column order from table desc used by 
MapOperator/TableScanOperator in LazyBinarySerDe of current failed mapper.
The serialization is decided by the outputObjInspector from 
LateralViewJoinOperator,
{code}
    ArrayList<String> fieldNames = conf.getOutputInternalColNames();
    outputObjInspector = ObjectInspectorFactory
        .getStandardStructObjectInspector(fieldNames, ois);
{code}
So the column order for serialization is decided by getOutputInternalColNames 
in LateralViewJoinOperator.

The deserialization is decided by TableScanOperator which is created at  
GenMapRedUtils.splitTasks. 
{code}
    TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
        .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
    // Create the temporary file, its corresponding FileSinkOperaotr, and
    // its corresponding TableScanOperator.
    TableScanOperator tableScanOp =
        createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
{code}
The column order for deserialization is decided by rowSchema of 
LateralViewJoinOperator.
But ColumnPrunerLateralViewJoinProc changed the order of outputInternalColNames 
but still keep the original order of rowSchema,
Which cause the mismatch between serialization and deserialization for two 
back-to-back MR jobs.
Similar issue for ColumnPrunerLateralViewForwardProc which change the column 
order of its child selector colList but not rowSchema.


> Unexpected java.lang.ArrayIndexOutOfBoundsException from query with LaterView 
> Operation for hive on MR.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16368
>                 URL: https://issues.apache.org/jira/browse/HIVE-16368
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>
> Unexpected java.lang.ArrayIndexOutOfBoundsException from query. It happened 
> in LaterView Operation. It happened for hive-on-mr. The reason is because the 
> column prune change the column order in LaterView operation, for back-back 
> reducesink operators using MR engine, FileSinkOperator and TableScanOperator 
> are added before the second ReduceSink operator, The serialization column 
> order used by FileSinkOperator in LazyBinarySerDe of previous reducer is 
> different from deserialization column order from table desc used by 
> MapOperator/TableScanOperator in LazyBinarySerDe of current failed mapper.
> The serialization is decided by the outputObjInspector from 
> LateralViewJoinOperator,
> {code}
>     ArrayList<String> fieldNames = conf.getOutputInternalColNames();
>     outputObjInspector = ObjectInspectorFactory
>         .getStandardStructObjectInspector(fieldNames, ois);
> {code}
> So the column order for serialization is decided by getOutputInternalColNames 
> in LateralViewJoinOperator.
> The deserialization is decided by TableScanOperator which is created at  
> GenMapRedUtils.splitTasks. 
> {code}
>     TableDesc tt_desc = PlanUtils.getIntermediateFileTableDesc(PlanUtils
>         .getFieldSchemasFromRowSchema(parent.getSchema(), "temporarycol"));
>     // Create the temporary file, its corresponding FileSinkOperaotr, and
>     // its corresponding TableScanOperator.
>     TableScanOperator tableScanOp =
>         createTemporaryFile(parent, op, taskTmpDir, tt_desc, parseCtx);
> {code}
> The column order for deserialization is decided by rowSchema of 
> LateralViewJoinOperator.
> But ColumnPrunerLateralViewJoinProc changed the order of 
> outputInternalColNames but still keep the original order of rowSchema,
> Which cause the mismatch between serialization and deserialization for two 
> back-to-back MR jobs.
> Similar issue for ColumnPrunerLateralViewForwardProc which change the column 
> order of its child selector colList but not rowSchema.
> The exception is 
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 875968094
>       at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:78)
>       at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>       at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>       at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>       at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>       at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>       at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>       at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>       at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:554)
>       at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:381)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to