[
https://issues.apache.org/jira/browse/HIVE-21966?focusedWorklogId=273252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273252
]
ASF GitHub Bot logged work on HIVE-21966:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Jul/19 12:10
Start Date: 08/Jul/19 12:10
Worklog Time Spent: 10m
Work Description: ShubhamChaurasia commented on pull request #706:
HIVE-21966: Llap external client - Arrow Serializer throws
ArrayIndexOutOfBoundsException
URL: https://github.com/apache/hive/pull/706#discussion_r301059022
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##########
@@ -365,27 +369,74 @@ private void writeStruct(NonNullableStructVector arrowVector, StructColumnVector
}
}
- private void writeList(ListVector arrowVector, ListColumnVector hiveVector, ListTypeInfo typeInfo, int size,
+ // selected[] points to the valid/filtered/selected records at row level.
+ // for MultiValuedColumnVector such as ListColumnVector one record of vector points to multiple nested records.
+ // In child vectors we get these records in exploded manner i.e. the number of records in child vectors can have size more
+ // than actual the VectorizedRowBatch, consequently selected[] also needs to be readjusted.
+ // This method creates a shallow copy of VectorizedRowBatch with corrected size and selected[]
+
+ private static VectorizedRowBatch correctSelectedAndSize(VectorizedRowBatch sourceVrb,
+ ListColumnVector listColumnVector) {
Review comment:
done
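The code comment in the hunk above says that a row-level selected[] has to be readjusted before it is applied to the exploded child vector of a list column. The following is a minimal sketch of that idea using plain arrays; it is not the code from the pull request. The class and method names are hypothetical, the offsets and lengths parameters are assumed to describe where each row's elements sit in the child vector, and null rows are ignored for brevity.

```java
import java.util.Arrays;

public class ChildSelection {

    /**
     * Expands a row-level selection into child-level indices.
     * selected[0..size) holds the selected row numbers; offsets[r] and
     * lengths[r] are assumed to give the start and element count of
     * row r inside the child vector.
     */
    public static int[] childSelected(int[] selected, int size,
                                      long[] offsets, long[] lengths) {
        // First pass: count how many child elements the selected rows own.
        int total = 0;
        for (int i = 0; i < size; i++) {
            total += (int) lengths[selected[i]];
        }

        // Second pass: list every child index owned by a selected row.
        int[] result = new int[total];
        int next = 0;
        for (int i = 0; i < size; i++) {
            int row = selected[i];
            int start = (int) offsets[row];
            for (int j = 0; j < lengths[row]; j++) {
                result[next++] = start + j;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three rows holding 2, 3 and 1 list elements respectively.
        long[] offsets = {0, 2, 5};
        long[] lengths = {2, 3, 1};
        // A filter kept rows 0 and 2 only.
        int[] selected = {0, 2};
        // Prints [0, 1, 5]
        System.out.println(Arrays.toString(childSelected(selected, 2, offsets, lengths)));
    }
}
```

The length of the returned array plays the role of the corrected size, and the array itself plays the role of the corrected selected[] for the child vector.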
Issue Time Tracking
-------------------
Worklog Id: (was: 273252)
Time Spent: 50m (was: 40m)
> Llap external client - Arrow Serializer throws ArrayIndexOutOfBoundsException
> in some cases
> -------------------------------------------------------------------------------------------
>
> Key: HIVE-21966
> URL: https://issues.apache.org/jira/browse/HIVE-21966
> Project: Hive
> Issue Type: Bug
> Components: llap, Serializers/Deserializers
> Affects Versions: 3.1.1
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21966.1.patch, HIVE-21966.2.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When a query is submitted through llap-ext-client, the Arrow serializer throws
> ArrayIndexOutOfBoundsException if all three of the following conditions hold:
> 1) {{hive.vectorized.execution.filesink.arrow.native.enabled=true}} is set, so
> the Arrow serializer code path is taken.
> 2) The query contains a filter or limit clause, which forces
> {{VectorizedRowBatch#selectedInUse=true}}.
> 3) The projection involves a column of type {{MultiValuedColumnVector}}.
> Sample stacktrace:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 150
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:679)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:518)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:276)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:342)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:282)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:365)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:279)
> at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:199)
> at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
> ... 30 more
> {code}
> To reproduce, first run the following from beeline:
> {code}
> CREATE TABLE complex_tbl(c1 array<struct<f1:string,f2:string>>) STORED AS ORC;
> INSERT INTO complex_tbl SELECT ARRAY(NAMED_STRUCT('f1','v11', 'f2','v21'),
> NAMED_STRUCT('f1','v21', 'f2','v22'));
> {code}
> Then run the query {{select * from complex_tbl limit 1}} through
> llap-ext-client.
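
Condition 2 in the description relies on the selectedInUse convention: when it is set, logical position i in a batch maps to physical row selected[i]. The sketch below illustrates that convention and how applying row-level indices to an array that is not indexed by row number can run out of bounds. It is only an illustration, not the Serializer code path from the stack trace; the Batch class is a hypothetical stand-in for the few VectorizedRowBatch fields involved, and all other names are made up for the example.

```java
public class SelectedInUseDemo {

    /** Hypothetical stand-in for the batch fields used in this sketch. */
    public static final class Batch {
        boolean selectedInUse;
        int[] selected;
        int size;
    }

    /** Builds a batch whose selection consists of the given row numbers. */
    public static Batch batchOf(int... selectedRows) {
        Batch batch = new Batch();
        batch.selectedInUse = true;
        batch.selected = selectedRows;
        batch.size = selectedRows.length;
        return batch;
    }

    /** Maps a logical position to a physical row index. */
    public static int physicalRow(Batch batch, int i) {
        return batch.selectedInUse ? batch.selected[i] : i;
    }

    /** Sums the values of the selected rows of one column array. */
    public static long sumSelected(Batch batch, long[] column) {
        long sum = 0;
        for (int i = 0; i < batch.size; i++) {
            sum += column[physicalRow(batch, i)];
        }
        return sum;
    }

    public static void main(String[] args) {
        Batch batch = batchOf(1, 3);

        // Row-level column: indices 1 and 3 are valid row numbers.
        long[] rowLevel = {10, 20, 30, 40};
        System.out.println(sumSelected(batch, rowLevel)); // 20 + 40

        // An array laid out differently from the rows: index 3 does not exist.
        long[] notRowIndexed = {7, 8};
        try {
            sumSelected(batch, notRowIndexed);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("row-level index does not fit this array");
        }
    }
}
```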