[ 
https://issues.apache.org/jira/browse/HIVE-21966?focusedWorklogId=273228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-273228
 ]

ASF GitHub Bot logged work on HIVE-21966:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jul/19 11:17
            Start Date: 08/Jul/19 11:17
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #706: HIVE-21966: Llap external client - Arrow Serializer throws ArrayIndexOutOfBoundsException
URL: https://github.com/apache/hive/pull/706#discussion_r301039096
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
 ##########
 @@ -365,27 +369,74 @@ private void writeStruct(NonNullableStructVector arrowVector, StructColumnVector
     }
   }
 
-  private void writeList(ListVector arrowVector, ListColumnVector hiveVector, ListTypeInfo typeInfo, int size,
+    // selected[] points to the valid/filtered/selected records at row level.
+    // for MultiValuedColumnVector such as ListColumnVector one record of vector points to multiple nested records.
+    // In child vectors we get these records in exploded manner i.e. the number of records in child vectors can have size more
+    // than actual the VectorizedRowBatch, consequently selected[] also needs to be readjusted.
+    // This method creates a shallow copy of VectorizedRowBatch with corrected size and selected[]
+
+    private static VectorizedRowBatch correctSelectedAndSize(VectorizedRowBatch sourceVrb,
+                                                             ListColumnVector listColumnVector) {
 
 Review comment:
   Should we use MultiValuedColumnVector instead of ListColumnVector?
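
   For illustration only, a minimal sketch of the readjustment that the code
   comment in the diff describes. It is not the code from the pull request:
   the class and method names are hypothetical, and plain arrays stand in for
   the offsets[] and lengths[] fields that ListColumnVector inherits from
   MultiValuedColumnVector. Because the sketch needs only those two arrays,
   the same logic would apply to any MultiValuedColumnVector.

```java
import java.util.Arrays;

// Hypothetical, self-contained illustration; not part of Hive or of the patch.
class ChildSelection {

    // offsets[row] and lengths[row] describe the slice of the child vector
    // that belongs to parent row `row` (assumed to mirror the fields of
    // MultiValuedColumnVector). Returns the child-level selected[] array;
    // its length is the corrected child-level size.
    static int[] childSelected(long[] offsets, long[] lengths,
                               int[] selected, int size, boolean selectedInUse) {
        // First pass: count the child records reachable from the selected parent rows.
        int total = 0;
        for (int i = 0; i < size; i++) {
            int row = selectedInUse ? selected[i] : i;
            total += (int) lengths[row];
        }
        // Second pass: expand every selected parent row into its child indices.
        int[] result = new int[total];
        int next = 0;
        for (int i = 0; i < size; i++) {
            int row = selectedInUse ? selected[i] : i;
            for (long j = 0; j < lengths[row]; j++) {
                result[next++] = (int) (offsets[row] + j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three parent rows holding 2, 3 and 1 child records; rows 0 and 2 are selected.
        long[] offsets = {0, 2, 5};
        long[] lengths = {2, 3, 1};
        int[] selected = {0, 2};
        int[] child = childSelected(offsets, lengths, selected, 2, true);
        System.out.println(Arrays.toString(child)); // prints [0, 1, 5]
    }
}
```

   In this example the parent-level selected[] {0, 2} with size 2 becomes a
   child-level selection of three records, which is why reusing the parent
   values to index a child vector can read the wrong slots or run out of bounds.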
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 273228)
    Time Spent: 40m  (was: 0.5h)

> Llap external client - Arrow Serializer throws ArrayIndexOutOfBoundsException 
> in some cases
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21966
>                 URL: https://issues.apache.org/jira/browse/HIVE-21966
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Serializers/Deserializers
>    Affects Versions: 3.1.1
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21966.1.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a query is submitted through llap-ext-client, the Arrow serializer throws 
> ArrayIndexOutOfBoundsException if all three of the following conditions hold:
> 1) {{hive.vectorized.execution.filesink.arrow.native.enabled=true}} is set, so 
> the Arrow serializer code path is taken.
> 2) The query contains a filter or limit clause that forces 
> {{VectorizedRowBatch#selectedInUse=true}}.
> 3) The projection involves a column of type {{MultiValuedColumnVector}}.
> Sample stacktrace:
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 150
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:679)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:518)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:276)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:342)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:282)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:365)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:279)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:199)
>       at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135)
>       ... 30 more
> {code}
> To reproduce, first run the following from beeline:
> {code}
> CREATE TABLE complex_tbl(c1 array<struct<f1:string,f2:string>>) STORED AS ORC;
> INSERT INTO complex_tbl SELECT ARRAY(NAMED_STRUCT('f1','v11', 'f2','v21'), 
> NAMED_STRUCT('f1','v21', 'f2','v22'));
> {code}
> Then run {{select * from complex_tbl limit 1}} through llap-ext-client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
