[ 
https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-23022:
-------------------------------------
    Description: 
The Arrow deserializer ({{org.apache.hadoop.hive.ql.io.arrow.Deserializer}}) in 
some cases does not set the size of the Hive vector correctly. The Hive vector 
must be at least as large as the Arrow vector so that it can accommodate the 
Arrow data fully.

The following exception occurs when reading a table that contains complex 
types (specifically, a struct nested in a list) and the number of rows in the 
table exceeds the default batch/vector size (1024).

{code:java}
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
  at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
  at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
  ... 23 more
{code}




  was:
The Arrow deserializer ({{org.apache.hadoop.hive.ql.io.arrow.Deserializer}}) in 
some cases does not set the size of the Hive vector correctly. The Hive vector 
must be at least as large as the Arrow vector so that it can accommodate the 
Arrow data fully.

The following exception occurs when reading a table that contains complex 
types (specifically, a struct nested in a list) and the table size exceeds the 
default batch/vector size (1024).

{code:java}
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
  at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
  at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
  at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
  ... 23 more
{code}





> Arrow deserializer should ensure size of hive vector equal to arrow vector
> --------------------------------------------------------------------------
>
>                 Key: HIVE-23022
>                 URL: https://issues.apache.org/jira/browse/HIVE-23022
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Serializers/Deserializers
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>
> The Arrow deserializer ({{org.apache.hadoop.hive.ql.io.arrow.Deserializer}}) 
> in some cases does not set the size of the Hive vector correctly. The Hive 
> vector must be at least as large as the Arrow vector so that it can 
> accommodate the Arrow data fully.
> The following exception occurs when reading a table that contains complex 
> types (specifically, a struct nested in a list) and the number of rows in the 
> table exceeds the default batch/vector size (1024).
> {code:java}
>     Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440)
>   at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143)
>   at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394)
>   at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137)
>   at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122)
>   at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284)
>   at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75)
>   ... 23 more
> {code}
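
The failure mode described above (writing index 1024 into a target vector that holds only 1024 slots) can be illustrated with a minimal sketch. This is not the Hive or Arrow code and not the actual patch: {{LongVector}}, {{ensureSize}} as written here, and {{copyAll}} are hypothetical stand-ins that only show the idea of growing the target to at least the source length before copying.

```java
import java.util.Arrays;

public class VectorSizeSketch {
    // Hypothetical stand-in for a Hive column vector; NOT the real Hive class.
    static final class LongVector {
        long[] values;

        LongVector(int capacity) {
            values = new long[capacity];
        }

        // Grow the backing array if it holds fewer than `size` slots.
        void ensureSize(int size, boolean preserveData) {
            if (values.length >= size) {
                return;
            }
            values = preserveData ? Arrays.copyOf(values, size) : new long[size];
        }
    }

    // Copies every source value. The target is resized first, so a source
    // longer than the default capacity (1024) does not overflow the target.
    static void copyAll(long[] source, LongVector target) {
        target.ensureSize(source.length, false);
        for (int i = 0; i < source.length; i++) {
            target.values[i] = source[i];
        }
    }

    public static void main(String[] args) {
        // Assumed sizes: 1500 source rows against a default capacity of 1024.
        long[] arrowLike = new long[1500];
        for (int i = 0; i < arrowLike.length; i++) {
            arrowLike[i] = i;
        }
        LongVector hiveLike = new LongVector(1024);
        copyAll(arrowLike, hiveLike);
        System.out.println(hiveLike.values.length + " " + hiveLike.values[1024]);
        // prints "1500 1024"
    }
}
```

Without the {{ensureSize}} call in {{copyAll}}, the loop in this sketch would throw {{ArrayIndexOutOfBoundsException}} at index 1024, which matches the shape of the exception in the report.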



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
