[ 
https://issues.apache.org/jira/browse/HIVE-18411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321722#comment-16321722
 ] 

Colin Ma commented on HIVE-18411:
---------------------------------

[~Ferd], here is the situation: a CombineHiveInputSplit contains several files 
for one task, and those files share the same VectorizedRowBatch. Initially, 
the size of the VectorizedRowBatch is 1024. The actual size is updated in the 
method setChildrenInfo().
The key point is that the last VectorizedRowBatch of a split is typically 
smaller than 1024, and that undersized VectorizedRowBatch is then reused for 
the next split. As a result, an ArrayIndexOutOfBoundsException is thrown at 
the following line:
{code}
private void addElement(ListColumnVector lcv, List<Object> elements,
    PrimitiveObjectInspector.PrimitiveCategory category, int index) throws IOException {
    // ArrayIndexOutOfBoundsException will be thrown here!
    lcv.offsets[index] = elements.size();

    // Return directly if last value is null
    if (definitionLevel < maxDefLevel) {
{code}
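
To make the reuse problem easier to follow, here is a simplified, 
self-contained model of it. This is only a sketch and not the actual Hive 
code: SimpleListVector, shrinkTo(), resetToDefaultSize() and the readBatch() 
signature are hypothetical stand-ins, and the assumption that the arrays are 
resized to the actual row count after each batch is inferred from the 
description above.

```java
// Simplified model of the reuse problem; not the actual Hive classes.
class BatchReuseSketch {
    static final int DEFAULT_SIZE = 1024; // default batch size mentioned above

    /** Hypothetical stand-in for a list column vector: one offset slot per row. */
    static final class SimpleListVector {
        long[] offsets = new long[DEFAULT_SIZE];

        /** Assumed behavior: after a batch is read, the array is resized to the actual row count. */
        void shrinkTo(int rows) {
            offsets = new long[rows];
        }

        /** The fix proposed in the issue description: restore the default capacity. */
        void resetToDefaultSize() {
            offsets = new long[DEFAULT_SIZE];
        }
    }

    /** Reads one batch of the given number of rows into the shared vector. */
    static void readBatch(SimpleListVector v, int rows, boolean resetFirst) {
        if (resetFirst) {
            v.resetToDefaultSize();
        }
        for (int i = 0; i < rows; i++) {
            v.offsets[i] = i; // throws if a previous batch left the array too small
        }
        v.shrinkTo(rows);
    }

    /** Returns true if reading a short batch followed by a full batch fails. */
    static boolean failsOnReuse(boolean resetFirst) {
        SimpleListVector shared = new SimpleListVector();
        try {
            readBatch(shared, 100, resetFirst);          // last, short batch of split 1
            readBatch(shared, DEFAULT_SIZE, resetFirst); // first, full batch of split 2
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without reset, fails: " + failsOnReuse(false));
        System.out.println("with reset, fails: " + failsOnReuse(true));
    }
}
```

In this model, the second readBatch() call fails only when the vector is not 
reset first, which mirrors why the patch initializes the vector to the default 
size at the start of each read.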

> Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader
> -----------------------------------------------------------------
>
>                 Key: HIVE-18411
>                 URL: https://issues.apache.org/jira/browse/HIVE-18411
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Colin Ma
>            Assignee: Colin Ma
>            Priority: Critical
>         Attachments: HIVE-18411.001.patch
>
>
> ColumnVector should be initialized to the default size at the beginning of 
> readBatch(); otherwise, an ArrayIndexOutOfBoundsException will be thrown 
> because the size of the ColumnVector may have been changed by the previous 
> readBatch() call.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
