[ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---------------------------------------
    Description: 
LlapArrowBatchRecordReader returns false when the ArrowStreamReader's 
loadNextBatch call yields column vectors of length 0. It should instead keep 
reading until loadNextBatch itself returns false. A batch may contain column 
vectors of length 0; such a batch should be skipped and the reader should move 
on to the next one.

A batch size of 0 is possible when a split read by the ORC reader contains 
only deleted or aborted data. VectorizedOrcAcidRowBatchReader reads the data 
described by the split info and then filters out the rows that are not visible 
to the read transaction, so it can happen that no record passes the filter. In 
that case VectorizedOrcAcidRowBatchReader sends a batch of size 0. Given a 
batch of size 0, VectorFileSinkArrowOperator creates a batch containing only 
metadata and sets the value count to 0. The client should ignore such a batch 
and wait for the next one.

  was:
LlapArrowBatchRecordReader returns false when the ArrowStreamReader's 
loadNextBatch call yields column vectors of length 0. It should instead keep 
reading until loadNextBatch itself returns false. A batch may contain column 
vectors of length 0; such a batch should be skipped and the reader should move 
on to the next one.

A batch size of 0 is possible when a split read by the ORC reader contains 
only deleted or aborted data. In that case VectorizedOrcAcidRowBatchReader 
sends a batch of size 0. Given a batch of size 0, VectorFileSinkArrowOperator 
creates a batch containing only metadata and sets the value count to 0. The 
client should ignore such a batch and wait for the next one.


> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22856
>                 URL: https://issues.apache.org/jira/browse/HIVE-22856
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>         Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader's 
> loadNextBatch call yields column vectors of length 0. It should instead keep 
> reading until loadNextBatch itself returns false. A batch may contain column 
> vectors of length 0; such a batch should be skipped and the reader should 
> move on to the next one.
> A batch size of 0 is possible when a split read by the ORC reader contains 
> only deleted or aborted data. VectorizedOrcAcidRowBatchReader reads the data 
> described by the split info and then filters out the rows that are not 
> visible to the read transaction, so it can happen that no record passes the 
> filter. In that case VectorizedOrcAcidRowBatchReader sends a batch of size 0. 
> Given a batch of size 0, VectorFileSinkArrowOperator creates a batch 
> containing only metadata and sets the value count to 0. The client should 
> ignore such a batch and wait for the next one.
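
The intended reader behavior can be sketched roughly as follows. This is an 
illustration only, not the attached patch: BatchSource, ListBatchSource, 
nextNonEmptyBatch and readAll are hypothetical names invented here, and 
BatchSource merely stands in for the ArrowStreamReader so that the example 
runs without Arrow on the classpath. The point it shows is that only a false 
return from loadNextBatch ends the read, while a zero-row batch is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class EmptyBatchSkipSketch {

    /** Hypothetical stand-in for the reader calls that matter in this issue. */
    interface BatchSource {
        /** Loads the next batch; returns false only when the stream is exhausted. */
        boolean loadNextBatch();

        /** Row count of the most recently loaded batch (may be 0). */
        int currentRowCount();
    }

    /** Test double that replays a fixed sequence of batch sizes. */
    static final class ListBatchSource implements BatchSource {
        private final Iterator<Integer> sizes;
        private int current;

        ListBatchSource(List<Integer> batchSizes) {
            this.sizes = batchSizes.iterator();
        }

        @Override
        public boolean loadNextBatch() {
            if (!sizes.hasNext()) {
                return false;
            }
            current = sizes.next();
            return true;
        }

        @Override
        public int currentRowCount() {
            return current;
        }
    }

    /**
     * Advances to the next batch that has at least one row.
     * Returns false only when loadNextBatch reports end of stream.
     */
    static boolean nextNonEmptyBatch(BatchSource source) {
        while (source.loadNextBatch()) {
            if (source.currentRowCount() > 0) {
                return true;
            }
            // Zero-row batch (metadata only): ignore it and try the next one.
        }
        return false;
    }

    /** Collects the sizes of all batches a client would actually see. */
    static List<Integer> readAll(List<Integer> batchSizes) {
        BatchSource source = new ListBatchSource(batchSizes);
        List<Integer> seen = new ArrayList<>();
        while (nextNonEmptyBatch(source)) {
            seen.add(source.currentRowCount());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Two empty batches sit between two non-empty ones.
        System.out.println(readAll(Arrays.asList(3, 0, 0, 2))); // prints [3, 2]
    }
}
```

With the behavior described in this issue, the same input would stop at the 
first 0 and the final batch of 2 rows would never reach the client.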



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
