[ https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee updated FLINK-11899:
---------------------------------
    Description: 
A Parquet ColumnarRow split reader is introduced to read Parquet data in batches.

When returning each row, the reader does not actually retrieve every field.
Instead, it uses the BaseRow abstraction to return a ColumnarRow, a row-like
view over the columnar batch.
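To illustrate the idea of a row-like view over a columnar batch, here is a minimal, self-contained sketch. All names in it (ColumnVector, IntColumnVector, ColumnBatch, ColumnarRowView, setRowId, demo) are hypothetical and are not Flink's actual ColumnarRow/BaseRow API. The sketch assumes the batch is already decoded into one array per column. A single view object is repositioned with a row index, and each getter reads one column slot only when it is called.

```java
// Hypothetical sketch: these classes are illustrative only and are NOT
// Flink's actual ColumnarRow / BaseRow / column-vector API.
public class ColumnarRowSketch {

    /** Marker type for one column of a batch. */
    interface ColumnVector {}

    /** An int column stored contiguously, one slot per row in the batch. */
    static final class IntColumnVector implements ColumnVector {
        private final int[] data;
        IntColumnVector(int[] data) { this.data = data; }
        int getInt(int rowId) { return data[rowId]; }
    }

    /** A string column stored contiguously, one slot per row in the batch. */
    static final class StringColumnVector implements ColumnVector {
        private final String[] data;
        StringColumnVector(String[] data) { this.data = data; }
        String getString(int rowId) { return data[rowId]; }
    }

    /** One batch of rows, held column by column. */
    static final class ColumnBatch {
        final ColumnVector[] columns;
        final int numRows;
        ColumnBatch(ColumnVector[] columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
        }
    }

    /**
     * Row-like view over a batch. It copies no field values; each getter reads
     * column {@code pos} at the current rowId only when it is called.
     */
    static final class ColumnarRowView {
        private final ColumnBatch batch;
        private int rowId;
        ColumnarRowView(ColumnBatch batch) { this.batch = batch; }
        void setRowId(int rowId) { this.rowId = rowId; }
        int getInt(int pos) {
            return ((IntColumnVector) batch.columns[pos]).getInt(rowId);
        }
        String getString(int pos) {
            return ((StringColumnVector) batch.columns[pos]).getString(rowId);
        }
    }

    /** Walks a small batch with a single reused view and renders each row. */
    static String demo() {
        ColumnBatch batch = new ColumnBatch(new ColumnVector[] {
            new IntColumnVector(new int[] {1, 2, 3}),
            new StringColumnVector(new String[] {"a", "b", "c"})
        }, 3);
        ColumnarRowView row = new ColumnarRowView(batch);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < batch.numRows; i++) {
            row.setRowId(i); // the same view object is repositioned, not reallocated
            if (i > 0) out.append(' ');
            out.append(row.getInt(0)).append('=').append(row.getString(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "1=a 2=b 3=c"
    }
}
```

Reusing one view object per batch, rather than allocating a row object per record, is the design choice the sketch is meant to highlight. No per-row copies of field values are made.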

This greatly benefits downstream filtering scenarios: the remaining fields of
rows that are filtered out never need to be accessed.
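The filtering benefit can be made concrete with a small counting sketch. The names here (CountingIntColumn, RowView, filterAndSum) are hypothetical and not part of Flink. The sketch also makes a simplifying assumption: each counted "read" stands in for the per-field work that a row-materializing reader would do, such as copying or converting a value. It does not model Parquet decoding itself. The filter reads only column 0. Column 1 is read only for rows that pass the filter.

```java
// Hypothetical sketch: counts cell reads when a filter runs over a lazy row view.
// A "read" stands in for per-field extraction work; Parquet decoding is not modeled.
public class LazyFilterSketch {

    /** An int column that counts how many of its cells have been read. */
    static final class CountingIntColumn {
        private final int[] values;
        int reads = 0;
        CountingIntColumn(int[] values) { this.values = values; }
        int get(int rowId) {
            reads++;
            return values[rowId];
        }
    }

    /** Row view over the columns; a field is read only when its getter is called. */
    static final class RowView {
        private final CountingIntColumn[] columns;
        private int rowId;
        RowView(CountingIntColumn[] columns) { this.columns = columns; }
        void setRowId(int rowId) { this.rowId = rowId; }
        int getInt(int pos) { return columns[pos].get(rowId); }
    }

    /**
     * Sums column 1 over rows whose column 0 exceeds the threshold.
     * Returns {reads of column 0, reads of column 1, sum}.
     */
    static int[] filterAndSum(int threshold) {
        CountingIntColumn key = new CountingIntColumn(new int[] {5, 1, 7, 2, 9});
        CountingIntColumn payload = new CountingIntColumn(new int[] {10, 20, 30, 40, 50});
        RowView row = new RowView(new CountingIntColumn[] {key, payload});
        int sum = 0;
        for (int i = 0; i < 5; i++) {
            row.setRowId(i);
            if (row.getInt(0) > threshold) { // the filter touches only column 0
                sum += row.getInt(1);        // column 1 is read only for surviving rows
            }
        }
        return new int[] {key.reads, payload.reads, sum};
    }

    public static void main(String[] args) {
        int[] r = filterAndSum(4);
        // prints "key reads=5, payload reads=3, sum=90"
        System.out.println("key reads=" + r[0] + ", payload reads=" + r[1] + ", sum=" + r[2]);
    }
}
```

With a threshold of 4, only three of the five rows survive. The payload column is therefore read 3 times. A reader that materialized every field of every row up front would have read all 10 cells.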

  was:
VectorizedParquetInputFormat is introduced to read parquet data in batches.

When returning each row of data, instead of actually retrieving each field, we 
use BaseRow's abstraction to return a Columnar Row-like view.

This will greatly improve the downstream filtered scenarios, so that there is 
no need to access redundant fields on the filtered data.


> Introduce parquet ColumnarRow split reader
> ------------------------------------------
>
>                 Key: FLINK-11899
>                 URL: https://issues.apache.org/jira/browse/FLINK-11899
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)