[ https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee updated FLINK-11899:
---------------------------------
    Description: 
VectorizedParquetInputFormat is introduced to read Parquet data in batches.

When returning each row of data, instead of eagerly retrieving every field, we
use BaseRow's abstraction to return a columnar, row-like view.

This greatly improves scenarios with downstream filters: for rows that are
filtered out, fields the filter does not need are never accessed.
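A minimal sketch of the idea follows. It assumes a batch with two columns, an int `id` and a String `name`. The classes `IntColumnVector`, `StringColumnVector`, `ColumnBatch` and `ColumnarRowView` are hypothetical simplifications, not Flink's BaseRow API and not the actual VectorizedParquetInputFormat. Parquet decoding is omitted entirely, and the batch is filled from plain arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only; these are NOT
// Flink's actual BaseRow / columnar row / column-vector classes.

/** One int column of a batch, stored contiguously. */
final class IntColumnVector {
    final int[] values;
    int reads = 0; // counts field accesses so the lazy reading is observable
    IntColumnVector(int[] values) { this.values = values; }
    int getInt(int rowId) { reads++; return values[rowId]; }
}

/** One String column of a batch, stored contiguously. */
final class StringColumnVector {
    final String[] values;
    int reads = 0;
    StringColumnVector(String[] values) { this.values = values; }
    String getString(int rowId) { reads++; return values[rowId]; }
}

/** A batch of rows held column by column, as a vectorized reader might fill it. */
final class ColumnBatch {
    final IntColumnVector id;
    final StringColumnVector name;
    final int numRows;
    ColumnBatch(int[] ids, String[] names) {
        this.id = new IntColumnVector(ids);
        this.name = new StringColumnVector(names);
        this.numRows = ids.length;
    }
}

/** Row-like view: just a (batch, rowId) pair; a field is read only when asked for. */
final class ColumnarRowView {
    private final ColumnBatch batch;
    private int rowId;
    ColumnarRowView(ColumnBatch batch) { this.batch = batch; }
    void setRowId(int rowId) { this.rowId = rowId; }
    int getId() { return batch.id.getInt(rowId); }
    String getName() { return batch.name.getString(rowId); }
}

public class ColumnarViewDemo {
    /** Returns the names of rows whose id exceeds threshold, reusing one view object. */
    static List<String> namesWithIdAbove(ColumnBatch batch, int threshold) {
        ColumnarRowView row = new ColumnarRowView(batch);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < batch.numRows; i++) {
            row.setRowId(i);                 // move the view; nothing is copied
            if (row.getId() > threshold) {   // the filter touches only the id column
                out.add(row.getName());      // name is read only for surviving rows
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(
                new int[] {1, 5, 2, 8},
                new String[] {"a", "b", "c", "d"});
        System.out.println(namesWithIdAbove(batch, 4)); // [b, d]
        System.out.println(batch.id.reads + " id reads, " + batch.name.reads + " name reads");
        // 4 id reads, 2 name reads
    }
}
```

Because the view only records which row it points at, the filtered-out rows never have their `name` field read: for the 4-row batch above, the `id` column is read 4 times and the `name` column only twice.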

  was:
Vectorized Column Row Input Parquet Format is introduced to read parquet data 
in batches.

When returning each row of data, instead of actually retrieving each field, we 
use BaseRow's abstraction to return a Columnar Row-like view.

This will greatly improve the downstream filtered scenarios, so that there is 
no need to access redundant fields on the filtered data.


> Introduce vectorized parquet InputFormat for blink runtime
> ----------------------------------------------------------
>
>                 Key: FLINK-11899
>                 URL: https://issues.apache.org/jira/browse/FLINK-11899
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Operators
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>
> VectorizedParquetInputFormat is introduced to read Parquet data in batches.
> When returning each row of data, instead of eagerly retrieving every field,
> we use BaseRow's abstraction to return a columnar, row-like view.
> This greatly improves scenarios with downstream filters: for rows that are
> filtered out, fields the filter does not need are never accessed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)