[
https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jingsong Lee updated FLINK-11899:
---------------------------------
Description:
VectorizedParquetInputFormat is introduced to read Parquet data in batches.
When returning each row of data, instead of actually retrieving every field, we
use the BaseRow abstraction to return a columnar, row-like view.
This greatly benefits scenarios with downstream filters, because there is no
need to access the unneeded fields of filtered-out rows.
was:
Vectorized Column Row Input Parquet Format is introduced to read parquet data
in batches.
When returning each row of data, instead of actually retrieving each field, we
use BaseRow's abstraction to return a Columnar Row-like view.
This will greatly improve the downstream filtered scenarios, so that there is
no need to access redundant fields on the filtered data.
> Introduce vectorized parquet InputFormat for blink runtime
> ----------------------------------------------------------
>
> Key: FLINK-11899
> URL: https://issues.apache.org/jira/browse/FLINK-11899
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Operators
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Priority: Major
>
> VectorizedParquetInputFormat is introduced to read Parquet data in batches.
> When returning each row of data, instead of actually retrieving every field,
> we use the BaseRow abstraction to return a columnar, row-like view.
> This greatly benefits scenarios with downstream filters, because there is no
> need to access the unneeded fields of filtered-out rows.
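
To illustrate the idea of a columnar, row-like view, here is a minimal Java
sketch. It is not the actual VectorizedParquetInputFormat or BaseRow code: the
names ColumnarRowSketch, LongColumn, ColumnarBatch, ColumnarRowView and
sumWhereFirstColumnEven are hypothetical, and the batch is built in memory
rather than read from Parquet. The view holds only a row index into the batch,
so a field is touched only when a getter is called, and rows rejected by the
filter never trigger a read of their other fields.

```java
/** Hypothetical sketch of a columnar row view; not Flink's actual classes. */
final class ColumnarRowSketch {

    /** A column of long values that counts how often a value is read. */
    static final class LongColumn {
        private final long[] values;
        int reads = 0; // instrumentation only, to make lazy access visible

        LongColumn(long... values) {
            this.values = values;
        }

        long get(int rowId) {
            reads++;
            return values[rowId];
        }
    }

    /** A batch of rows stored column by column. */
    static final class ColumnarBatch {
        final LongColumn[] columns;
        final int numRows;

        ColumnarBatch(int numRows, LongColumn... columns) {
            this.numRows = numRows;
            this.columns = columns;
        }
    }

    /** Reusable row-like view: it points at one row and copies no fields. */
    static final class ColumnarRowView {
        private final ColumnarBatch batch;
        private int rowId;

        ColumnarRowView(ColumnarBatch batch) {
            this.batch = batch;
        }

        void setRowId(int rowId) {
            this.rowId = rowId;
        }

        // The field is fetched from its column only when requested.
        long getLong(int ordinal) {
            return batch.columns[ordinal].get(rowId);
        }
    }

    /** Filters on column 0 and sums column 1 only for rows that pass. */
    static long sumWhereFirstColumnEven(ColumnarBatch batch) {
        ColumnarRowView row = new ColumnarRowView(batch);
        long sum = 0;
        for (int i = 0; i < batch.numRows; i++) {
            row.setRowId(i);
            if (row.getLong(0) % 2 == 0) {
                sum += row.getLong(1); // never reached for filtered-out rows
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        LongColumn id = new LongColumn(1L, 2L, 3L, 4L);
        LongColumn amount = new LongColumn(10L, 20L, 30L, 40L);
        ColumnarBatch batch = new ColumnarBatch(4, id, amount);

        long sum = sumWhereFirstColumnEven(batch);
        // Prints "60 4 2": the filter column is read for all four rows,
        // the amount column only for the two rows that pass the filter.
        System.out.println(sum + " " + id.reads + " " + amount.reads);
    }
}
```

A row-at-a-time reader that copied every field up front would have read the
amount column four times in this example; the view reads it twice.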