Sun Shun created FLINK-29527:
--------------------------------
Summary: Make unknownFieldsIndices work for single ParquetReader
Key: FLINK-29527
URL: https://issues.apache.org/jira/browse/FLINK-29527
Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.16.0
Reporter: Sun Shun
Since the improvement [FLINK-23715], Flink uses a collection named
`unknownFieldsIndices` to track nonexistent fields. This collection is kept
inside `ParquetVectorizedInputFormat` and applied to all Parquet files under
the given path.
However, some fields may be missing from only some of the historical Parquet
files while existing in the latest ones. Because `unknownFieldsIndices` is
shared, Flink always skips these fields, even in the later Parquet files that
do contain them.
As a result, the values of these fields become empty for every file as soon
as the fields are missing from some historical Parquet files.
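A minimal Java sketch of the failure mode, assuming (this is not taken from Flink's source) that a single collection lives on the input format and is filled as each file is opened but never cleared. `UnknownFieldsSketch`, `unknownFieldsIndices(...)` and `readRow(...)` are hypothetical stand-ins, not Flink APIs, and Parquet schemas are modelled as plain sets of field names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the reported behaviour; not Flink's actual classes.
public class UnknownFieldsSketch {

    // Returns the indices of requested fields that are absent from one file's schema.
    static Set<Integer> unknownFieldsIndices(List<String> requested, Set<String> fileFields) {
        Set<Integer> unknown = new HashSet<>();
        for (int i = 0; i < requested.size(); i++) {
            if (!fileFields.contains(requested.get(i))) {
                unknown.add(i);
            }
        }
        return unknown;
    }

    // Produces one row: fields marked unknown become null, others get a placeholder value.
    static List<String> readRow(List<String> requested, Set<Integer> unknown) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            row.add(unknown.contains(i) ? null : "value_of_" + requested.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("id", "name", "email");
        Set<String> historicalFile = Set.of("id", "name");       // lacks "email"
        Set<String> latestFile = Set.of("id", "name", "email");  // has "email"

        // Assumed shared pattern: one set reused across files, so the index of "email"
        // recorded for the historical file is still set when reading the latest file.
        Set<Integer> sharedIndices = new HashSet<>();
        sharedIndices.addAll(unknownFieldsIndices(requested, historicalFile));
        sharedIndices.addAll(unknownFieldsIndices(requested, latestFile));
        System.out.println("shared, latest file:   " + readRow(requested, sharedIndices));

        // Per-reader alternative: compute the indices for each file independently.
        Set<Integer> latestOnly = unknownFieldsIndices(requested, latestFile);
        System.out.println("per-file, latest file: " + readRow(requested, latestOnly));
    }
}
```

With the shared set, "email" comes back null for the latest file even though that file contains it; computing the set per file keeps the value. Scoping the indices to each reader is what the summary "work for single ParquetReader" suggests, though the exact shape of the fix is not described in this issue.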