yuchuanchen commented on code in PR #21563:
URL: https://github.com/apache/flink/pull/21563#discussion_r1062158485
##########
flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java:
##########
@@ -123,7 +151,7 @@ public ParquetReader createReader(final Configuration config, final SplitT split
             FilterCompat.Filter filter = getFilter(hadoopConfig.conf());
             List<BlockMetaData> blocks =
                     filterRowGroups(filter, footer.getBlocks(), fileSchema);
-            MessageType requestedSchema = clipParquetSchema(fileSchema);
+            MessageType requestedSchema = clipParquetSchema(fileSchema, builtProjectedRowType);

Review Comment:
   ParquetFileReader does read all child fields of `s` in readNextRowGroup(), but the Parquet vectorized reader only reads `p1, s_f4, s_f2.q1, s_f3` from ParquetFileReader. However, the requestedSchema built here should contain only the necessary columns: `f1` should be excluded, so that ParquetFileReader reads only the pages containing `s_f4, s_f2.q1, s_f3`. I will fix this later.
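   For context, here is a minimal sketch of the kind of pruning this implies. It assumes a hypothetical helper `clipToProjection` (not the actual `clipParquetSchema` in this PR) that keeps only the top-level Parquet columns named in the projected row type; a complete fix would also recurse into nested group types such as `s_f2` so that only `q1` is retained.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.flink.table.types.logical.RowType;

   import org.apache.parquet.schema.MessageType;
   import org.apache.parquet.schema.Type;
   import org.apache.parquet.schema.Types;

   public final class ProjectedSchemaSketch {

       /**
        * Keeps only the top-level Parquet columns that appear in the projected row type,
        * so ParquetFileReader does not fetch pages for unused columns such as `f1`.
        * Sketch only: nested fields (e.g. `s_f2.q1`) would need recursive clipping.
        */
       static MessageType clipToProjection(MessageType fileSchema, RowType projectedRowType) {
           List<Type> kept = new ArrayList<>();
           for (String fieldName : projectedRowType.getFieldNames()) {
               if (fileSchema.containsField(fieldName)) {
                   kept.add(fileSchema.getType(fieldName));
               }
           }
           return Types.buildMessage()
                   .addFields(kept.toArray(new Type[0]))
                   .named(fileSchema.getName());
       }
   }
   ```

   The point of passing such a clipped schema as the requestedSchema is that readNextRowGroup() then only loads the pages of the retained columns, instead of every child field of `s`.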