yuchuanchen commented on code in PR #21563:
URL: https://github.com/apache/flink/pull/21563#discussion_r1062158485
##########
flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java:
##########
@@ -123,7 +151,7 @@ public ParquetReader createReader(final Configuration config, final SplitT split
             FilterCompat.Filter filter = getFilter(hadoopConfig.conf());
             List<BlockMetaData> blocks =
                     filterRowGroups(filter, footer.getBlocks(), fileSchema);
-            MessageType requestedSchema = clipParquetSchema(fileSchema);
+            MessageType requestedSchema = clipParquetSchema(fileSchema, builtProjectedRowType);

Review Comment:
   ParquetFileReader does read all child fields of `s` in readNextRowGroup(), but the Parquet vectorized reader only reads `p1, s_f4, s_f2.q1, s_f3` from ParquetFileReader. However, the requestedSchema built here should contain only the necessary columns: `f1` should be excluded, so that ParquetFileReader reads only the pages containing `s_f4, s_f2.q1, s_f3`. I will fix this later.
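   For context, here is a minimal sketch of the kind of pruning this implies. It assumes a hypothetical helper `clipToProjection` (not the actual `clipParquetSchema` in this PR) that keeps only the top-level Parquet columns named in the projected row type; a complete fix would also recurse into nested group types such as `s_f2` so that only `q1` is retained.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   import org.apache.flink.table.types.logical.RowType;

   import org.apache.parquet.schema.MessageType;
   import org.apache.parquet.schema.Type;
   import org.apache.parquet.schema.Types;

   public final class ProjectedSchemaSketch {

       /**
        * Keeps only the top-level Parquet columns that appear in the projected row type,
        * so ParquetFileReader does not fetch pages for unused columns such as `f1`.
        * Sketch only: nested fields (e.g. `s_f2.q1`) would need recursive clipping.
        */
       static MessageType clipToProjection(MessageType fileSchema, RowType projectedRowType) {
           List<Type> kept = new ArrayList<>();
           for (String fieldName : projectedRowType.getFieldNames()) {
               if (fileSchema.containsField(fieldName)) {
                   kept.add(fileSchema.getType(fieldName));
               }
           }
           return Types.buildMessage()
                   .addFields(kept.toArray(new Type[0]))
                   .named(fileSchema.getName());
       }
   }
   ```

   The point of passing such a clipped schema as the requestedSchema is that readNextRowGroup() then only loads the pages of the retained columns, instead of every child field of `s`.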