[ https://issues.apache.org/jira/browse/HUDI-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lin Liu reassigned HUDI-6786:
-----------------------------

    Assignee: Lin Liu  (was: Jonathan Vexler)

> Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark MOR Snapshot Query
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-6786
>                 URL: https://issues.apache.org/jira/browse/HUDI-6786
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Ethan Guo
>            Assignee: Lin Liu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> Goal: When `NewHoodieParquetFileFormat` is enabled with
> `hoodie.datasource.read.use.new.parquet.file.format=true` on Spark, the MOR
> Snapshot query should use HoodieFileGroupReader. All relevant tests on basic
> MOR snapshot query should pass (except for the caveats in the current
> HoodieFileGroupReader, see other open tickets around HoodieFileGroupReader in
> this EPIC).
> The query logic is implemented in
> `NewHoodieParquetFileFormat#buildReaderWithPartitionValues`; see the
> following code for MOR snapshot query:
> {code:java}
> else {
>   if (logFiles.nonEmpty) {
>     val baseFile = createPartitionedFile(InternalRow.empty,
>       hoodieBaseFile.getHadoopPath, 0, hoodieBaseFile.getFileLen)
>     buildMergeOnReadIterator(preMergeBaseFileReader(baseFile), logFiles,
>       filePath.getParent, requiredSchemaWithMandatory,
>       requiredSchemaWithMandatory, outputSchema, partitionSchema,
>       partitionValues, broadcastedHadoopConf.value.value)
>   } else {
>     throw new IllegalStateException("should not be here since file slice should not have been broadcasted since it has no log or data files")
>     //baseFileReader(baseFile)
>   }
> {code}
> `buildMergeOnReadIterator` should be replaced by `HoodieFileGroupReader`,
> with a new config `hoodie.read.use.new.file.group.reader`, by passing in the
> correct base and log file list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)