Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

via GitHub Wed, 04 Oct 2023 03:01:02 -0700


codope commented on code in PR #9819:
URL: https://github.com/apache/hudi/pull/9819#discussion_r1345533536



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/NewHoodieParquetFileFormat.scala:
##########
@@ -155,9 +160,22 @@ class NewHoodieParquetFileFormat(tableState: Broadcast[HoodieTableState],
                   }
                 } else {
                   if (logFiles.nonEmpty) {
-                    val baseFile = createPartitionedFile(InternalRow.empty, hoodieBaseFile.getHadoopPath, 0, hoodieBaseFile.getFileLen)
-                    buildMergeOnReadIterator(preMergeBaseFileReader(baseFile), logFiles, filePath.getParent, requiredSchemaWithMandatory,
-                      requiredSchemaWithMandatory, outputSchema, partitionSchema, partitionValues, broadcastedHadoopConf.value.value)
+                    val hoodieReaderContext = new SparkFileFormatInternalRowReaderContext(
+                      sparkSession, this, broadcastedHadoopConf.value.value)
+                    val logFilesAsJava = logFiles.toStream.map(lf => lf.toString).toList.asJava
+                    val reader = new HoodieFileGroupReader[InternalRow](
+                      hoodieReaderContext.asInstanceOf[HoodieReaderContext[InternalRow]],
+                      broadcastedHadoopConf.value.value,
+                      tableState.value.tablePath,
+                      FSUtils.getCommitTime(logFilesAsJava.get(0)),

Review Comment:
   This may need to change: the log file name now carries the deltacommit
   time instead of the base instant time.
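
A minimal sketch of the concern, not Hudi's actual `FSUtils` code. It assumes a log file name of the form `.<fileId>_<instantTime>.log.<version>_<writeToken>`; the class `LogFileNameSketch`, the method `instantTimeOf`, and the sample file name are hypothetical. Whatever instant is embedded in the name is what gets returned, so a call site that treats the result as the base instant time depends on that naming convention staying the same.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hudi.
final class LogFileNameSketch {
    // Assumed layout: .<fileId>_<instantTime>.log.<version>[_<writeToken>]
    private static final Pattern LOG_FILE =
        Pattern.compile("^\\.(.+)_(.*)\\.log\\.(\\d+)(?:_(.+))?$");

    // Returns the instant embedded in the file name, or empty if the name does not match.
    static Optional<String> instantTimeOf(String logFileName) {
        Matcher m = LOG_FILE.matcher(logFileName);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // The sample name is made up; the printed value is whichever instant the writer put there.
        System.out.println(instantTimeOf(".f1-0_20231004030102.log.1_0-1-0").orElse("<no match>"));
    }
}
```

If the embedded instant becomes the deltacommit time, the base instant would have to come from somewhere else, for example the base file or the file slice.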



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -63,7 +64,7 @@
  * @param <T> The type of engine-specific record representation, e.g.,{@code InternalRow}
  *            in Spark and {@code RowData} in Flink.
  */
-public final class HoodieFileGroupReader<T> implements Closeable {
+public final class HoodieFileGroupReader<T> implements Closeable, Iterator {

Review Comment:
   Should the reader implement `Iterator`? We could simply keep the list of
   merged records here and get an iterator from that list at the call site.
   If you do plan to make it an iterator, implement `ClosableIterator` and
   rename the class to `HoodieFileGroupIterator`.
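
A minimal sketch of the second option, for illustration only. `SketchClosableIterator` and `SketchFileGroupIterator` are hypothetical stand-ins and are not copied from Hudi; the sketch assumes the merged records are already available as a list.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a closable iterator contract; not Hudi's interface.
interface SketchClosableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed so callers need not handle checked exceptions
}

// Hypothetical iterator over the already-merged records of one file group.
final class SketchFileGroupIterator<T> implements SketchClosableIterator<T> {
    private final Iterator<T> delegate;
    private boolean closed = false;

    SketchFileGroupIterator(List<T> mergedRecords) {
        this.delegate = mergedRecords.iterator();
    }

    @Override
    public boolean hasNext() {
        // Once closed, the iterator reports no further records.
        return !closed && delegate.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return delegate.next();
    }

    @Override
    public void close() {
        // A real reader would release file handles and buffers here.
        closed = true;
    }
}

final class FileGroupIteratorDemo {
    public static void main(String[] args) {
        // try-with-resources closes the iterator even if iteration fails midway.
        try (SketchFileGroupIterator<String> it =
                 new SketchFileGroupIterator<>(Arrays.asList("r1", "r2"))) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```

Typing the iterator and making it closable lets the call site use try-with-resources, which a raw `Iterator` alongside `Closeable` does not make obvious.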



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
