jonvex commented on code in PR #10137:
URL: https://github.com/apache/hudi/pull/10137#discussion_r1408151212


##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -62,7 +63,7 @@ class SparkFileFormatInternalRowReaderContext(baseFileReader: Option[Partitioned
                                      requiredSchema: Schema,
                                      conf: Configuration): ClosableIterator[InternalRow] = {
     val fileInfo = sparkAdapter.getSparkPartitionedFileUtils
-      .createPartitionedFile(partitionValues, filePath, start, length)
+      .createPartitionedFile(InternalRow.empty, filePath, start, length)

Review Comment:
   OK, so part of this PR is to clean up the philosophy of the fg reader.
   We want to give the fg reader a requested schema, and the output should be an iterator whose records are in exactly that schema (CDC may be an exception; I haven't given it much thought yet).
   The Spark parquet file reader appends the partition values to the end of each record. Putting the logic for handling that inside the fg reader adds unnecessary complexity that is probably only relevant for Spark.
   Therefore I think it makes sense for HoodieFileGroupReaderBasedParquetFileFormat to be responsible for appending the partition columns at the end. Besides being better organized, in my opinion, this is also more performant, because it avoids several calls that append the partition column and then project it away.
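   To illustrate the idea (a minimal plain-Scala sketch, not Hudi or Spark code — `appendPartitionValues` and the `Row` alias are hypothetical names): the reader yields rows in exactly the requested schema, and the file-format layer wraps that iterator to append the partition values exactly once, so there is no append-then-project round trip inside the reader.

```scala
// Hypothetical sketch of the proposed split of responsibilities.
// The fg reader's output contains only the requested schema's columns;
// the file-format layer appends partition values afterwards, one time.
object PartitionAppendSketch {
  type Row = Seq[Any] // stand-in for Spark's InternalRow

  // Wrap the reader's iterator; each emitted row gets the partition
  // values appended once, at the end of the pipeline.
  def appendPartitionValues(rows: Iterator[Row], partitionValues: Row): Iterator[Row] =
    rows.map(_ ++ partitionValues)

  def main(args: Array[String]): Unit = {
    // Rows as the fg reader would emit them: exactly the requested schema.
    val readerOutput: Iterator[Row] = Iterator(Seq("key1", 1), Seq("key2", 2))
    val withPartitions = appendPartitionValues(readerOutput, Seq("2023-11-28"))
    withPartitions.foreach(println)
  }
}
```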



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]