Re: [PR] [HUDI-9205] Introduce a representative file containing the estimated total size of file slice [hudi]

via GitHub Sat, 05 Apr 2025 10:21:22 -0700


TheR1sing3un commented on code in PR #13070:
URL: https://github.com/apache/hudi/pull/13070#discussion_r2025922724



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -178,8 +183,8 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tablePath: String,
                 internalSchemaOpt,
                 metaClient,
                 props,
-                file.start,
-                file.length,
+                0,

Review Comment:
   > can you elaborate this change?
   
   <img width="927" alt="image" src="https://github.com/user-attachments/assets/2315313f-cb52-42f0-b02e-4f0eb5e3c325" />
   These two arguments tell the file group reader the start offset and the 
length of the base file. They used to be taken directly from the 
`PartitionedFile`, because when a file slice had a base file, the actual size 
of that base file was used as the length of the `PartitionedFile`. Now the 
`PartitionedFile` is a representative file for the whole file slice, and its 
length is no longer the length of the actual base file, so we need to get the 
actual length from the base file itself.
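
   A minimal sketch of the distinction, not the PR's actual code. 
`RepresentativeFile`, `BaseFile`, `FileSlice`, and `BaseFileRange` are 
hypothetical stand-ins for the Spark and Hudi types, and summing the base file 
and log file sizes is only an assumed way of estimating the slice size:

```scala
// Hypothetical stand-ins; these are NOT the real Spark or Hudi classes.
final case class RepresentativeFile(path: String, start: Long, length: Long)
final case class BaseFile(path: String, fileSize: Long)
final case class FileSlice(baseFile: Option[BaseFile], logFileSizes: Seq[Long])

object BaseFileRange {
  // Assumption: the representative file's length is the estimated total size
  // of the slice, taken here as base file size plus all log file sizes.
  def representativeOf(slice: FileSlice): RepresentativeFile = {
    val baseSize = slice.baseFile.map(_.fileSize).getOrElse(0L)
    val path = slice.baseFile.map(_.path).getOrElse("<log-only slice>")
    RepresentativeFile(path, 0L, baseSize + slice.logFileSizes.sum)
  }

  // The (start, length) pair for the reader must describe the base file alone,
  // so it is derived from the base file rather than the representative file.
  def readRange(slice: FileSlice): (Long, Long) =
    (0L, slice.baseFile.map(_.fileSize).getOrElse(0L))
}
```

   With a 100-byte base file and two log files of 30 and 20 bytes, the 
representative length would be 150 while the range passed to the reader stays 
`(0, 100)`, which is why `file.length` can no longer be forwarded as-is.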



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
