TheR1sing3un commented on code in PR #13070:
URL: https://github.com/apache/hudi/pull/13070#discussion_r2025922724
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -178,8 +183,8 @@ class
HoodieFileGroupReaderBasedParquetFileFormat(tablePath: String,
internalSchemaOpt,
metaClient,
props,
- file.start,
- file.length,
+ 0,
Review Comment:
> can you elaborate this change?
These two arguments tell the file group reader the start offset and length
of the base file. They used to be taken directly from the `PartitionedFile`,
because when the file slice had a base file, the actual size of that base file
was used as the length of the `PartitionedFile`. Now the `PartitionedFile` is a
representative file for the whole file slice, and its length is no longer the
length of the actual base file, so we need to take the actual length from the
base file itself.
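A minimal sketch of the difference, assuming simplified stand-in types. `BaseFile`, `FileSliceInfo`, `RepresentativeFile` and `ReaderRange` are hypothetical and are not the real Hudi or Spark classes. The sketch also assumes the reader is always meant to scan the whole base file (start `0`, length = base file size), and that a slice with no base file gets `(0, 0)`. That last part is an illustrative choice, not confirmed Hudi behaviour.

```scala
// Hypothetical stand-ins for illustration only; NOT Hudi/Spark APIs.
final case class BaseFile(path: String, fileLen: Long)
final case class FileSliceInfo(baseFile: Option[BaseFile])
// Mimics only the two PartitionedFile fields that matter here.
final case class RepresentativeFile(start: Long, length: Long)

object ReaderRange {
  // Old approach: trust the PartitionedFile's start/length. This was only
  // correct while its length equalled the base file's actual size.
  def oldRange(file: RepresentativeFile): (Long, Long) =
    (file.start, file.length)

  // New approach (sketch): read the whole base file, so start at 0 and take
  // the length from the base file itself. Assumption: no base file => (0, 0).
  def newRange(slice: FileSliceInfo): (Long, Long) =
    slice.baseFile.map(bf => (0L, bf.fileLen)).getOrElse((0L, 0L))
}
```

For example, if a representative file hypothetically reported a length of 150 for a slice whose base file is only 100 bytes, `oldRange` would pass 150 to the reader, while `newRange` passes the base file's real size of 100.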