rahil-c commented on code in PR #13591:
URL: https://github.com/apache/hudi/pull/13591#discussion_r2226630154
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -217,6 +218,7 @@ class
HoodieFileGroupReaderBasedParquetFileFormat(tablePath: String,
.withStart(file.start)
.withLength(baseFileLength)
.withShouldUseRecordPosition(shouldUseRecordPosition)
+
.withEnableOptimizedLogBlockScan(props.getBoolean(HoodieReaderConfig.ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN.key(),
HoodieReaderConfig.ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN.defaultValue().toBoolean))
Review Comment:
@danny0405 does the field `TypedProperties props` always guaranteed to have
all the configs values from the `HoodieWriteConfig` or `HoodieMetadataConfig`?
If so then I think this fallback logic can work.
However if it does not, then I do not think we should be doing this in the
FileGroupReader builder, as this can cause the value of
`enableOptimizedLogBlockScan` to be incorrectly overwritten depending on the
invoker.
The field `enableOptimizedLogBlockScan` is a boolean that seems to be set in
the builder to `false`
`private boolean enableOptimizedLogBlockScan = false`, so I do not think we
can check for it being `null`.
If we do the check to see if its `false` this could be intentionally have
been set by the user in places where we have access to the
`HoodieMetadataConfig` or the `HoodieReaderConfig` and pass it in via
`.withEnableOptimizedLogBlockScan(config.enableOptimizedLogBlocksScan())`.
So doing something like this, could cause the default value for the config
to be used in the event the `props` was not correctly populated and did not
have the key.
`withEnableOptimizedLogBlockScan =
props.getBoolean(HoodieReaderConfig.ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN.key(),
HoodieReaderConfig.ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN.defaultValue().toBoolean)`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]