nsivabalan commented on code in PR #17601:
URL: https://github.com/apache/hudi/pull/17601#discussion_r2820238568
##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -182,6 +186,8 @@ protected AbstractHoodieLogRecordReader(HoodieStorage storage, String basePath,
     this.forceFullScan = forceFullScan;
     this.internalSchema = internalSchema == null ? InternalSchema.getEmptyInternalSchema() : internalSchema;
     this.enableOptimizedLogBlocksScan = enableOptimizedLogBlocksScan;
+    this.enableLogicalTimestampFieldRepair = storage.getConf().getBoolean(HoodieFileReader.ENABLE_LOGICAL_TIMESTAMP_REPAIR,
Review Comment:
hey @yihua :
In 1.x, we have the FileGroupReader abstraction and hence a reader context
through which we can pass config values from the driver to the executors. But
in 0.x, the log record reader has no such medium for passing ad-hoc configs,
for example the value of `HoodieFileReader.ENABLE_LOGICAL_TIMESTAMP_REPAIR`.
We do not even have an argument like a `Map<String, String>` or a Hadoop conf
that we pass to the log record reader:
https://github.com/apache/hudi/blob/164b35a79e6decb868780964b2bdac1fc35f23b7/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java#L479
So it is cumbersome to share information from the driver to the executors;
hence @linliu-code resorted to computing this value within Spark tasks, which
makes the computation repetitive.
Do you think it is worth adding a Hadoop conf or some kind of params
(`Map<String, String>`) as an argument to HoodieMergedLogRecordScanner and
AbstractHoodieLogRecordReader?
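
To illustrate the suggestion, here is a minimal, hypothetical sketch (not actual Hudi code; the class, builder method, and config key below are all stand-ins) of how a log record scanner builder could accept an extra `Map<String, String>`, so a flag computed once on the driver reaches the scanner instead of being recomputed inside each Spark task:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LogScannerConfigSketch {

  // Stand-in for AbstractHoodieLogRecordReader / HoodieMergedLogRecordScanner;
  // the real classes would read these props instead of re-deriving them per task.
  static class HypotheticalLogRecordScanner {
    private final Map<String, String> extraProps;

    private HypotheticalLogRecordScanner(Map<String, String> extraProps) {
      this.extraProps = extraProps;
    }

    boolean getBoolean(String key, boolean defaultValue) {
      String v = extraProps.get(key);
      return v == null ? defaultValue : Boolean.parseBoolean(v);
    }

    static Builder newBuilder() {
      return new Builder();
    }

    static class Builder {
      private Map<String, String> extraProps = Collections.emptyMap();

      // The proposed addition: an ad-hoc key/value channel from driver to executor.
      Builder withExtraProps(Map<String, String> props) {
        this.extraProps = props;
        return this;
      }

      HypotheticalLogRecordScanner build() {
        return new HypotheticalLogRecordScanner(extraProps);
      }
    }
  }

  public static void main(String[] args) {
    // Driver side: compute the flag once and ship it with the scanner config.
    // The key below is a placeholder, not the real
    // HoodieFileReader.ENABLE_LOGICAL_TIMESTAMP_REPAIR constant.
    Map<String, String> props = new HashMap<>();
    props.put("example.enable.logical.timestamp.repair", "true");

    // Executor side: the scanner reads the precomputed value, no recomputation.
    HypotheticalLogRecordScanner scanner =
        HypotheticalLogRecordScanner.newBuilder().withExtraProps(props).build();
    System.out.println(
        scanner.getBoolean("example.enable.logical.timestamp.repair", false));
  }
}
```

Since the builder already exists on HoodieMergedLogRecordScanner, threading a props map (or a Hadoop conf) through it would avoid any constructor-signature churn for existing callers.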
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]