cshuo commented on code in PR #13366:
URL: https://github.com/apache/hudi/pull/13366#discussion_r2113078643
##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -181,17 +176,15 @@ private FileGroupRecordBuffer<T>
getRecordBuffer(HoodieReaderContext<T> readerCo
RecordMergeMode
recordMergeMode,
TypedProperties props,
Option<HoodieBaseFile>
baseFileOption,
- boolean hasNoLogFiles,
+ boolean hasLogFiles,
boolean isSkipMerge,
boolean
shouldUseRecordPosition,
HoodieReadStats readStats,
boolean emitDelete) {
- if (hasNoLogFiles) {
- return null;
- } else if (isSkipMerge) {
+ if (isSkipMerge) {
Review Comment:
Currently, besides the engine-row iterator, FG reader can also get
* HoodieRecord Iterator: mapping from engine-row iterator, additional
conversion: `T -> BufferedRecord<T> -> HoodieRecord<T>`
* Record key iterator: mapping from engine-row iterator, `T -> Record Key`
To avoid additional conversion `T -> BufferedRecord<T>` and `T -> Record
Key`, we can expose iterator of `BufferedRecord` in `FileGroupRecordBuffer`
instead, which contains necessary messages for constructing all these three
kind iterators, that's the general idea for the improvement.
What you mentioned is a bad case : no log files + engine-row iterator, the
current implementation incurs additional overhead of creating the buffered
record.
To solve the problem, my preliminary idea is using a singleton
`BufferedRecord` object as a reused wrapper for the engine-row iterator, which
can avoid additional costs of creating `BufferedRecord` per record.
WDYT? cc @the-other-tim-brown @danny0405
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]