Re: [PR] [HUDI-9469] Improve performance for getting hoodie record / record ke… [hudi]

via GitHub Wed, 28 May 2025 19:57:47 -0700


cshuo commented on code in PR #13366:
URL: https://github.com/apache/hudi/pull/13366#discussion_r2113078643



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##########
@@ -181,17 +176,15 @@ private FileGroupRecordBuffer<T> 
getRecordBuffer(HoodieReaderContext<T> readerCo
                                                    RecordMergeMode 
recordMergeMode,
                                                    TypedProperties props,
                                                    Option<HoodieBaseFile> 
baseFileOption,
-                                                   boolean hasNoLogFiles,
+                                                   boolean hasLogFiles,
                                                    boolean isSkipMerge,
                                                    boolean 
shouldUseRecordPosition,
                                                    HoodieReadStats readStats,
                                                    boolean emitDelete) {
-    if (hasNoLogFiles) {
-      return null;
-    } else if (isSkipMerge) {
+    if (isSkipMerge) {

Review Comment:
   Currently, besides the engine-row iterator, FG reader can also get
   * HoodieRecord Iterator: mapping from engine-row iterator, additional 
conversion: `T -> BufferedRecord<T> -> HoodieRecord<T>`
   * Record key iterator: mapping from engine-row iterator, `T -> Record Key`
   
   To avoid additional conversion `T -> BufferedRecord<T>` and `T -> Record 
Key`, we can expose iterator of `BufferedRecord` in `FileGroupRecordBuffer` 
instead, which contains necessary messages for constructing all these three 
kind iterators, that's the general idea for the improvement.
   
   What you mentioned is a bad case : no log files + engine-row iterator, the 
current implementation incurs additional overhead of creating the buffered 
record. 
   To solve the problem, my preliminary idea is using a singleton 
`BufferedRecord` object as a reused wrapper for the engine-row iterator, which 
can avoid additional costs of creating `BufferedRecord` per record. 
   
   WDYT? cc @the-other-tim-brown @danny0405 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [HUDI-9469] Improve performance for getting hoodie record / record ke… [hudi]

Reply via email to