[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

GitBox Sun, 10 Oct 2021 19:11:09 -0700


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r725746415




##########
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##########
@@ -102,97 +102,107 @@ private HoodieMergedLogRecordScanner getMergedLogRecordScanner() throws IOExcept
 
   @Override
   public boolean next(NullWritable aVoid, ArrayWritable arrayWritable) throws IOException {
+    // deal with DeltaOnlySplits
+    if (logReader.isPresent()) {
+      return logReader.get().next(aVoid, arrayWritable);
+    }
     // Call the underlying parquetReader.next - which may replace the passed in ArrayWritable
     // with a new block of values
-    while (this.parquetReader.next(aVoid, arrayWritable)) {
-      if (!deltaRecordMap.isEmpty()) {
-        String key = arrayWritable.get()[recordKeyIndex].toString();
-        if (deltaRecordMap.containsKey(key)) {
-          // mark the key as handled
-          this.deltaRecordKeys.remove(key);
-          // TODO(NA): Invoke preCombine here by converting arrayWritable to Avro. This is required since the
-          // deltaRecord may not be a full record and needs values of columns from the parquet
-          Option<GenericRecord> rec = buildGenericRecordwithCustomPayload(deltaRecordMap.get(key));
-          // If the record is not present, this is a delete record using an empty payload so skip this base record
-          // and move to the next record
-          if (!rec.isPresent()) {
-            continue;
+    boolean result = this.parquetReader.next(aVoid, arrayWritable);
+    if (!result) {
+      // if the result is false, then there are no more records
+      return false;
+    }
+    if (!deltaRecordMap.isEmpty()) {
+      // TODO(VC): Right now, we assume all records in log, have a matching base record. (which
+      // would be true until we have a way to index logs too)

Review comment:
       I need to clarify two points on this issue.
   First: this code is the result of reverting [HUDI-1969] Support reading logs for 
MOR Hive rt table (#3033); it is the original Hudi kernel code.
   
   Second: at present we cannot distinguish newly inserted records from updated 
records in a log file, so the HBase index case is not handled. HUDI-1969 tried 
to solve this problem but introduced a new bug; see 
https://github.com/apache/hudi/pull/3203#issuecomment-927820027
   
   @nsivabalan what do you suggest for the HBase index? Maybe we cannot handle 
this case until we have a way to index logs.
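
   To make the second point concrete, here is a minimal sketch in plain Java. It is 
not Hudi code: MergeSketch, MergeResult and merge are hypothetical names, records 
are modelled as strings, and an empty Optional stands in for a delete payload. It 
only illustrates the assumption in the TODO(VC) comment: a reader that iterates 
base records and looks each key up in deltaRecordMap never emits a log record whose 
key has no base record.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of merging a base file with its log records.
// This is an illustration only and does not use or mirror Hudi's actual API.
public class MergeSketch {

    /** Holds the records a base-driven reader would emit, plus the log keys it never saw. */
    public static final class MergeResult {
        public final List<String> merged = new ArrayList<>();
        public final Set<String> unmatchedLogKeys = new LinkedHashSet<>();
    }

    /**
     * @param baseRecords    record key -> value read from the base (parquet) file
     * @param deltaRecordMap record key -> latest log value; Optional.empty() models a delete
     */
    public static MergeResult merge(Map<String, String> baseRecords,
                                    Map<String, Optional<String>> deltaRecordMap) {
        MergeResult result = new MergeResult();
        // Start by treating every log key as unmatched; matched keys are removed below.
        result.unmatchedLogKeys.addAll(deltaRecordMap.keySet());

        for (Map.Entry<String, String> base : baseRecords.entrySet()) {
            String key = base.getKey();
            if (!deltaRecordMap.containsKey(key)) {
                // No log entry for this key: emit the base record unchanged.
                result.merged.add(base.getValue());
                continue;
            }
            // Mark the key as handled.
            result.unmatchedLogKeys.remove(key);
            // An update replaces the base record; a delete (empty Optional) skips it.
            deltaRecordMap.get(key).ifPresent(result.merged::add);
        }
        return result;
    }
}
```

   Any key left in unmatchedLogKeys is a record that exists only in the log. The 
log entry alone does not say whether it was an insert or an update, which is why 
the case cannot be handled by the base-driven loop.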




