Re: [PR] [HUDI-8552] Add a new merge handle based on file group reader for compaction in Spark [hudi]

via GitHub Mon, 02 Dec 2024 17:16:08 -0800


yihua commented on code in PR #12390:
URL: https://github.com/apache/hudi/pull/12390#discussion_r1866842950



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkCopyOnWriteTable.java:
##########
@@ -264,6 +269,24 @@ public Iterator<List<WriteStatus>> handleInsert(
     return Collections.singletonList(createHandle.close()).iterator();
   }
 
+  @Override
+  public boolean supportsFileGroupReader() {
+    return true;
+  }
+
+  @Override
+  public List<WriteStatus> runCompactionUsingFileGroupReader(String instantTime,

Review Comment:
   The existing compaction implementation uses
`HoodieSparkCopyOnWriteTable#handleUpdate`, which implements
`HoodieCompactionHandler#handleUpdate`.  That's because
`HoodieSparkMergeOnReadTable` passes a `HoodieSparkCopyOnWriteTable` instance
to the compaction executor, as in the following logic:
   ```
   @Override
   public HoodieWriteMetadata<HoodieData<WriteStatus>> compact(
       HoodieEngineContext context, String compactionInstantTime) {
     RunCompactionActionExecutor<T> compactionExecutor = new RunCompactionActionExecutor<>(
         context, config, this, compactionInstantTime, new HoodieSparkMergeOnReadTableCompactor<>(),
         new HoodieSparkCopyOnWriteTable<>(config, context, getMetaClient()), WriteOperationType.COMPACT);
     return compactionExecutor.execute();
   }
   ```
   So, for parity, I have to add the same method here to avoid breaking any
contract around `HoodieTable`.  I think we need to revisit this; using a
`HoodieSparkCopyOnWriteTable` instance inside `HoodieSparkMergeOnReadTable` is
counter-intuitive.
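
   To make the coupling concrete, here is a minimal, self-contained sketch of the delegation described above. All types are hypothetical, simplified stand-ins (`CompactionHandler`, `CopyOnWriteTable`, `MergeOnReadTable`, string results instead of `WriteStatus`), not the real Hudi classes, and the dispatch on `supportsFileGroupReader()` is my assumption about how a caller might choose between the two paths. It only shows why the copy-on-write table has to carry the new method as long as the merge-on-read table hands it over as the compaction handler.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the compaction handler contract (not the real Hudi interface).
interface CompactionHandler {
  boolean supportsFileGroupReader();

  List<String> runCompactionUsingFileGroupReader(String instantTime);

  List<String> handleUpdate(String instantTime);
}

// Hypothetical stand-in for the copy-on-write table, which acts as the handler.
class CopyOnWriteTable implements CompactionHandler {
  @Override
  public boolean supportsFileGroupReader() {
    return true;
  }

  @Override
  public List<String> runCompactionUsingFileGroupReader(String instantTime) {
    // New path: merge through the file group reader (simulated with a tag).
    return Collections.singletonList("fg-reader:" + instantTime);
  }

  @Override
  public List<String> handleUpdate(String instantTime) {
    // Existing path: the legacy merge handle (simulated with a tag).
    return Collections.singletonList("legacy:" + instantTime);
  }
}

// Hypothetical stand-in for the merge-on-read table.
class MergeOnReadTable {
  List<String> compact(String instantTime) {
    // The merge-on-read table does not handle compaction itself; it creates a
    // copy-on-write table instance and uses that as the handler.
    CompactionHandler handler = new CopyOnWriteTable();
    // Assumed dispatch: prefer the file group reader path when the handler supports it.
    return handler.supportsFileGroupReader()
        ? handler.runCompactionUsingFileGroupReader(instantTime)
        : handler.handleUpdate(instantTime);
  }
}

public class CompactionDelegationSketch {
  public static void main(String[] args) {
    System.out.println(new MergeOnReadTable().compact("20241202171608"));
  }
}
```

   If the merge-on-read table implemented the handler itself, the copy-on-write table would not need this override at all, which is the part I think is worth revisiting.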



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
