yihua commented on code in PR #12390:
URL: https://github.com/apache/hudi/pull/12390#discussion_r1866796877
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java:
##########
@@ -161,66 +162,70 @@ public List<WriteStatus> compact(HoodieCompactionHandler
compactionHandler,
Option<InstantRange> instantRange,
TaskContextSupplier taskContextSupplier,
CompactionExecutionHelper executionHelper)
throws IOException {
- HoodieStorage storage = metaClient.getStorage();
- Schema readerSchema;
- Option<InternalSchema> internalSchemaOption = Option.empty();
- if (!StringUtils.isNullOrEmpty(config.getInternalSchema())) {
- readerSchema = new Schema.Parser().parse(config.getSchema());
- internalSchemaOption = SerDeHelper.fromJson(config.getInternalSchema());
- // its safe to modify config here, since we are running in task side.
- ((HoodieTable) compactionHandler).getConfig().setDefault(config);
+    if (config.getBooleanOrDefault(HoodieReaderConfig.FILE_GROUP_READER_ENABLED)
+        && compactionHandler.supportsFileGroupReader()) {
Review Comment:
I agree with @vinothchandar that the major use cases of snapshot queries
already work with the file group reader. There are cases that might have
issues, e.g., MDT, schema on read, etc., for which I have explicitly turned
off the new feature so that they go through the old compaction flow.
@danny0405 yes, if there is any regression in Spark, users can turn off
file-group-reader-based compaction by disabling
`HoodieReaderConfig.FILE_GROUP_READER_ENABLED` /
`hoodie.file.group.reader.enabled`.
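For context, the opt-out described above amounts to setting that key to
`false` in the writer configuration. A minimal sketch: the class and helper
method below are hypothetical illustration only; the
`hoodie.file.group.reader.enabled` key is the one quoted in this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class DisableFgReaderCompaction {
    // Builds a hypothetical writer-options map, e.g. as passed to a
    // Spark datasource write; only the config key comes from the thread.
    static Map<String, String> buildOptions() {
        Map<String, String> hudiOptions = new HashMap<>();
        // Fall back to the legacy compaction flow by disabling the
        // file group reader.
        hudiOptions.put("hoodie.file.group.reader.enabled", "false");
        return hudiOptions;
    }

    public static void main(String[] args) {
        System.out.println(buildOptions());
    }
}
```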
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]