kbuci commented on code in PR #18295:
URL: https://github.com/apache/hudi/pull/18295#discussion_r2963481781


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -350,6 +383,28 @@ public static HoodieWriteConfig createMetadataWriteConfig(
     }
 
     HoodieWriteConfig metadataWriteConfig = builder.build();
+    if (mergeMetdataLockConfigAtEnd) {
+      // We need to update the MDT write config to have the same lock related configs as the data table.
+      // All data table props with the lock prefix are always copied (to override MDT defaults with
+      // user-configured values). Other data table props not present in MDT config are also copied to
+      // support custom lock providers that may use non-standard config keys.
+      Properties lockProps = new Properties();
+      TypedProperties dataTableProps = writeConfig.getProps();
+      TypedProperties mdtProps = metadataWriteConfig.getProps();
+      for (String key : dataTableProps.stringPropertyNames()) {
+        if (key.startsWith(LockConfiguration.LOCK_PREFIX) || !mdtProps.containsKey(key)) {
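
(For reference, the "lock prefix or missing key" copy rule described in the hunk above can be sketched standalone. This sketch uses plain `java.util.Properties` in place of `TypedProperties`, and a hard-coded `"hoodie.write.lock."` prefix as a stand-in for `LockConfiguration.LOCK_PREFIX`; both are assumptions, not the PR's exact code.)

```java
import java.util.Properties;

public class LockPropMergeSketch {
  // Stand-in for LockConfiguration.LOCK_PREFIX (assumed value).
  public static final String LOCK_PREFIX = "hoodie.write.lock.";

  // Copy into the MDT config: (a) every data table prop under the lock prefix,
  // overriding MDT defaults with user-configured values, and (b) any other data
  // table prop the MDT config lacks, so custom lock providers that use
  // non-standard config keys still work.
  public static Properties merge(Properties dataTableProps, Properties mdtProps) {
    Properties out = new Properties();
    out.putAll(mdtProps);
    for (String key : dataTableProps.stringPropertyNames()) {
      if (key.startsWith(LOCK_PREFIX) || !mdtProps.containsKey(key)) {
        out.setProperty(key, dataTableProps.getProperty(key));
      }
    }
    return out;
  }
}
```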

Review Comment:
   Oh, writers to the data table should not be setting `METADATA_WRITE_CONCURRENCY_MODE`; this should only be set by a table service user application that intends to execute compaction plans on the MDT (and does not hold any table lock while executing the plans). Example usage would be:
   
   ```java
   HoodieBackedTableMetadataWriter metadataTableWriter =
       (HoodieBackedTableMetadataWriter)
           SparkHoodieBackedTableMetadataWriter.create(
               getJavaSparkContext().hadoopConfiguration(),
               writeConfig, // User specifies METADATA_WRITE_CONCURRENCY_MODE
               dataTableWriteClient.getEngineContext());
   metaClient = metadataTableWriter.getMetadataMetaClient();
   writeClient = (SparkRDDWriteClient) metadataTableWriter.getWriteClient();

   final List<HoodieInstant> pendingCompactionInstants =
       metaClient.getActiveTimeline().filterPendingCompactionTimeline().getInstants();
   for (HoodieInstant pendingCompactionInstant : pendingCompactionInstants) {
     writeClient.compact(pendingCompactionInstant.getTimestamp());
   }
   ```
   
   I'm open to making a wrapper API in the Hudi lib itself for that. But even if we do that, I can't see a way around `createMetadataWriteConfig`, except maybe creating a new function like `createMetadataWriteConfigForTableServiceExecution` and then a new static helper in `HoodieBackedTableMetadataWriter` that uses that API to execute pending table service plans on the MDT.
   
   At the end of the day, our core problem is that we want a way for a concurrent writer to execute pending compaction plans on the MDT while making sure that the data table lock is
   - held during the necessary steps (starting the heartbeat, transitioning instant states, committing the compaction, etc.)
   - ...but not held for the whole duration of plan execution (since that would block ingestion)

   So if there's a more ergonomic way to achieve that (other than this PR), then we should definitely consider it.
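
   A minimal sketch of that locking shape (all names hypothetical; a `ReentrantLock` stands in for the data table lock provider, and the four phases are passed in as runnables — this is an illustration of the desired behavior, not Hudi's actual transaction code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class ScopedLockCompactionSketch {
  // Stand-in for the data table lock (assumption; Hudi actually goes through
  // a LockProvider behind its transaction management).
  public static final ReentrantLock dataTableLock = new ReentrantLock();

  // Lock around the cheap bookkeeping steps, but NOT around the (long-running)
  // plan execution, so ingestion on the data table is not blocked.
  public static String runCompaction(Runnable startHeartbeat,
                                     Runnable transitionToInflight,
                                     Runnable executePlan,
                                     Runnable commit) {
    dataTableLock.lock();
    try {
      startHeartbeat.run();        // guarded: concurrent writers must observe it
      transitionToInflight.run();  // guarded: instant state transition
    } finally {
      dataTableLock.unlock();
    }

    executePlan.run();             // unguarded: long-running; ingestion proceeds

    dataTableLock.lock();
    try {
      commit.run();                // guarded: commit the compaction result
      return "committed";
    } finally {
      dataTableLock.unlock();
    }
  }
}
```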



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
