nsivabalan commented on code in PR #10122:
URL: https://github.com/apache/hudi/pull/10122#discussion_r1400959986


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java:
##########
@@ -146,19 +214,65 @@ public FileSystemViewManager getViewManager() {
     return viewManager;
   }
 
-  public boolean canReuseFor(String basePath) {
-    return this.server != null
-        && this.viewManager != null
-        && this.basePath.equals(basePath);
+  /**
+   * Adds a new base path to the set that are managed by this instance.
+   * @param basePath the new base path to add
+   */
+  private void addBasePath(String basePath) {
+    basePaths.add(basePath);
+  }
+
+  private boolean canReuseFor(HoodieWriteConfig newWriteConfig, String 
newHostAddr) {
+    if (server == null || viewManager == null) {
+      return false; // service is not running
+    }
+    if (basePaths.contains(newWriteConfig.getBasePath())) {
+      return true; // already running for this base path
+    }
+    if (newHostAddr != null && !newHostAddr.equals(this.hostAddr)) {
+      return false; // different host address
+    }
+    if (writeConfig.getMarkersType() != newWriteConfig.getMarkersType()) {
+      return false; // different marker type
+    }
+    return 
metadataConfigsAreEquivalent(writeConfig.getMetadataConfig().getProps(), 
newWriteConfig.getMetadataConfig().getProps());

Review Comment:
   I thought about this and also request handlers that tim pointed out. Here is 
my take on this. 
   By default we are going to disable reuse. So, users should enable only on a 
need basis. I will add more documentation to call out that we expect some 
homoegenous set up to use this. 
   
   For eg, in multi-table deltastreamer, if user has 10 tables, all 10 of them 
is expected to have similar config values for other secondary configs like num 
threads w/ timeline server, marker batch num threads, marker parallelism etc. 
If the 10 tables are going to have slightly diff values for these secondary 
configs, its better for user to not use this feature only. 
   
   As of now, the key properties that will determine whether one can reuse an 
existing timeline server or not is 
   - hostAddr
   - metadata enable flag
   - early conflict detection enabled 
   - marker type 
   - isTimelineServerBasedInstantStateEnabled
   
   
   if these matches, hudi will reuse an existing timeline server. So the first 
one to instantiate the timeline server, the resp writeConfig will be 
applied(for sec configs like marker parallelism, threads, etc). 
   
   We can revisit this for key properties going forward. but may not be 
scalable/maintainable if we account for every minor config difference. 
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to