yihua commented on code in PR #18183:
URL: https://github.com/apache/hudi/pull/18183#discussion_r2796274963


##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -1370,6 +1371,21 @@ public Map<String, String> propsMap() {
         .collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e -> 
String.valueOf(e.getValue())));
   }
 
+  public static String getFileComment(Properties props) {
+    final long ts = System.currentTimeMillis();

Review Comment:
   `NetworkUtils.getHostname()` can throw `HoodieException` if network 
resolution fails (e.g., in sandboxed containers or restricted environments). 
Since this is called from `storeProperties`, a network hiccup would now prevent 
writing `hoodie.properties` — which is a regression since the comment is purely 
informational. Could you wrap the `getHostname()` call in a try-catch and fall 
back to `"unknown"` or similar?



##########
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/common/HoodieJavaEngineContext.java:
##########
@@ -192,6 +193,13 @@ public void cancelAllJobs() {
     // no operation for now
   }
 
+  @Override
+  public Map<String, String> getInfo() {
+    final Map<String, String> info = new HashMap<String, String>();
+    System.getProperties().stringPropertyNames().forEach(property -> 
info.put(property, System.getProperty(property)));

Review Comment:
   This dumps every JVM system property into commit metadata — including 
`user.name`, `user.home`, and any secrets passed via `-D` flags (e.g. 
`-Ddb.password=...`). Since commit metadata is persisted permanently in the 
timeline, this is a significant information-leak risk. Could you use a 
allowlist of safe properties instead, similar to what the Spark implementation 
does?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java:
##########
@@ -307,4 +310,14 @@ protected boolean 
isStreamingWriteToMetadataEnabled(HoodieTable table) {
     return config.isMetadataTableEnabled()
         && 
config.isMetadataStreamingWritesEnabled(table.getMetaClient().getTableConfig().getTableVersion());
   }
+
+  protected Option<Map<String, String>> updateExtraMetadata(Option<Map<String, 
String>> extraMetadata) {

Review Comment:
   If a caller passes `Option.of(Collections.unmodifiableMap(...))` or 
`Option.of(Map.of(...))`, the `.put()` and `.putAll()` calls here will throw 
`UnsupportedOperationException`. It might be safer to always create a new 
`HashMap` and copy the existing entries into it.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -1370,6 +1371,21 @@ public Map<String, String> propsMap() {
         .collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e -> 
String.valueOf(e.getValue())));
   }
 
+  public static String getFileComment(Properties props) {
+    final long ts = System.currentTimeMillis();

Review Comment:
   `NetworkUtils.getHostname()` also incurs extra network latency now and it is 
`synchronized`.  Should the hostname be cached so it is called once per process 
per host?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -1370,6 +1371,21 @@ public Map<String, String> propsMap() {
         .collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e -> 
String.valueOf(e.getValue())));
   }
 
+  public static String getFileComment(Properties props) {
+    final long ts = System.currentTimeMillis();
+    return String.format("Date=%s, ts=%d, host=%s, #properties=%d, 
hudi_version=%s",
+        new java.util.Date(ts), ts, NetworkUtils.getHostname(), props.size(), 
HoodieVersion.get());
+  }
+
+  public static String getFileComment(Properties props, Map<String, String> 
extraKeyValue) {

Review Comment:
   nit: this method is not used.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java:
##########
@@ -307,4 +310,14 @@ protected boolean 
isStreamingWriteToMetadataEnabled(HoodieTable table) {
     return config.isMetadataTableEnabled()
         && 
config.isMetadataStreamingWritesEnabled(table.getMetaClient().getTableConfig().getTableVersion());
   }
+
+  protected Option<Map<String, String>> updateExtraMetadata(Option<Map<String, 
String>> extraMetadata) {
+    // Add write config, HUDI version and engine specific extra metadata
+    extraMetadata = extraMetadata.isPresent() ? extraMetadata : Option.of(new 
HashMap<String, String>());
+    extraMetadata.get().put("hudi.version", HoodieVersion.get());

Review Comment:
   Storing `config.toString()` in every commit's metadata could be very large 
(HoodieWriteConfig has hundreds of properties) and may include sensitive values 
like connection strings or credentials. I'd be cautious about persisting the 
entire config dump — have you considered storing just a few key config values, 
or a hash/fingerprint instead, or using allowlist or denylist for for filtering?



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/common/HoodieFlinkEngineContext.java:
##########
@@ -233,6 +233,11 @@ public void cancelAllJobs() {
     // no operation for now
   }
 
+  @Override
+  public Map<String, String> getInfo() {

Review Comment:
   ```suggestion
     public Map<String, String> getEngineProperties() {
   ```



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java:
##########
@@ -666,6 +666,7 @@ Option<String> scheduleTableServiceInternal(Option<String> 
providedInstantTime,
     if (!tableServicesEnabled(config)) {
       return Option.empty();
     }
+    extraMetadata = updateExtraMetadata(extraMetadata);

Review Comment:
   This adds version/engine metadata even for service types that bail out 
immediately after (e.g. ARCHIVE on the next line). It might be cleaner to move 
this call after the early-return checks, so we only enrich metadata when we're 
actually going to use it.
   
   Also, is the addition to commit method good enough which also takes care of 
table services?



##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/common/HoodieFlinkEngineContext.java:
##########
@@ -233,6 +233,11 @@ public void cancelAllJobs() {
     // no operation for now
   }
 
+  @Override
+  public Map<String, String> getInfo() {
+    return Collections.emptyMap();

Review Comment:
   Any useful props to add from Flink Engine @danny0405 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to