hudi-agent commented on code in PR #18816:
URL: https://github.com/apache/hudi/pull/18816#discussion_r3287998647


##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -201,7 +201,9 @@ public void removeCorruptedPendingCleanAction() {
         TimelineUtils.deleteInstantFile(client.getStorage(), 
client.getTimelinePath(),
             instant, client.getInstantFileNameGenerator());
       } catch (IOException ioe) {
-        if (ioe.getMessage().contains("Not an Avro data file")) {
+        if (ioe.getMessage() == null || ioe.getMessage().contains("Not an Avro 
data file")

Review Comment:
   πŸ€– This broadened condition now treats any `IOException` with a null message 
β€” and any message containing the very generic substrings `"EOF"` or `"unable to 
read"` β€” as corruption and silently deletes the instant file. A transient 
storage error (HDFS/S3 hiccup, permission issue, etc.) can easily produce a 
null message or an `EOFException`, and we'd then delete a legitimate pending 
clean instant. Could we narrow this to only the new LSM-related error 
signatures you actually need to recognize, rather than `getMessage() == null` + 
broad keyword matches? @yihua
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -257,16 +264,16 @@ public void testRemoveCorruptedPendingCleanAction() 
throws IOException {
     for (int i = 100; i < 104; i++) {
       String timestamp = String.valueOf(i);
       // Write corrupted requested Clean File
-      
HoodieTestCommitMetadataGenerator.createEmptyCleanRequestedFile(tablePath, 
timestamp, conf);
+      org.apache.hadoop.fs.Path filePath = new 
org.apache.hadoop.fs.Path(metaClient.getTimelinePath() + "/" + timestamp + 
".clean.requested");
+      
org.apache.hudi.common.testutils.HoodieTestDataGenerator.createEmptyFile(tablePath,
 filePath, conf);
     }
 
     // reload meta client
     metaClient = HoodieTableMetaClient.reload(metaClient);
     // first, there are four instants
     assertEquals(4, 
metaClient.getActiveTimeline().filterInflightsAndRequested().countInstants());
 
-    Object result = shell.evaluate(() -> "repair corrupted clean files");
-    assertTrue(ShellEvaluationResultUtil.isSuccess(result));
+    shell.evaluate(() -> "repair corrupted clean files");

Review Comment:
   πŸ€– The previous test asserted `ShellEvaluationResultUtil.isSuccess(result)` 
after invoking `repair corrupted clean files`. The new version drops that 
assertion entirely. Is the command now expected to return a non-success status, 
or did this assertion start failing? If it's the latter, it'd be safer to keep 
the success check rather than silently ignore the result.
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -227,19 +222,31 @@ public void testOverwriteHoodieProperties() throws 
IOException {
         .collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e -> 
String.valueOf(e.getValue())));
     expected.putIfAbsent(TABLE_CHECKSUM.key(), 
String.valueOf(generateChecksum(tableConfig.getProps())));
     expected.putIfAbsent(DROP_PARTITION_COLUMNS.key(), 
String.valueOf(DROP_PARTITION_COLUMNS.defaultValue()));
+
+    // Add properties that are now present in Hudi 1.x by default
+    if (result.containsKey(HoodieTableConfig.TIMELINE_PATH.key())) {
+      expected.putIfAbsent(HoodieTableConfig.TIMELINE_PATH.key(), 
result.get(HoodieTableConfig.TIMELINE_PATH.key()));
+    }

Review Comment:
   πŸ€– nit: `"hoodie.table.initial.version"` is a raw string literal β€” does 
`HoodieTableConfig` expose a constant for this key? Using the constant would 
avoid a silent mismatch if the key name ever changes.
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -201,7 +201,9 @@ public void removeCorruptedPendingCleanAction() {
         TimelineUtils.deleteInstantFile(client.getStorage(), 
client.getTimelinePath(),
             instant, client.getInstantFileNameGenerator());
       } catch (IOException ioe) {

Review Comment:
   πŸ€– nit: `java.io.EOFException` is used inline here with its full package path 
β€” could you add a proper import at the top instead? Inline FQNs in production 
code make the condition harder to scan.
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -227,19 +222,31 @@ public void testOverwriteHoodieProperties() throws 
IOException {
         .collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e -> 
String.valueOf(e.getValue())));
     expected.putIfAbsent(TABLE_CHECKSUM.key(), 
String.valueOf(generateChecksum(tableConfig.getProps())));
     expected.putIfAbsent(DROP_PARTITION_COLUMNS.key(), 
String.valueOf(DROP_PARTITION_COLUMNS.defaultValue()));
+
+    // Add properties that are now present in Hudi 1.x by default
+    if (result.containsKey(HoodieTableConfig.TIMELINE_PATH.key())) {
+      expected.putIfAbsent(HoodieTableConfig.TIMELINE_PATH.key(), 
result.get(HoodieTableConfig.TIMELINE_PATH.key()));
+    }
+    if (result.containsKey("hoodie.table.initial.version")) {
+      expected.putIfAbsent("hoodie.table.initial.version", 
result.get("hoodie.table.initial.version"));
+    }
+    if (result.containsKey(HoodieTableConfig.TIMELINE_HISTORY_PATH.key())) {
+      expected.putIfAbsent(HoodieTableConfig.TIMELINE_HISTORY_PATH.key(), 
result.get(HoodieTableConfig.TIMELINE_HISTORY_PATH.key()));
+    }
+
     assertEquals(expected, result);
 
     // check result
-    List<String> allPropsStr = Arrays.asList(NAME.key(), TYPE.key(), 
VERSION.key(),
-        TIMELINE_HISTORY_PATH.key(), TIMELINE_LAYOUT_VERSION.key(), 
TABLE_CHECKSUM.key(), DROP_PARTITION_COLUMNS.key());
+    // Include all properties from both old and new props for comparison

Review Comment:
   πŸ€– nit: `allKeys` and `allPropsStr` look like dead code here β€” they only feed 
into the `rows` variable below, but the `HoodiePrintHelper.print(..., rows)` 
call was removed, so `rows` is never consumed. Could you delete lines 240–245 
along with the now-unused `rows` computation?
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -227,19 +222,31 @@ public void testOverwriteHoodieProperties() throws 
IOException {
         .collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e -> 
String.valueOf(e.getValue())));
     expected.putIfAbsent(TABLE_CHECKSUM.key(), 
String.valueOf(generateChecksum(tableConfig.getProps())));
     expected.putIfAbsent(DROP_PARTITION_COLUMNS.key(), 
String.valueOf(DROP_PARTITION_COLUMNS.defaultValue()));
+
+    // Add properties that are now present in Hudi 1.x by default
+    if (result.containsKey(HoodieTableConfig.TIMELINE_PATH.key())) {
+      expected.putIfAbsent(HoodieTableConfig.TIMELINE_PATH.key(), 
result.get(HoodieTableConfig.TIMELINE_PATH.key()));
+    }
+    if (result.containsKey("hoodie.table.initial.version")) {
+      expected.putIfAbsent("hoodie.table.initial.version", 
result.get("hoodie.table.initial.version"));
+    }
+    if (result.containsKey(HoodieTableConfig.TIMELINE_HISTORY_PATH.key())) {
+      expected.putIfAbsent(HoodieTableConfig.TIMELINE_HISTORY_PATH.key(), 
result.get(HoodieTableConfig.TIMELINE_HISTORY_PATH.key()));
+    }
+
     assertEquals(expected, result);
 
     // check result
-    List<String> allPropsStr = Arrays.asList(NAME.key(), TYPE.key(), 
VERSION.key(),
-        TIMELINE_HISTORY_PATH.key(), TIMELINE_LAYOUT_VERSION.key(), 
TABLE_CHECKSUM.key(), DROP_PARTITION_COLUMNS.key());
+    // Include all properties from both old and new props for comparison
+    java.util.Set<String> allKeys = new java.util.HashSet<>(result.keySet());
+    allKeys.addAll(oldProps.keySet());
+    List<String> allPropsStr = 
allKeys.stream().sorted().collect(java.util.stream.Collectors.toList());
     String[][] rows = allPropsStr.stream().sorted().map(key -> new String[] 
{key,
             oldProps.getOrDefault(key, "null"), result.getOrDefault(key, 
"null")})
         .toArray(String[][]::new);
-    String expect = HoodiePrintHelper.print(new String[] 
{HoodieTableHeaderFields.HEADER_HOODIE_PROPERTY,
-        HoodieTableHeaderFields.HEADER_OLD_VALUE, 
HoodieTableHeaderFields.HEADER_NEW_VALUE}, rows);
-    expect = removeNonWordAndStripSpace(expect);
-    String got = removeNonWordAndStripSpace(cmdResult.toString());
-    assertEquals(expect, got);
+    String got = cmdResult.toString();
+    
assertTrue(got.contains(org.apache.hudi.common.table.HoodieTableConfig.NAME.key()));

Review Comment:
   πŸ€– The check that the rendered table output matches the formatted `expect` 
string was dropped in favor of just asserting the output contains `NAME.key()` 
and `"test_table"`. That's a fairly weak check β€” almost any non-empty output 
would pass. If the formatted-table comparison is brittle on v9, could you keep 
at least a more representative subset of expected key/value rows rather than 
just two substrings?
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -308,7 +316,7 @@ public void testShowFailedCommits() {
       Object result = shell.evaluate(() -> "repair show empty commit 
metadata");
       assertTrue(ShellEvaluationResultUtil.isSuccess(result));
       final List<LogEvent> log = appender.getLog();
-      assertEquals(log.size(),4);
+      assertEquals(19, log.size());

Review Comment:
   πŸ€– This now asserts 19 empty-commit log entries instead of 4, because the 
loop switched from `createCommitFile` (which wrote real metadata) to 
`createEmptyFile`. The test used to verify that only commits with empty 
metadata are flagged; now every commit is empty so the assertion is essentially 
trivial. Was that the intent, or should we restore the partial-metadata setup 
so the filter logic is still being exercised?
   
   <sub><i>- AI-generated; verify before applying. React πŸ‘/πŸ‘Ž to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to