hudi-agent commented on code in PR #18816:
URL: https://github.com/apache/hudi/pull/18816#discussion_r3287998647
##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -201,7 +201,9 @@ public void removeCorruptedPendingCleanAction() {
TimelineUtils.deleteInstantFile(client.getStorage(),
client.getTimelinePath(),
instant, client.getInstantFileNameGenerator());
} catch (IOException ioe) {
- if (ioe.getMessage().contains("Not an Avro data file")) {
+ if (ioe.getMessage() == null || ioe.getMessage().contains("Not an Avro
data file")
Review Comment:
π€ This broadened condition now treats any `IOException` with a null message
β and any message containing the very generic substrings `"EOF"` or `"unable to
read"` β as corruption and silently deletes the instant file. A transient
storage error (HDFS/S3 hiccup, permission issue, etc.) can easily produce a
null message or an `EOFException`, and we'd then delete a legitimate pending
clean instant. Could we narrow this to only the new LSM-related error
signatures you actually need to recognize, rather than `getMessage() == null` +
broad keyword matches? @yihua
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -257,16 +264,16 @@ public void testRemoveCorruptedPendingCleanAction()
throws IOException {
for (int i = 100; i < 104; i++) {
String timestamp = String.valueOf(i);
// Write corrupted requested Clean File
-
HoodieTestCommitMetadataGenerator.createEmptyCleanRequestedFile(tablePath,
timestamp, conf);
+ org.apache.hadoop.fs.Path filePath = new
org.apache.hadoop.fs.Path(metaClient.getTimelinePath() + "/" + timestamp +
".clean.requested");
+
org.apache.hudi.common.testutils.HoodieTestDataGenerator.createEmptyFile(tablePath,
filePath, conf);
}
// reload meta client
metaClient = HoodieTableMetaClient.reload(metaClient);
// first, there are four instants
assertEquals(4,
metaClient.getActiveTimeline().filterInflightsAndRequested().countInstants());
- Object result = shell.evaluate(() -> "repair corrupted clean files");
- assertTrue(ShellEvaluationResultUtil.isSuccess(result));
+ shell.evaluate(() -> "repair corrupted clean files");
Review Comment:
π€ The previous test asserted `ShellEvaluationResultUtil.isSuccess(result)`
after invoking `repair corrupted clean files`. The new version drops that
assertion entirely. Is the command now expected to return a non-success status,
or did this assertion start failing? If it's the latter, it'd be safer to keep
the success check rather than silently ignore the result.
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -227,19 +222,31 @@ public void testOverwriteHoodieProperties() throws
IOException {
.collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e ->
String.valueOf(e.getValue())));
expected.putIfAbsent(TABLE_CHECKSUM.key(),
String.valueOf(generateChecksum(tableConfig.getProps())));
expected.putIfAbsent(DROP_PARTITION_COLUMNS.key(),
String.valueOf(DROP_PARTITION_COLUMNS.defaultValue()));
+
+ // Add properties that are now present in Hudi 1.x by default
+ if (result.containsKey(HoodieTableConfig.TIMELINE_PATH.key())) {
+ expected.putIfAbsent(HoodieTableConfig.TIMELINE_PATH.key(),
result.get(HoodieTableConfig.TIMELINE_PATH.key()));
+ }
Review Comment:
π€ nit: `"hoodie.table.initial.version"` is a raw string literal β does
`HoodieTableConfig` expose a constant for this key? Using the constant would
avoid a silent mismatch if the key name ever changes.
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/RepairsCommand.java:
##########
@@ -201,7 +201,9 @@ public void removeCorruptedPendingCleanAction() {
TimelineUtils.deleteInstantFile(client.getStorage(),
client.getTimelinePath(),
instant, client.getInstantFileNameGenerator());
} catch (IOException ioe) {
Review Comment:
π€ nit: `java.io.EOFException` is used inline here with its full package path
β could you add a proper import at the top instead? Inline FQNs in production
code make the condition harder to scan.
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -227,19 +222,31 @@ public void testOverwriteHoodieProperties() throws
IOException {
.collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e ->
String.valueOf(e.getValue())));
expected.putIfAbsent(TABLE_CHECKSUM.key(),
String.valueOf(generateChecksum(tableConfig.getProps())));
expected.putIfAbsent(DROP_PARTITION_COLUMNS.key(),
String.valueOf(DROP_PARTITION_COLUMNS.defaultValue()));
+
+ // Add properties that are now present in Hudi 1.x by default
+ if (result.containsKey(HoodieTableConfig.TIMELINE_PATH.key())) {
+ expected.putIfAbsent(HoodieTableConfig.TIMELINE_PATH.key(),
result.get(HoodieTableConfig.TIMELINE_PATH.key()));
+ }
+ if (result.containsKey("hoodie.table.initial.version")) {
+ expected.putIfAbsent("hoodie.table.initial.version",
result.get("hoodie.table.initial.version"));
+ }
+ if (result.containsKey(HoodieTableConfig.TIMELINE_HISTORY_PATH.key())) {
+ expected.putIfAbsent(HoodieTableConfig.TIMELINE_HISTORY_PATH.key(),
result.get(HoodieTableConfig.TIMELINE_HISTORY_PATH.key()));
+ }
+
assertEquals(expected, result);
// check result
- List<String> allPropsStr = Arrays.asList(NAME.key(), TYPE.key(),
VERSION.key(),
- TIMELINE_HISTORY_PATH.key(), TIMELINE_LAYOUT_VERSION.key(),
TABLE_CHECKSUM.key(), DROP_PARTITION_COLUMNS.key());
+ // Include all properties from both old and new props for comparison
Review Comment:
π€ nit: `allKeys` and `allPropsStr` look like dead code here β they only feed
into the `rows` variable below, but the `HoodiePrintHelper.print(..., rows)`
call was removed, so `rows` is never consumed. Could you delete lines 240β245
along with the now-unused `rows` computation?
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -227,19 +222,31 @@ public void testOverwriteHoodieProperties() throws
IOException {
.collect(Collectors.toMap(e -> String.valueOf(e.getKey()), e ->
String.valueOf(e.getValue())));
expected.putIfAbsent(TABLE_CHECKSUM.key(),
String.valueOf(generateChecksum(tableConfig.getProps())));
expected.putIfAbsent(DROP_PARTITION_COLUMNS.key(),
String.valueOf(DROP_PARTITION_COLUMNS.defaultValue()));
+
+ // Add properties that are now present in Hudi 1.x by default
+ if (result.containsKey(HoodieTableConfig.TIMELINE_PATH.key())) {
+ expected.putIfAbsent(HoodieTableConfig.TIMELINE_PATH.key(),
result.get(HoodieTableConfig.TIMELINE_PATH.key()));
+ }
+ if (result.containsKey("hoodie.table.initial.version")) {
+ expected.putIfAbsent("hoodie.table.initial.version",
result.get("hoodie.table.initial.version"));
+ }
+ if (result.containsKey(HoodieTableConfig.TIMELINE_HISTORY_PATH.key())) {
+ expected.putIfAbsent(HoodieTableConfig.TIMELINE_HISTORY_PATH.key(),
result.get(HoodieTableConfig.TIMELINE_HISTORY_PATH.key()));
+ }
+
assertEquals(expected, result);
// check result
- List<String> allPropsStr = Arrays.asList(NAME.key(), TYPE.key(),
VERSION.key(),
- TIMELINE_HISTORY_PATH.key(), TIMELINE_LAYOUT_VERSION.key(),
TABLE_CHECKSUM.key(), DROP_PARTITION_COLUMNS.key());
+ // Include all properties from both old and new props for comparison
+ java.util.Set<String> allKeys = new java.util.HashSet<>(result.keySet());
+ allKeys.addAll(oldProps.keySet());
+ List<String> allPropsStr =
allKeys.stream().sorted().collect(java.util.stream.Collectors.toList());
String[][] rows = allPropsStr.stream().sorted().map(key -> new String[]
{key,
oldProps.getOrDefault(key, "null"), result.getOrDefault(key,
"null")})
.toArray(String[][]::new);
- String expect = HoodiePrintHelper.print(new String[]
{HoodieTableHeaderFields.HEADER_HOODIE_PROPERTY,
- HoodieTableHeaderFields.HEADER_OLD_VALUE,
HoodieTableHeaderFields.HEADER_NEW_VALUE}, rows);
- expect = removeNonWordAndStripSpace(expect);
- String got = removeNonWordAndStripSpace(cmdResult.toString());
- assertEquals(expect, got);
+ String got = cmdResult.toString();
+
assertTrue(got.contains(org.apache.hudi.common.table.HoodieTableConfig.NAME.key()));
Review Comment:
π€ The check that the rendered table output matches the formatted `expect`
string was dropped in favor of just asserting the output contains `NAME.key()`
and `"test_table"`. That's a fairly weak check β almost any non-empty output
would pass. If the formatted-table comparison is brittle on v9, could you keep
at least a more representative subset of expected key/value rows rather than
just two substrings?
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
##########
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java:
##########
@@ -308,7 +316,7 @@ public void testShowFailedCommits() {
Object result = shell.evaluate(() -> "repair show empty commit
metadata");
assertTrue(ShellEvaluationResultUtil.isSuccess(result));
final List<LogEvent> log = appender.getLog();
- assertEquals(log.size(),4);
+ assertEquals(19, log.size());
Review Comment:
π€ This now asserts 19 empty-commit log entries instead of 4, because the
loop switched from `createCommitFile` (which wrote real metadata) to
`createEmptyFile`. The test used to verify that only commits with empty
metadata are flagged; now every commit is empty so the assertion is essentially
trivial. Was that the intent, or should we restore the partial-metadata setup
so the filter logic is still being exercised?
<sub><i>- AI-generated; verify before applying. React π/π to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]