gemini-code-assist[bot] commented on code in PR #38876:
URL: https://github.com/apache/beam/pull/38876#discussion_r3383122705


##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergUtils.java:
##########
@@ -631,6 +633,51 @@ private static Object getLogicalTypeValue(Object 
icebergValue, Schema.FieldType
     return icebergValue;
   }
 
+  /** Serializes a table identifier without losing dots or other special 
characters in each part. */
+  public static String tableIdentifierToString(TableIdentifier 
tableIdentifier) {
+    TableIdentifier identifier = checkArgumentNotNull(tableIdentifier);
+    return requiresJsonTableIdentifier(identifier)
+        ? TableIdentifierParser.toJson(identifier)
+        : identifier.toString();
+  }
+
+  /** Parses either Iceberg's JSON table identifier representation or the 
legacy dotted form. */
+  public static TableIdentifier parseTableIdentifier(String table) {
+    if (startsWithJsonObject(table)) {
+      return TableIdentifierParser.fromJson(table);
+    }
+
+    return TableIdentifier.parse(table);
+  }
+
+  private static boolean startsWithJsonObject(@Nullable String value) {
+    if (value == null) {
+      return false;
+    }
+
+    for (int i = 0; i < value.length(); i++) {
+      if (!Character.isWhitespace(value.charAt(i))) {
+        return value.charAt(i) == '{';
+      }
+    }
+
+    return false;
+  }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The current implementation of `startsWithJsonObject` only checks if the 
first non-whitespace character is `{`. This can lead to false positives for 
legacy table identifiers that start with `{` but are not JSON (e.g., 
`{dogs}.food`). If such an identifier is passed to `parseTableIdentifier`, it 
will incorrectly attempt to parse it as JSON and fail.
   
   We can make this check much more robust by verifying that the string both 
starts with `{` and ends with `}` (ignoring leading/trailing whitespace). This 
ensures that only actual JSON-like strings are treated as JSON, while legacy 
dotted strings like `{dogs}.food` are safely parsed using the legacy parser.
   
   ```java
     private static boolean startsWithJsonObject(@Nullable String value) {
       if (value == null) {
         return false;
       }
   
       int start = 0;
       while (start < value.length() && 
Character.isWhitespace(value.charAt(start))) {
         start++;
       }
       if (start == value.length() || value.charAt(start) != '{') {
         return false;
       }
   
       int end = value.length() - 1;
       while (end >= 0 && Character.isWhitespace(value.charAt(end))) {
         end--;
       }
       return end >= 0 && value.charAt(end) == '}';
     }
   ```



##########
sdks/java/io/iceberg/src/test/java/org/apache/beam/sdk/io/iceberg/IcebergUtilsTest.java:
##########
@@ -60,6 +65,57 @@
 @RunWith(Enclosed.class)
 public class IcebergUtilsTest {
 
+  @RunWith(JUnit4.class)
+  public static class TableIdentifierTests {
+
+    @Test
+    public void parseTableIdentifierParsesJson() {
+      TableIdentifier identifier =
+          parseTableIdentifier(
+              " {\"namespace\": [\"dogs\", \"owners.and.handlers\"],"
+                  + " \"name\": \"food.with.dots\"}");
+
+      assertArrayEquals(
+          new String[] {"dogs", "owners.and.handlers"}, 
identifier.namespace().levels());
+      assertEquals("food.with.dots", identifier.name());
+    }
+
+    @Test
+    public void parseTableIdentifierParsesLegacyDottedString() {
+      TableIdentifier identifier = 
parseTableIdentifier("dogs.owners.and.handlers.food");
+
+      assertArrayEquals(
+          new String[] {"dogs", "owners", "and", "handlers"}, 
identifier.namespace().levels());
+      assertEquals("food", identifier.name());
+    }
+
+    @Test
+    public void tableIdentifierToStringRoundTripsSpecialCharacters() {
+      TableIdentifier expected =
+          TableIdentifier.of("dogs", "owners.and.handlers", "food.with.dots");
+
+      assertEquals(expected, 
parseTableIdentifier(tableIdentifierToString(expected)));
+    }
+
+    @Test
+    public void tableIdentifierToStringUsesLegacyFormWhenUnambiguous() {
+      assertEquals("dogs.food", 
tableIdentifierToString(TableIdentifier.of("dogs", "food")));
+    }
+
+    @Test
+    public void 
tableIdentifierToStringUsesJsonForLegacyStringsThatLookLikeJson() {
+      TableIdentifier expected = TableIdentifier.of("{dogs}", "food");
+
+      assertEquals(expected, 
parseTableIdentifier(tableIdentifierToString(expected)));
+    }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   With the improved `startsWithJsonObject` check, a table identifier like 
`TableIdentifier.of("{dogs}", "food")` will no longer be serialized as JSON 
because its legacy string representation `"{dogs}.food"` does not end with `}`. 
To keep the test's intent of verifying JSON serialization for strings that look 
like JSON, we should update the expected identifier to 
`TableIdentifier.of("{dogs}", "{food}")` (which starts with `{` and ends with 
`}`). We can also add a test to verify that partial JSON-like strings correctly 
fall back to the legacy dotted format.
   
   ```java
       @Test
       public void 
tableIdentifierToStringUsesJsonForLegacyStringsThatLookLikeJson() {
         TableIdentifier expected = TableIdentifier.of("{dogs}", "{food}");
   
         assertEquals(expected, 
parseTableIdentifier(tableIdentifierToString(expected)));
       }
   
       @Test
       public void 
tableIdentifierToStringDoesNotUseJsonForPartialJsonLikeStrings() {
         TableIdentifier expected = TableIdentifier.of("{dogs}", "food");
   
         assertEquals("{dogs}.food", tableIdentifierToString(expected));
         assertEquals(expected, 
parseTableIdentifier(tableIdentifierToString(expected)));
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to