ahmedabu98 commented on code in PR #38580:
URL: https://github.com/apache/beam/pull/38580#discussion_r3282828701


##########
sdks/java/io/iceberg/src/test/java/org/apache/beam/sdk/io/iceberg/SerializableDataFileTest.java:
##########
@@ -73,4 +85,58 @@ public void testFieldsInEqualsMethodInSyncWithGetterFields() {
               + "to this test class's FIELDS_SET.");
     }
   }
+
+  /**
+   * Bounds with {@code capacity > limit} must be copied by {@code [position, limit)}, not by {@link
+   * ByteBuffer#array()}. Otherwise trailing 0x00 bytes leak into the manifest bounds and break
+   * equality predicate pushdown in some query engines.
+   */
+  @Test
+  public void testBoundByteBufferIsCopiedByLimitNotBackingArrayLength() {
+    // Reproduce the shape iceberg-parquet produces in the wild: a ByteBuffer
+    // whose backing array is larger than [position, limit), with trailing
+    // 0x00 bytes. iceberg-parquet hits this because the JDK UTF-8 encoder
+    // over-allocates; here we build it explicitly so the test doesn't depend
+    // on encoder internals.
+    int columnId = 3;
+    byte[] expectedLower = "lower_bound_str".getBytes(StandardCharsets.UTF_8);
+    byte[] expectedUpper = "upper_bound_str".getBytes(StandardCharsets.UTF_8);
+
+    ByteBuffer lower = ByteBuffer.allocate(expectedLower.length + 1);
+    lower.put(expectedLower);
+    lower.flip();
+    ByteBuffer upper = ByteBuffer.allocate(expectedUpper.length + 1);
+    upper.put(expectedUpper);
+    upper.flip();

Review Comment:
   Should we test this against Conversions.toByteBuffer() instead?
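
For context, the difference the test guards against can be reproduced with the JDK alone. The following is a minimal sketch, not the code under review: `BoundCopyDemo`, `copyByLimit`, and `copyByBackingArray` are hypothetical names chosen for illustration, and the buffer is built the same way as in the hunk above. It does not call `Conversions.toByteBuffer()`, so it makes no claim about the capacity of the buffers that method returns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BoundCopyDemo {
  // Copies only the readable window [position, limit).
  // duplicate() gives an independent position, so the caller's buffer is not advanced.
  static byte[] copyByLimit(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    buf.duplicate().get(out);
    return out;
  }

  // Copies the entire backing array, including any bytes past the limit.
  // Only valid for heap buffers (buf.hasArray() must be true).
  static byte[] copyByBackingArray(ByteBuffer buf) {
    return buf.array().clone();
  }

  public static void main(String[] args) {
    byte[] expected = "lower_bound_str".getBytes(StandardCharsets.UTF_8);

    // Same shape as in the test: capacity is one byte larger than the payload.
    ByteBuffer lower = ByteBuffer.allocate(expected.length + 1);
    lower.put(expected);
    lower.flip();

    byte[] byLimit = copyByLimit(lower);
    byte[] byArray = copyByBackingArray(lower);

    // The window copy matches the payload exactly.
    System.out.println(Arrays.equals(expected, byLimit));
    // The backing-array copy carries one extra trailing 0x00 byte.
    System.out.println(byArray.length - expected.length);
    System.out.println(byArray[byArray.length - 1]);
  }
}
```

Building the buffer by hand keeps the over-allocation deterministic. A test driven through an encoder or a conversion helper would only exercise this path if that helper happens to return a buffer whose capacity exceeds its limit.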



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
