Copilot commented on code in PR #18140:
URL: https://github.com/apache/pinot/pull/18140#discussion_r3486242836


##########
README.md:
##########
@@ -193,5 +193,55 @@ Check out [Pinot 
documentation](https://docs.pinot.apache.org/) for a complete d
 - [Pinot Architecture](https://docs.pinot.apache.org/basics/architecture)
 - [Pinot Query 
Language](https://docs.pinot.apache.org/users/user-guide-query/pinot-query-language)
 
+### UUID Logical Type
+
+Pinot supports a logical `UUID` type for both single- and multi-value columns. 
In v1, Pinot stores `UUID` values
+using the existing 16-byte `BYTES` representation, while schema definitions 
and query results use canonical
+lowercase RFC 4122 strings.

Review Comment:
   The README states UUID is supported for both single- and multi-value 
columns, but the PR description’s v1 contract/scope exclusions say multi-value 
UUID is not supported. Please align the user-facing docs with the intended v1 
behavior (either update the README to say single-value only, or update the PR 
description and ensure MV UUID is fully supported end-to-end).



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/HashUtils.java:
##########
@@ -47,6 +48,15 @@ public static byte[] hashMD5(byte[] bytes) {
   /**
    * Returns a byte array that is a concatenation of the binary representation 
of each of the passed UUID values.
    * If any of the values is not a valid UUID, then we return the result of 
{@link PrimaryKey#asBytes()}.
+   *
+   * <p>String inputs are parsed via {@link UUID#fromString(String)} (lenient 
— accepts non-canonical short
+   * forms like {@code "1-2-3-4-5"}) for backward compatibility with tables 
that have been hashed under the
+   * pre-UUID-type {@code HashFunction.UUID} contract. Binary inputs ({@code 
byte[]} or {@link UUID}) go
+   * through {@link UuidUtils#toBytes(Object)} since they carry no parse 
ambiguity. Un-parseable strings
+   * (and any other invalid value) still fall through to {@link 
PrimaryKey#asBytes()} via the outer

Review Comment:
   The Javadoc claims UUID.fromString() is “lenient” and accepts non-canonical 
short forms like "1-2-3-4-5", but UUID.fromString() is strict about the 
canonical 8-4-4-4-12 hex form and will throw for non-canonical inputs. This 
comment should be corrected so readers don’t assume broader accepted input 
shapes than the code actually supports.



##########
pinot-spi/src/test/java/org/apache/pinot/spi/data/SchemaTest.java:
##########
@@ -100,6 +106,14 @@ public void testValidation() {
     schemaToValidate.validate();
   }
 
+  @Test
+  public void testUUIDValidationAllowsMV() {
+    Schema schema = new Schema();
+    schema.addField(new DimensionFieldSpec("uuidMv", FieldSpec.DataType.UUID, 
false));
+
+    schema.validate();
+  }

Review Comment:
   This test asserts that multi-value UUID columns pass Schema.validate(), but 
the PR description’s v1 contract says multi-value UUID is not supported. If MV 
UUID is truly out of scope for v1, validation should fail (likely via 
SchemaUtils.validateMultiValueCompatibility / SchemaUtils.validate) and this 
test should be updated accordingly; otherwise, the PR description (and README 
wording) should be updated to reflect MV support.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to