Copilot commented on code in PR #18140: URL: https://github.com/apache/pinot/pull/18140#discussion_r3486242836
########## README.md: ########## @@ -193,5 +193,55 @@ Check out [Pinot documentation](https://docs.pinot.apache.org/) for a complete d - [Pinot Architecture](https://docs.pinot.apache.org/basics/architecture) - [Pinot Query Language](https://docs.pinot.apache.org/users/user-guide-query/pinot-query-language) +### UUID Logical Type + +Pinot supports a logical `UUID` type for both single- and multi-value columns. In v1, Pinot stores `UUID` values +using the existing 16-byte `BYTES` representation, while schema definitions and query results use canonical +lowercase RFC 4122 strings. Review Comment: The README states UUID is supported for both single- and multi-value columns, but the PR description’s v1 contract/scope exclusions say multi-value UUID is not supported. Please align the user-facing docs with the intended v1 behavior (either update the README to say single-value only, or update the PR description and ensure MV UUID is fully supported end-to-end). ########## pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/HashUtils.java: ########## @@ -47,6 +48,15 @@ public static byte[] hashMD5(byte[] bytes) { /** * Returns a byte array that is a concatenation of the binary representation of each of the passed UUID values. * If any of the values is not a valid UUID, then we return the result of {@link PrimaryKey#asBytes()}. + * + * <p>String inputs are parsed via {@link UUID#fromString(String)} (lenient — accepts non-canonical short + * forms like {@code "1-2-3-4-5"}) for backward compatibility with tables that have been hashed under the + * pre-UUID-type {@code HashFunction.UUID} contract. Binary inputs ({@code byte[]} or {@link UUID}) go + * through {@link UuidUtils#toBytes(Object)} since they carry no parse ambiguity. Un-parseable strings + * (and any other invalid value) still fall through to {@link PrimaryKey#asBytes()} via the outer Review Comment: The Javadoc claims UUID.fromString() is “lenient” and accepts non-canonical short forms like "1-2-3-4-5", but UUID.fromString() is strict about the canonical 8-4-4-4-12 hex form and will throw for non-canonical inputs. This comment should be corrected so readers don’t assume broader accepted input shapes than the code actually supports. ########## pinot-spi/src/test/java/org/apache/pinot/spi/data/SchemaTest.java: ########## @@ -100,6 +106,14 @@ public void testValidation() { schemaToValidate.validate(); } + @Test + public void testUUIDValidationAllowsMV() { + Schema schema = new Schema(); + schema.addField(new DimensionFieldSpec("uuidMv", FieldSpec.DataType.UUID, false)); + + schema.validate(); + } Review Comment: This test asserts that multi-value UUID columns pass Schema.validate(), but the PR description’s v1 contract says multi-value UUID is not supported. If MV UUID is truly out of scope for v1, validation should fail (likely via SchemaUtils.validateMultiValueCompatibility / SchemaUtils.validate) and this test should be updated accordingly; otherwise, the PR description (and README wording) should be updated to reflect MV support. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
