Copilot commented on code in PR #17504:
URL: https://github.com/apache/pinot/pull/17504#discussion_r2688514517


##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -40,21 +39,26 @@ protected Object transformValue(Object value, Schema.Field 
field) {
 
   Object handleDeprecatedTypes(Object value, Schema.Field field) {
     Schema.Type avroColumnType = field.schema().getType();
-    if (avroColumnType == org.apache.avro.Schema.Type.UNION) {
-      org.apache.avro.Schema nonNullSchema = null;
-      for (org.apache.avro.Schema childFieldSchema : 
field.schema().getTypes()) {
-        if (childFieldSchema.getType() != org.apache.avro.Schema.Type.NULL) {
+    if (avroColumnType == Schema.Type.UNION) {
+      Schema nonNullSchema = null;
+      for (Schema childFieldSchema : field.schema().getTypes()) {
+        if (childFieldSchema.getType() != Schema.Type.NULL) {
           if (nonNullSchema == null) {
             nonNullSchema = childFieldSchema;
           } else {
             throw new IllegalStateException("More than one non-null schema in 
UNION schema");
           }
         }
       }
+      assert nonNullSchema != null;
 
-      //INT96 is deprecated. We convert to long as we do in the native parquet 
extractor.
-      if 
(nonNullSchema.getName().equals(PrimitiveType.PrimitiveTypeName.INT96.name())) {
-       return ParquetNativeRecordExtractor.convertInt96ToLong((byte[]) value);
+      // NOTE:
+      // INT96 is deprecated. We convert to long as we do in the native 
parquet extractor.
+      // See org.apache.parquet.avro.AvroSchemaConverter about how INT96 is 
converted into Avro schema.
+      // We have to rely on the doc to determine whether a field is INT96.
+      if (nonNullSchema.getType() == Schema.Type.FIXED && 
nonNullSchema.getFixedSize() == 12
+          && "INT96 represented as byte[12]".equals(nonNullSchema.getDoc())) {

Review Comment:
   The magic number 12 and the hardcoded documentation string 'INT96 
represented as byte[12]' should be extracted as named constants (e.g., 
INT96_BYTE_SIZE and INT96_DOC_STRING) to improve maintainability and make the 
intent clearer.



##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -40,21 +39,26 @@ protected Object transformValue(Object value, Schema.Field 
field) {
 
   Object handleDeprecatedTypes(Object value, Schema.Field field) {
     Schema.Type avroColumnType = field.schema().getType();
-    if (avroColumnType == org.apache.avro.Schema.Type.UNION) {
-      org.apache.avro.Schema nonNullSchema = null;
-      for (org.apache.avro.Schema childFieldSchema : 
field.schema().getTypes()) {
-        if (childFieldSchema.getType() != org.apache.avro.Schema.Type.NULL) {
+    if (avroColumnType == Schema.Type.UNION) {
+      Schema nonNullSchema = null;
+      for (Schema childFieldSchema : field.schema().getTypes()) {
+        if (childFieldSchema.getType() != Schema.Type.NULL) {
           if (nonNullSchema == null) {
             nonNullSchema = childFieldSchema;
           } else {
             throw new IllegalStateException("More than one non-null schema in 
UNION schema");
           }
         }
       }
+      assert nonNullSchema != null;

Review Comment:
   The assertion will not execute in production builds where assertions are 
disabled. Consider replacing with an explicit null check and throwing an 
IllegalStateException with a descriptive message, such as 'All schemas in UNION 
are NULL types'.
   ```suggestion
         if (nonNullSchema == null) {
           throw new IllegalStateException("All schemas in UNION are NULL 
types");
         }
   ```



##########
pinot-plugins/pinot-input-format/pinot-parquet/src/main/java/org/apache/pinot/plugin/inputformat/parquet/ParquetAvroRecordExtractor.java:
##########
@@ -40,21 +39,26 @@ protected Object transformValue(Object value, Schema.Field 
field) {
 
   Object handleDeprecatedTypes(Object value, Schema.Field field) {
     Schema.Type avroColumnType = field.schema().getType();
-    if (avroColumnType == org.apache.avro.Schema.Type.UNION) {
-      org.apache.avro.Schema nonNullSchema = null;
-      for (org.apache.avro.Schema childFieldSchema : 
field.schema().getTypes()) {
-        if (childFieldSchema.getType() != org.apache.avro.Schema.Type.NULL) {
+    if (avroColumnType == Schema.Type.UNION) {
+      Schema nonNullSchema = null;
+      for (Schema childFieldSchema : field.schema().getTypes()) {
+        if (childFieldSchema.getType() != Schema.Type.NULL) {
           if (nonNullSchema == null) {
             nonNullSchema = childFieldSchema;
           } else {
             throw new IllegalStateException("More than one non-null schema in 
UNION schema");
           }
         }
       }
+      assert nonNullSchema != null;
 
-      //INT96 is deprecated. We convert to long as we do in the native parquet 
extractor.
-      if 
(nonNullSchema.getName().equals(PrimitiveType.PrimitiveTypeName.INT96.name())) {
-       return ParquetNativeRecordExtractor.convertInt96ToLong((byte[]) value);
+      // NOTE:
+      // INT96 is deprecated. We convert to long as we do in the native 
parquet extractor.
+      // See org.apache.parquet.avro.AvroSchemaConverter about how INT96 is 
converted into Avro schema.
+      // We have to rely on the doc to determine whether a field is INT96.
+      if (nonNullSchema.getType() == Schema.Type.FIXED && 
nonNullSchema.getFixedSize() == 12
+          && "INT96 represented as byte[12]".equals(nonNullSchema.getDoc())) {

Review Comment:
   The new INT96 detection logic should have test coverage to verify it 
correctly identifies INT96 fields using the fixed-size and documentation 
attributes, and that it properly handles edge cases like null documentation or 
incorrect fixed sizes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to