rohangarg commented on code in PR #13294:
URL: https://github.com/apache/druid/pull/13294#discussion_r1011986493


##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -326,8 +362,8 @@ private static Object convertPrimitiveField(Group g, int 
fieldIndex, int index,
           // todo: idk wtd about unsigned
           case UINT_8:
           case UINT_16:
-          case UINT_32:
             return g.getInteger(fieldIndex, index);
+          case UINT_32:

Review Comment:
   good catch! :)



##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -173,12 +183,38 @@ optional group my_list (LIST) {
         vals.add(convertPrimitiveField(g, 0, i, binaryAsString));
       } else {
         Group listItem = g.getGroup(0, i);
-        vals.add(listItem);
+        vals.add(convertListElement(listItem, binaryAsString));
       }
     }
     return vals;
   }
 
+  private static Object convertListElement(Group listItem, boolean 
binaryAsString)
+  {
+    if (
+        listItem.getType().isRepetition(Type.Repetition.REPEATED) &&
+        !listItem.getType().isPrimitive() &&
+        listItem.getType().asGroupType().getFieldCount() == 1 &&
+        listItem.getType().getFields().get(0).isPrimitive()
+    ) {
+      // nullable primitive list elements can have a repeating wrapper 
element, peel it off
+      return convertPrimitiveField(listItem, 0, binaryAsString);
+    } else if (
+        listItem.getType().isRepetition(Type.Repetition.REPEATED) &&
+        listItem.getFieldRepetitionCount(0) == 1 &&
+        listItem.getType().getFieldCount() == 1 &&
+        listItem.getType().getName().equalsIgnoreCase("list") &&
+        listItem.getType().getFieldName(0).equalsIgnoreCase("element") &&
+        listItem.getGroup(0, 
0).getType().isRepetition(Type.Repetition.OPTIONAL)

Review Comment:
   some doubts : 
   1. should `listItem.getType().getFieldCount() == 1` be checked just after 
`listItem.getType().isRepetition(Type.Repetition.REPEATED)`?
   2. could we use 
`listItem.getType().getLogicalTypeAnnotation().equals(LogicalTypeAnnotation.listType())`
 for list check and assume the `fieldName` for `0` index to be `element` 
'type'? was checking 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules
 and it seemed like the name of the field in the `list` block could be anything 
and not always 'element'. totally fine if you have thought about this and want 
to tackle a particular case
   3. should we do `convertListElement(listItem.getGroup(0, 0)` in the return? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to