gianm commented on a change in pull request #6833: fix parquet parse
performance issue
URL: https://github.com/apache/incubator-druid/pull/6833#discussion_r246643395
##########
File path:
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java
##########
@@ -62,36 +62,39 @@ private static Object convertField(Group g, String
fieldName, boolean binaryAsSt
final int fieldIndex = g.getType().getFieldIndex(fieldName);
- Type fieldType = g.getType().getFields().get(fieldIndex);
-
- // primitive field
- if (fieldType.isPrimitive()) {
- // primitive list
- if (fieldType.getRepetition().equals(Type.Repetition.REPEATED)) {
- int repeated = g.getFieldRepetitionCount(fieldIndex);
- List<Object> vals = new ArrayList<>();
- for (int i = 0; i < repeated; i++) {
- vals.add(convertPrimitiveField(g, fieldIndex, i, binaryAsString));
+ if (g.getFieldRepetitionCount(fieldIndex) > 0) {
+ Type fieldType = g.getType().getFields().get(fieldIndex);
+
+ // primitive field
+ if (fieldType.isPrimitive()) {
+ // primitive list
+ if (fieldType.getRepetition().equals(Type.Repetition.REPEATED)) {
+ int repeated = g.getFieldRepetitionCount(fieldIndex);
+ List<Object> vals = new ArrayList<>();
+ for (int i = 0; i < repeated; i++) {
+ vals.add(convertPrimitiveField(g, fieldIndex, i, binaryAsString));
+ }
+ return vals;
+ }
+ return convertPrimitiveField(g, fieldIndex, binaryAsString);
+ } else {
+ if (fieldType.isRepetition(Type.Repetition.REPEATED)) {
+ return convertRepeatedFieldToList(g, fieldIndex, binaryAsString);
}
- return vals;
- }
- return convertPrimitiveField(g, fieldIndex, binaryAsString);
- } else {
- if (fieldType.isRepetition(Type.Repetition.REPEATED)) {
- return convertRepeatedFieldToList(g, fieldIndex, binaryAsString);
- }
- if (isLogicalMapType(fieldType)) {
- return convertLogicalMap(g.getGroup(fieldIndex, 0), binaryAsString);
- }
+ if (isLogicalMapType(fieldType)) {
+ return convertLogicalMap(g.getGroup(fieldIndex, 0), binaryAsString);
+ }
- if (isLogicalListType(fieldType)) {
- return convertLogicalList(g.getGroup(fieldIndex, 0), binaryAsString);
- }
+ if (isLogicalListType(fieldType)) {
+ return convertLogicalList(g.getGroup(fieldIndex, 0), binaryAsString);
+ }
- // not a list, but not a primtive, return the nested group type
- return g.getGroup(fieldIndex, 0);
+ // not a list, but not a primtive, return the nested group type
+ return g.getGroup(fieldIndex, 0);
+ }
}
+ return null;
Review comment:
nit, but, this would be clearer if it were in an `else` block with a comment
about why we're returning null here. Otherwise it's not clear that the
(relatively complex) block above is not meant to be able to fall through.
Alternatively, flip things such that this block is first:
```java
if (g.getFieldRepetitionCount(fieldIndex) == 0) {
return null;
}
```
It's simple enough that it's clear that it won't fall through. It's also
clear that it's a "skip if there is no data" kind of check.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]