[GitHub] [drill] vvysotskyi commented on a change in pull request #2238: DRILL-7934: Fix NullPointerException error when reading parquet files

GitBox Sun, 30 May 2021 08:46:27 -0700


vvysotskyi commented on a change in pull request #2238:
URL: https://github.com/apache/drill/pull/2238#discussion_r642094351




##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScanStatistics.java
##########
@@ -115,7 +118,7 @@ public void collect(Collection<T> metadataList) {
           previousCount.setValue(Statistic.NO_COLUMN_STATS);
         }
         ColumnMetadata columnMetadata = SchemaPathUtils.getColumnMetadata(schemaPath, metadata.getSchema());
-        TypeProtos.MajorType majorType = columnMetadata != null ? columnMetadata.majorType() : null;
+        TypeProtos.MajorType majorType = columnMetadata != null ? columnMetadata.majorType() : NULL;

Review comment:
       This is used to collect metadata for partition columns only. Even if a
specific field is not identified as a partition column, it may still have its
own metadata and be used in row group pruning later. But if you set the type
to NULL, it may cause issues when the field is used later with specific types
such as `FIXED_LEN_BYTE_ARRAY` or `BINARY`. So it is better not to mark such a
field as a partition column at all, which avoids possible issues with
determining the comparator (see the usages of the
`ParquetGroupScanStatistics.getTypeForColumn()` method).
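
To illustrate the suggestion, here is a minimal standalone sketch of the
"skip instead of substituting NULL" approach. It does not use Drill's classes:
`PartitionTypeSketch`, its `MinorType` enum, and `collectPartitionTypes` are
hypothetical names made up for this example, and the real change in
`collect()` may need to look different.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill code: columns without metadata are skipped
// rather than being registered with a placeholder NULL type.
public class PartitionTypeSketch {

    // Stand-in for a column type; an assumption for this example only.
    public enum MinorType { INT, VARBINARY }

    // Returns a type only for columns whose metadata is present in the schema.
    public static Map<String, MinorType> collectPartitionTypes(
            List<String> columns, Map<String, MinorType> schema) {
        Map<String, MinorType> partitionTypes = new LinkedHashMap<>();
        for (String column : columns) {
            MinorType type = schema.get(column);
            if (type == null) {
                // No metadata: do not mark this column as a partition column.
                continue;
            }
            partitionTypes.put(column, type);
        }
        return partitionTypes;
    }
}
```

With this shape, a later type lookup for a skipped column finds no entry at
all, so no comparator is chosen based on a placeholder type.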




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
