Re: [PR] DRILL-8507, DRILL-8508 Better handling of partially missing parquet columns (drill)

via GitHub Mon, 02 Sep 2024 10:52:48 -0700


ychernysh commented on code in PR #2937:
URL: https://github.com/apache/drill/pull/2937#discussion_r1741155460



##########
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetTableMetadataUtils.java:
##########
@@ -661,6 +663,12 @@ static Map<SchemaPath, TypeProtos.MajorType> resolveFields(MetadataBase.ParquetT
       // row groups in the file have the same schema, so using the first one
       Map<SchemaPath, TypeProtos.MajorType> fileColumns = getFileFields(parquetTableMetadata, file);
       fileColumns.forEach((columnPath, type) -> putType(columns, columnPath, type));
+      // If at least 1 parquet file to read doesn't contain a column, enforce this column
+      // DataMode to OPTIONAL in the overall table schema

Review Comment:
   The first item is about resolving different data types even if there are no 
missing columns, which I didn't cover.
   `but only if the other types are REQUIRED` - is this condition necessary?
   Regarding REPEATED - I haven't covered it in any way. 
   
   In theory, implementing these should not be that hard.
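
   For reference, the rule described in the diff comment (a column missing
   from at least one file becomes OPTIONAL in the table schema) can be
   sketched roughly as below. This is only an illustration under assumptions:
   `MissingColumnModes`, its `DataMode` enum, and `resolveModes` are
   hypothetical stand-ins using plain strings and maps, not Drill's
   `SchemaPath`, `TypeProtos`, or `resolveFields`. It does not handle
   conflicting data types or treat REPEATED specially, which are the cases
   not covered above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; these are NOT Drill's TypeProtos or SchemaPath classes.
class MissingColumnModes {

  enum DataMode { REQUIRED, OPTIONAL, REPEATED }

  // Each map is one file's schema: column name -> data mode.
  static Map<String, DataMode> resolveModes(List<Map<String, DataMode>> fileSchemas) {
    Map<String, DataMode> table = new LinkedHashMap<>();

    // First pass: build the union of all columns.
    // Simplification (assumption): the first mode seen for a column wins.
    for (Map<String, DataMode> file : fileSchemas) {
      file.forEach(table::putIfAbsent);
    }

    // Second pass: a column absent from at least one file is forced to OPTIONAL.
    for (Map<String, DataMode> file : fileSchemas) {
      for (Map.Entry<String, DataMode> column : table.entrySet()) {
        if (!file.containsKey(column.getKey())) {
          column.setValue(DataMode.OPTIONAL);
        }
      }
    }
    return table;
  }

  public static void main(String[] args) {
    Map<String, DataMode> file1 = Map.of("id", DataMode.REQUIRED, "name", DataMode.REQUIRED);
    Map<String, DataMode> file2 = Map.of("id", DataMode.REQUIRED);
    // "name" is missing from file2, so it is reported as OPTIONAL.
    System.out.println(resolveModes(List.of(file1, file2)));
  }
}
```

   The two-pass shape is just one way to do it: the union has to be complete
   before a file can be checked for columns it lacks.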



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
