KazydubB commented on a change in pull request #1861: DRILL-7380: Query of a
field inside of an array of structs returns null
URL: https://github.com/apache/drill/pull/1861#discussion_r328618759
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java
##########
@@ -111,20 +110,37 @@ public DrillParquetReader(FragmentContext fragmentContext,
this.numRecordsToRead = initNumRecordsToRead(recordsToRead, entry.getRowGroupIndex(), footer);
}
+ /**
+ * Creates projection MessageType from projection columns and given schema.
+ *
+ * @param schema Parquet file schema
+ * @param projectionColumns columns to search
+ * @param columnsNotFound any projection column which wasn't found in schema is added to the list
+ * @return projection containing matched columns or null if none column matches schema
+ */
private static MessageType getProjection(MessageType schema,
- Collection<SchemaPath> columns,
+ Collection<SchemaPath> projectionColumns,
List<SchemaPath> columnsNotFound) {
- MessageType projection = null;
-
- String messageName = schema.getName();
- List<ColumnDescriptor> schemaColumns = schema.getColumns();
- // parquet type.union() seems to lose ConvertedType info when merging two columns that are the same type. This can
- // happen when selecting two elements from an array. So to work around this, we use set of SchemaPath to avoid duplicates
- // and then merge the types at the end
- Set<SchemaPath> selectedSchemaPaths = new LinkedHashSet<>();
+ projectionColumns = adaptColumnsToParquetSchema(projectionColumns, schema);
+ List<SchemaPath> schemaColumns = getAllColumnsFrom(schema);
+ Set<SchemaPath> selectedSchemaPaths = matchProjectionWithSchemaColumns(projectionColumns, schemaColumns, columnsNotFound);
+ MessageType projection = convertSelectedColumnsToMessageType(schema, selectedSchemaPaths);
+ return projection;
+ }
- // get a list of modified columns which have the array elements removed from the schema path since parquet schema doesn't include array elements
- // or if field is (Parquet's) MAP then array/name segments are removed from the schema as well as obtaining elements by key is handled in EvaluationVisitor.
+ /**
+ * This method adjusts collection of SchemaPath projection columns to better match columns in given
+ * schema. It does few things to reach the goal:
+ * - skips ArraySegments if present;
Review comment:
nit: enumerate the cases in an HTML `<ul>` list so they render as bullets in the generated Javadoc?
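
To illustrate the suggestion, here is a rough sketch of the comment written with `<ul>`/`<li>`. Only the first bullet comes from the hunk above; the remaining cases are cut off in this diff, so they are left as a placeholder. The class `JavadocListSketch`, the method `skipArraySegments`, and the `"[<digits>]"` string convention for array segments are hypothetical stand-ins for illustration, not Drill's actual `SchemaPath` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JavadocListSketch {

  /**
   * Adjusts projection column segments to better match columns in the given schema.
   * It does a few things to reach the goal:
   * <ul>
   *   <li>skips array segments if present;</li>
   *   <li>(remaining cases from the original comment, not visible in this hunk).</li>
   * </ul>
   *
   * @param segments path segments of one projection column (hypothetical string form)
   * @return a new list with array-index segments removed
   */
  static List<String> skipArraySegments(List<String> segments) {
    List<String> result = new ArrayList<>();
    for (String segment : segments) {
      // Assumption for this sketch only: an array segment is written as "[<digits>]".
      if (!segment.matches("\\[\\d+\\]")) {
        result.add(segment);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // prints [a, b]
    System.out.println(skipArraySegments(Arrays.asList("a", "[0]", "b")));
  }
}
```

Without the list tags, the `-` items collapse into a single paragraph in the rendered Javadoc, which is why the `<ul>` form reads better there.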