KazydubB commented on a change in pull request #1861: DRILL-7380: Query of a field inside of an array of structs returns null
URL: https://github.com/apache/drill/pull/1861#discussion_r328618759
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java
 ##########
 @@ -111,20 +110,37 @@ public DrillParquetReader(FragmentContext fragmentContext,
     this.numRecordsToRead = initNumRecordsToRead(recordsToRead, entry.getRowGroupIndex(), footer);
   }
 
+  /**
+   * Creates projection MessageType from projection columns and given schema.
+   *
+   * @param schema Parquet file schema
+   * @param projectionColumns columns to search
+   * @param columnsNotFound any projection column which wasn't found in schema is added to the list
+   * @return projection containing matched columns or null if none column matches schema
+   */
   private static MessageType getProjection(MessageType schema,
-                                           Collection<SchemaPath> columns,
+                                           Collection<SchemaPath> projectionColumns,
                                            List<SchemaPath> columnsNotFound) {
-    MessageType projection = null;
-
-    String messageName = schema.getName();
-    List<ColumnDescriptor> schemaColumns = schema.getColumns();
-    // parquet type.union() seems to lose ConvertedType info when merging two columns that are the same type. This can
-    // happen when selecting two elements from an array. So to work around this, we use set of SchemaPath to avoid duplicates
-    // and then merge the types at the end
-    Set<SchemaPath> selectedSchemaPaths = new LinkedHashSet<>();
+    projectionColumns = adaptColumnsToParquetSchema(projectionColumns, schema);
+    List<SchemaPath> schemaColumns = getAllColumnsFrom(schema);
+    Set<SchemaPath> selectedSchemaPaths = matchProjectionWithSchemaColumns(projectionColumns, schemaColumns, columnsNotFound);
+    MessageType projection = convertSelectedColumnsToMessageType(schema, selectedSchemaPaths);
+    return projection;
+  }
 
-    // get a list of modified columns which have the array elements removed from the schema path since parquet schema doesn't include array elements
-    // or if field is (Parquet's) MAP then array/name segments are removed from the schema as well as obtaining elements by key is handled in EvaluationVisitor.
+  /**
+   * This method adjusts collection of SchemaPath projection columns to better match columns in given
+   * schema. It does few things to reach the goal:
+   *    - skips ArraySegments if present;
 
 Review comment:
   nit: list these cases in the Javadoc with an HTML `<ul>`?
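
   For illustration, a minimal sketch of what a `<ul>`-based Javadoc could look like. Only the first list item comes from the diff above; the class name, the method `skipArraySegments`, the String-based paths, and the placeholder second item are hypothetical and are not Drill's actual code or its `SchemaPath` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in class; not part of Drill.
class JavadocListExample {

  /**
   * Adjusts projection column paths to better match the columns of a schema.
   * It does the following:
   * <ul>
   *   <li>skips array index segments if present;</li>
   *   <li>(each further case would get its own item).</li>
   * </ul>
   *
   * @param paths dotted column paths, e.g. {@code "a[0].b"}
   * @return the paths with array index segments removed
   */
  static List<String> skipArraySegments(List<String> paths) {
    List<String> result = new ArrayList<>();
    for (String path : paths) {
      // Drop every "[<digits>]" index so only name segments remain.
      result.add(path.replaceAll("\\[\\d+\\]", ""));
    }
    return result;
  }
}
```

   With `<ul>`/`<li>` the cases render as a bulleted list in generated Javadoc, whereas plain `-` lines are collapsed into a single paragraph.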
