rdsr commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r460360925



##########
File path: orc/src/main/java/org/apache/iceberg/orc/ApplyNameMapping.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class ApplyNameMapping extends OrcSchemaVisitor<TypeDescription> {
+  private final NameMapping nameMapping;
+
+  ApplyNameMapping(NameMapping nameMapping) {
+    this.nameMapping = nameMapping;
+  }
+
+  @Override
+  public String elementName() {
+    return "element";
+  }
+
+  @Override
+  public String keyName() {
+    return "key";
+  }
+
+  @Override
+  public String valueName() {
+    return "value";
+  }
+
+  TypeDescription setId(TypeDescription type, MappedField mappedField) {
+    if (mappedField != null) {
+      type.setAttribute(ORCSchemaUtil.ICEBERG_ID_ATTRIBUTE, 
mappedField.id().toString());
+    }
+    return type;
+  }
+
+  @Override
+  public TypeDescription record(TypeDescription record, List<String> names, 
List<TypeDescription> fields) {
+    Preconditions.checkArgument(names.size() == fields.size(), "All fields 
must have names");
+    MappedField field = nameMapping.find(currentPath());
+    TypeDescription structType = TypeDescription.createStruct();
+
+    for (int i = 0; i < fields.size(); i++) {
+      String fieldName = names.get(i);
+      TypeDescription fieldType = fields.get(i);
+      if (fieldType != null) {
+        structType.addField(fieldName, fieldType);
+      }
+    }
+    return setId(structType, field);
+  }
+
+  @Override
+  public TypeDescription list(TypeDescription array, TypeDescription element) {
+    Preconditions.checkArgument(element != null, "List type must have element 
type");
+
+    MappedField field = nameMapping.find(currentPath());
+    TypeDescription listType = TypeDescription.createList(element);
+    return setId(listType, field);
+  }
+
+  @Override
+  public TypeDescription map(TypeDescription map, TypeDescription key, 
TypeDescription value) {
+    Preconditions.checkArgument(key != null && value != null, "Map type must 
have both key and value types");
+
+    MappedField field = nameMapping.find(currentPath());
+    TypeDescription mapType = TypeDescription.createMap(key, value);
+    return setId(mapType, field);
+  }
+
+  @Override
+  public TypeDescription primitive(TypeDescription primitive) {
+    MappedField field = nameMapping.find(currentPath());
+    return setId(primitive.clone(), field);

Review comment:
       Why do we need to call clone here?

##########
File path: 
orc/src/test/java/org/apache/iceberg/orc/TestExpressionToSearchArgument.java
##########
@@ -262,4 +264,65 @@ public void testEvolvedSchema() {
     actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
     Assert.assertEquals(expected.toString(), actual.toString());
   }
+
+  @Test
+  public void testOriginalSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(2, "long", Types.LongType.get())
+    );
+
+    TypeDescription orcSchemaWithoutIds = 
ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(originalSchema);
+
+    TypeDescription readSchema = 
ORCSchemaUtil.buildOrcProjection(originalSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = and(equal("int", 1), equal("long", 1));
+    Expression boundFilter = Binder.bind(originalSchema.asStruct(), expr, 
true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .equals("`int`", Type.LONG, 1L)
+        .equals("`long`", Type.LONG, 1L)
+        .build();
+
+    SearchArgument actual = ExpressionToSearchArgument.convert(boundFilter, 
readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+  }
+
+  @Test
+  public void testModifiedSchemaNameMapping() {

Review comment:
       can we have aa test case for where complex fields are also pruned? For 
instance if all fields of a struct are pruned, the struct is pruned. Similarly 
for maps and lists?
   
   

##########
File path: orc/src/main/java/org/apache/iceberg/orc/ApplyNameMapping.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class ApplyNameMapping extends OrcSchemaVisitor<TypeDescription> {
+  private final NameMapping nameMapping;
+
+  ApplyNameMapping(NameMapping nameMapping) {
+    this.nameMapping = nameMapping;
+  }
+
+  @Override
+  public String elementName() {
+    return "element";
+  }
+
+  @Override
+  public String keyName() {
+    return "key";
+  }
+
+  @Override
+  public String valueName() {
+    return "value";
+  }
+
+  TypeDescription setId(TypeDescription type, MappedField mappedField) {
+    if (mappedField != null) {

Review comment:
       shouldn't a 'type' be pruned if a mappedField is missing or null?

##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcSchemaVisitor.java
##########
@@ -83,9 +112,53 @@
     return visitor.record(record, names, visitFields(fields, names, visitor));
   }
 
-  public void beforeField(String name, TypeDescription type) {}
+  public String elementName() {
+    return "_elem";

Review comment:
       Why underscores?

##########
File path: orc/src/main/java/org/apache/iceberg/orc/RemoveIds.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class RemoveIds extends OrcSchemaVisitor<TypeDescription> {
+
+  @Override
+  public TypeDescription record(TypeDescription record, List<String> names, 
List<TypeDescription> fields) {
+    Preconditions.checkArgument(names.size() == fields.size(), "All fields 
must have names.");
+    TypeDescription struct = TypeDescription.createStruct();
+
+    for (int i = 0; i < fields.size(); i++) {
+      struct.addField(names.get(i), fields.get(i));
+    }
+    return struct;
+  }
+
+  @Override
+  public TypeDescription list(TypeDescription array, TypeDescription element) {
+    return TypeDescription.createList(element);
+  }
+
+  @Override
+  public TypeDescription map(TypeDescription map, TypeDescription key, 
TypeDescription value) {
+    return TypeDescription.createMap(key, value);
+  }
+
+  @Override
+  public TypeDescription primitive(TypeDescription primitive) {
+    return removeIcebergAttributes(primitive.clone());
+  }
+
+  private static TypeDescription removeIcebergAttributes(TypeDescription 
orcType) {
+    orcType.removeAttribute(ORCSchemaUtil.ICEBERG_ID_ATTRIBUTE);

Review comment:
       Why don't we only remove the ID attribute?

##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java
##########
@@ -73,7 +78,15 @@
   public CloseableIterator<T> iterator() {
     Reader orcFileReader = ORC.newFileReader(file, config);
     addCloseable(orcFileReader);
-    TypeDescription readOrcSchema = ORCSchemaUtil.buildOrcProjection(schema, 
orcFileReader.getSchema());
+
+    TypeDescription fileSchema = orcFileReader.getSchema();
+    final TypeDescription readOrcSchema;
+    if (ORCSchemaUtil.hasIds(fileSchema)) {
+      readOrcSchema = ORCSchemaUtil.buildOrcProjection(schema, fileSchema);
+    } else {
+      TypeDescription typeWithIds = ORCSchemaUtil.applyNameMapping(fileSchema, 
nameMapping);

Review comment:
       IIRC, nameMapping may not have all the fields which maybe present in the 
file schema. I think we need a way to prune columns which does not have an 
associated mapping defined. I can't say whether `buildOrcProjection` is 
handling that case or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to