rdblue commented on a change in pull request #2877:
URL: https://github.com/apache/iceberg/pull/2877#discussion_r682918017



##########
File path: api/src/main/java/org/apache/iceberg/util/StructProjection.java
##########
@@ -82,8 +85,31 @@ private StructProjection(StructType structType, StructType 
projection) {
                   dataField.type().asStructType(), 
projectedField.type().asStructType());
               break;
             case MAP:
+              MapType projectedMap = projectedField.type().asMapType();
+              MapType originalMap = dataField.type().asMapType();
+
+              boolean keyProjectable = !projectedMap.keyType().isStructType() 
||
+                  projectedMap.keyType().equals(originalMap.keyType());
+              boolean valueProjectable = 
!projectedMap.valueType().isStructType() ||

Review comment:
       It looks like this will support things like `map<string, map<string, 
int>>`. I don't think that will be a problem.

##########
File path: api/src/main/java/org/apache/iceberg/util/StructProjection.java
##########
@@ -82,8 +85,31 @@ private StructProjection(StructType structType, StructType 
projection) {
                   dataField.type().asStructType(), 
projectedField.type().asStructType());
               break;
             case MAP:
+              MapType projectedMap = projectedField.type().asMapType();
+              MapType originalMap = dataField.type().asMapType();
+
+              boolean keyProjectable = !projectedMap.keyType().isStructType() 
||
+                  projectedMap.keyType().equals(originalMap.keyType());
+              boolean valueProjectable = 
!projectedMap.valueType().isStructType() ||
+                  projectedMap.valueType().equals(originalMap.valueType());
+
+              Preconditions.checkArgument(keyProjectable && valueProjectable,
+                  "Cannot perform a projection of a map unless key and value 
types are primitive or a " +
+                      "struct which is fully projected. Trying to project %s 
out of %s", projectedField, dataField);
+              nestedProjections[pos] = null;
+              break;
             case LIST:
-              throw new IllegalArgumentException(String.format("Cannot project 
list or map field: %s", projectedField));
+              ListType projectedList = projectedField.type().asListType();
+              ListType originalList = dataField.type().asListType();
+
+              boolean elementProjectable = 
!projectedList.elementType().isStructType() ||
+                  
projectedList.elementType().equals(originalList.elementType());
+
+              Preconditions.checkArgument(elementProjectable,
+                  "Cannot perform a projection of a list unless it's element 
is a primitive or a struct which is " +

Review comment:
       What about something shorter, like "Cannot project a partial list 
element struct: %s from %s"?

##########
File path: 
spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java
##########
@@ -724,6 +875,64 @@ public void testManifestsTable() {
     TestHelpers.assertEqualsSafe(manifestTable.schema().asStruct(), 
expected.get(0), actual.get(0));
   }
 
+  @Test
+  public void testPruneManifestsTable() {
+    TableIdentifier tableIdentifier = TableIdentifier.of("db", 
"manifests_test");
+    Table table = createTable(tableIdentifier, SCHEMA, 
PartitionSpec.builderFor(SCHEMA).identity("id").build());
+    Table manifestTable = loadTable(tableIdentifier, "manifests");
+    Dataset<Row> df1 = spark.createDataFrame(
+        Lists.newArrayList(new SimpleRecord(1, "a"), new SimpleRecord(null, 
"b")), SimpleRecord.class);
+
+    df1.select("id", "data").write()
+        .format("iceberg")
+        .mode("append")
+        .save(loadLocation(tableIdentifier));
+
+    AssertHelpers.assertThrows("Can't prune struct inside list", 
SparkException.class,
+        "Cannot perform a projection of a list",
+        () -> spark.read()
+            .format("iceberg")
+            .load(loadLocation(tableIdentifier, "manifests"))
+            .select("partition_spec_id", "path", 
"partition_summaries.contains_null")
+            .collectAsList());
+
+    Dataset<Row> actualDf = spark.read()
+        .format("iceberg")
+        .load(loadLocation(tableIdentifier, "manifests"))
+        .select("partition_spec_id", "path", "partition_summaries");
+
+    Schema projectedSchema = SparkSchemaUtil.convert(actualDf.schema());
+
+    List<Row> actual = spark.read()
+        .format("iceberg")
+        .load(loadLocation(tableIdentifier, "manifests"))
+        .select("partition_spec_id", "path", "partition_summaries")
+        .collectAsList();
+
+    table.refresh();
+
+    GenericRecordBuilder builder = new 
GenericRecordBuilder(AvroSchemaUtil.convert(projectedSchema.asStruct()));
+    GenericRecordBuilder summaryBuilder = new 
GenericRecordBuilder(AvroSchemaUtil.convert(
+        
projectedSchema.findType("partition_summaries.element").asStructType(), 
"partition_summary"));
+    List<GenericData.Record> expected = 
Lists.transform(table.currentSnapshot().allManifests(), manifest ->
+        builder.set("partition_spec_id", manifest.partitionSpecId())
+            .set("path", manifest.path())
+            .set("partition_summaries", Lists.transform(manifest.partitions(), 
partition ->
+                summaryBuilder
+                    .set("contains_null", true)
+                    .set("contains_nan", false)
+                    .set("lower_bound", "1")
+                    .set("upper_bound", "1")
+                    .build()
+            ))
+            .build()
+    );
+
+    Assert.assertEquals("Manifests table should have one manifest row", 1, 
actual.size());
+    TestHelpers.assertEqualsSafe(projectedSchema.asStruct(), expected.get(0), 
actual.get(0));
+  }
+
+

Review comment:
       Nit: unnecessary newline.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to