rdblue commented on code in PR #5665:
URL: https://github.com/apache/iceberg/pull/5665#discussion_r961164228
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkDataFile.java:
##########
@@ -84,10 +93,31 @@ public SparkDataFile(Types.StructType type, StructType sparkType) {
sortOrderIdPosition = positions.get("sort_order_id");
}
+  private void wrapPartitionSpec(GenericRowWithSchema specRow) {
+    // We get all the partition fields, but want to project to the current one
+    StructType wrappedPartitionStruct = specRow.schema();
+
+    if (!wrappedPartitionStruct.equals(currentWrappedPartitionStruct)) {
+      this.currentWrappedPartitionStruct = wrappedPartitionStruct;
+
+      // The original IDs are lost in translation, therefore we apply the ones that we know
Review Comment:
@Fokko, I think what gets passed depends on what's already broadcasted and
available (which I suspect is why Anton suggested `Broadcast<Table>`). The most
direct route would be to send the combined struct type and the spec (since I
think that we always write new manifests with just one spec).
I would probably do this in `toManifest` because I think we want to reuse the
projection across rows. Doing it inside `SparkDataFile` would create a new
projection for each row, which is a pattern we've had to fix in other places. We
don't want to do the struct type comparison for each row. A rough sketch of what
I mean is below.
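
To illustrate the idea (this is not the PR's code): build the projection from the
combined partition struct down to the target spec's partition type once, before the
row loop, and reuse it for every row. The method signature, the parameter names, and
the use of `StructProjection` here are assumptions for the sketch; the actual PR may
project differently.

```java
import java.util.Iterator;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.SparkDataFile;
import org.apache.iceberg.types.Types;
import org.apache.iceberg.util.StructProjection;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructType;

class ToManifestSketch {

  static void toManifest(
      Iterator<Row> rows,
      Types.StructType combinedPartitionType, // combined partition struct across all specs
      StructType sparkFileType,               // Spark schema of the serialized data files
      PartitionSpec spec) {                   // the single spec the new manifest is written with

    // Created once, outside the row loop, and reused for every row.
    SparkDataFile wrapper = new SparkDataFile(combinedPartitionType, sparkFileType);
    StructProjection partitionProjection =
        StructProjection.create(combinedPartitionType, spec.partitionType());

    while (rows.hasNext()) {
      SparkDataFile file = wrapper.wrap(rows.next());
      // Project the combined partition tuple down to this spec's fields without
      // re-deriving field positions or comparing struct types per row.
      StructProjection projectedPartition = partitionProjection.wrap(file.partition());
      // ... append a copy of `file` with `projectedPartition` to the manifest writer ...
    }
  }
}
```

The point is just that the per-row work is limited to wrapping, while the struct
type comparison and projection setup happen once per manifest.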