[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6461: Spark-3.3: Store sort-order-id in manifest_entry's data_file

GitBox Tue, 20 Dec 2022 06:59:19 -0800


RussellSpitzer commented on code in PR #6461:
URL: https://github.com/apache/iceberg/pull/6461#discussion_r1053411922



##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -298,4 +300,33 @@ public void testRangeDistributionWithQuotedColumnNames() throws NoSuchTableExcep
         ImmutableList.of(row(7L)),
         sql("SELECT count(*) FROM %s", tableName));
   }
+
+  @Test

Review Comment:
   I think the bigger problem here is that we have no guarantee that this 
file was actually sorted according to the spec. We'll need to check the 
distribution configuration and make sure the Spark writer is actually writing 
sorted data.



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -654,6 +654,7 @@ public DataWriter<InternalRow> createWriter(int partitionId, long taskId, long e
               .dataFileFormat(format)
               .dataSchema(writeSchema)
               .dataSparkType(dsSchema)
+              .dataSortOrder(table.sortOrder())

Review Comment:
   There is no guarantee that this file is actually sorted when it is written 
by the Spark writer. I think we'll need to do a bit of extra work before we can 
record the table's sort order on the file.
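
To illustrate the concern, here is a minimal standalone sketch of the kind of guard that could apply before a sort order is recorded on a file. It is not Iceberg or Spark code: `SortOrderGuard`, `isSorted`, and `sortOrderIdFor` are hypothetical names, and rows are modeled as a plain in-memory list. The only fact borrowed from Iceberg is that order id 0 denotes the unsorted order.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Iceberg or Spark. */
public class SortOrderGuard {
  // Iceberg reserves sort order id 0 for the unsorted order.
  public static final int UNSORTED_ORDER_ID = 0;

  /** Returns true if no row is ordered before its predecessor. */
  public static <T> boolean isSorted(List<T> rows, Comparator<? super T> order) {
    for (int i = 1; i < rows.size(); i++) {
      if (order.compare(rows.get(i - 1), rows.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  /**
   * Picks the order id to record for a file: the table's order id only when
   * the rows were verified to be sorted, otherwise the unsorted id.
   */
  public static <T> int sortOrderIdFor(
      List<T> rows, Comparator<? super T> order, int tableOrderId) {
    return isSorted(rows, order) ? tableOrderId : UNSORTED_ORDER_ID;
  }

  public static void main(String[] args) {
    Comparator<Long> byValue = Comparator.naturalOrder();
    // Rows arrive in order, so the table's order id (assumed to be 1) is kept.
    System.out.println(sortOrderIdFor(List.of(1L, 2L, 7L), byValue, 1));
    // Rows arrive out of order, so the file is marked unsorted.
    System.out.println(sortOrderIdFor(List.of(7L, 1L, 2L), byValue, 1));
  }
}
```

A real writer would not buffer and scan rows like this. The more likely fix is to set the sort order only when the write's required ordering matches the table's sort order, which is the extra work referred to above.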



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
