RussellSpitzer commented on code in PR #6461:
URL: https://github.com/apache/iceberg/pull/6461#discussion_r1053411922
##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -298,4 +300,33 @@ public void testRangeDistributionWithQuotedColumnNames() throws NoSuchTableExcep
ImmutableList.of(row(7L)),
sql("SELECT count(*) FROM %s", tableName));
}
+
+ @Test
Review Comment:
I think the bigger problem here is that we have no guarantee that this
file was actually sorted according to the spec. We'll need to check the
distribution configuration and make sure the Spark writer is actually writing
sorted data.
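
To illustrate the kind of check this would need, here is a minimal sketch in plain Java, not tied to any Iceberg or Spark API. The class `SortedCheck` and the method `isSorted` are hypothetical names introduced only for this example; a real test would read the rows back from the written data file and build the comparator from the table's sort order.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Iceberg: checks that rows are already in the expected order.
class SortedCheck {

  // Returns true when every adjacent pair of rows is in non-decreasing order under the comparator.
  static <T> boolean isSorted(List<T> rows, Comparator<? super T> order) {
    for (int i = 1; i < rows.size(); i++) {
      if (order.compare(rows.get(i - 1), rows.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Stand-in values; in a real test these would be the sort-key values read from one data file.
    System.out.println(isSorted(List.of(1L, 2L, 2L, 7L), Comparator.naturalOrder())); // prints "true"
    System.out.println(isSorted(List.of(3L, 1L, 7L), Comparator.naturalOrder())); // prints "false"
  }
}
```

A test could assert something like this per data file, so that the recorded sort order is backed by the file contents and not only by the table metadata.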
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -654,6 +654,7 @@ public DataWriter<InternalRow> createWriter(int partitionId, long taskId, long e
.dataFileFormat(format)
.dataSchema(writeSchema)
.dataSparkType(dsSchema)
+ .dataSortOrder(table.sortOrder())
Review Comment:
There is no guarantee that this file is actually sorted when it is written by
the Spark writer. I think we'll need to do a bit of extra work before that is
actually the case.
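
One possible shape for that extra work, as a rough sketch only: record the table's sort order ID only when the write is known to apply the table ordering, and fall back to the unsorted order otherwise. The class `RecordedSortOrder`, the method `orderIdToRecord`, and the flag `writerAppliesTableOrdering` are hypothetical and exist only for this example; how the writer would actually learn that flag is exactly the open question. The only fact relied on is that the Iceberg spec reserves sort order ID 0 for the unsorted order.

```java
// Hypothetical helper, not part of Iceberg or this PR.
class RecordedSortOrder {

  // Sort order ID 0 is reserved for the unsorted order in the Iceberg spec.
  static final int UNSORTED_ORDER_ID = 0;

  // Picks the sort order ID to attach to a data file.
  // writerAppliesTableOrdering is an assumed input: true only when the write
  // is known to produce rows in the table's sort order.
  static int orderIdToRecord(int tableSortOrderId, boolean writerAppliesTableOrdering) {
    return writerAppliesTableOrdering ? tableSortOrderId : UNSORTED_ORDER_ID;
  }

  public static void main(String[] args) {
    System.out.println(orderIdToRecord(1, true)); // prints "1"
    System.out.println(orderIdToRecord(1, false)); // prints "0"
  }
}
```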
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]