szehon-ho commented on code in PR #7920:
URL: https://github.com/apache/iceberg/pull/7920#discussion_r1255305634
##########
spark/v3.1/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:
##########
@@ -2028,4 +2053,22 @@ public static Dataset<Row> selectNonDerived(Dataset<Row> metadataTable) {
public static Types.StructType nonDerivedSchema(Dataset<Row> metadataTable) {
return SparkSchemaUtil.convert(selectNonDerived(metadataTable).schema()).asStruct();
}
+
+ private long totalSizeInBytes(Iterable<DataFile> dataFiles) {
+ return Lists.newArrayList(dataFiles).stream().mapToLong(DataFile::fileSizeInBytes).sum();
+ }
+
+ private List<DataFile> dataFiles(Table table) {
+ CloseableIterable<FileScanTask> tasks = table.newScan().planFiles();
+ return Lists.newArrayList(CloseableIterable.transform(tasks, FileScanTask::file));
+ }
+
+ private void assertDataFilePartitions(List<DataFile> dataFiles, int[] expectedPartitionIds) {
Review Comment:
Nit: we can put back the size check.
##########
spark/v3.1/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java:
##########
@@ -1469,12 +1484,18 @@ public void testPartitionsTableLastUpdatedSnapshot() {
new GenericRecordBuilder(
AvroSchemaUtil.convert(
partitionsTable.schema().findType("partition").asStructType(),
"partition"));
+
+ List<DataFile> dataFiles = dataFiles(table);
+ Assert.assertEquals("Table should have 3 data files", 3, dataFiles.size());
+ assertDataFilePartitions(dataFiles, new int[] {1, 2, 2});
Review Comment:
Sorry, I'm trying to remember why I made the previous comment. I was probably
thinking of varargs, like
`assertDataFilePartitions(List<DataFile>, int... partitions)`, so the call
would just be `assertDataFilePartitions(dataFiles, 1, 2, 3)`, but on second look
it may not be so clean.
I think `Arrays.asList()` is equally long, so let's revert to that. It may also
make the length check easier inside the method?
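
To make the trade-off concrete, here is a minimal sketch of both shapes. It is not the PR's code: `FakeDataFile` is a hypothetical stand-in for `DataFile` that carries only a partition id, and `assertDataFilePartitionsVarargs` is an illustrative name. The point is only that the list-based helper can compare `size()` directly, while the varargs form has to box its arguments first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PartitionAssertSketch {

  // Hypothetical stand-in for DataFile; it carries only a partition id.
  static final class FakeDataFile {
    private final int partitionId;

    FakeDataFile(int partitionId) {
      this.partitionId = partitionId;
    }

    int partitionId() {
      return partitionId;
    }
  }

  // List-based variant: the size check is a direct size() comparison.
  static void assertDataFilePartitions(
      List<FakeDataFile> dataFiles, List<Integer> expectedPartitionIds) {
    if (dataFiles.size() != expectedPartitionIds.size()) {
      throw new AssertionError(
          "Expected " + expectedPartitionIds.size() + " data files but found " + dataFiles.size());
    }
    for (int i = 0; i < dataFiles.size(); i++) {
      int actual = dataFiles.get(i).partitionId();
      int expected = expectedPartitionIds.get(i);
      if (actual != expected) {
        throw new AssertionError(
            "File " + i + ": expected partition " + expected + " but found " + actual);
      }
    }
  }

  // Varargs variant (illustrative name): shorter call site, but the ints
  // must be boxed into a list before delegating to the helper above.
  static void assertDataFilePartitionsVarargs(List<FakeDataFile> dataFiles, int... expected) {
    List<Integer> boxed = Arrays.stream(expected).boxed().collect(Collectors.toList());
    assertDataFilePartitions(dataFiles, boxed);
  }

  public static void main(String[] args) {
    List<FakeDataFile> files =
        Arrays.asList(new FakeDataFile(1), new FakeDataFile(2), new FakeDataFile(2));

    // Both call sites are about the same length.
    assertDataFilePartitions(files, Arrays.asList(1, 2, 2));
    assertDataFilePartitionsVarargs(files, 1, 2, 2);
  }
}
```

With the size check inside the helper, the separate `Assert.assertEquals(..., dataFiles.size())` at the call site could arguably be dropped, though that is a judgment call for the PR author.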
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]