openinx commented on a change in pull request #1774:
URL: https://github.com/apache/iceberg/pull/1774#discussion_r530219917
##########
File path: spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java
##########
@@ -374,6 +375,58 @@ public void testPartitionedCreateWithTargetFileSizeViaOption() throws IOExceptio
}
}
+ @Test
+ public void testPartitionedFanoutCreateWithTargetFileSizeViaOption() throws IOException {
+ File parent = temp.newFolder(format.toString());
+ File location = new File(parent, "test");
+
+ HadoopTables tables = new HadoopTables(CONF);
+ PartitionSpec spec = PartitionSpec.builderFor(SCHEMA).identity("data").build();
+ Table table = tables.create(SCHEMA, spec, location.toString());
+ table.updateProperties()
+ .set(WRITE_PARTITIONED_FANOUT_ENABLED, "true")
+ .commit();
+
+ List<SimpleRecord> expected = Lists.newArrayListWithCapacity(8000);
+ for (int i = 0; i < 2000; i++) {
+ expected.add(new SimpleRecord(i, "a"));
+ expected.add(new SimpleRecord(i, "b"));
+ expected.add(new SimpleRecord(i, "c"));
+ expected.add(new SimpleRecord(i, "d"));
+ }
+
+ Dataset<Row> df = spark.createDataFrame(expected, SimpleRecord.class);
+
+ df.select("id", "data").sort("data").write()
Review comment:
For the partitioned fanout case, we shouldn't need to sort on the `data`
column, right? Otherwise, what's the difference compared to `PartitionedWriter`?
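
To make the question concrete, here is a minimal sketch of the difference I have in mind. It does not use the real Iceberg writers: `ClusteredSketchWriter` and `FanoutSketchWriter` are hypothetical stand-ins, and the sketch assumes that the clustered strategy rejects a partition it has already closed, while the fanout strategy keeps one open writer per partition.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical stand-ins, not Iceberg classes.
class FanoutSketch {

  // Assumed clustered behaviour: only one partition is open at a time,
  // so the input has to arrive grouped (for example, sorted) by partition.
  static class ClusteredSketchWriter {
    private final Set<String> completed = new HashSet<>();
    private String current = null;
    private int rows = 0;

    void write(String partition, int id) {
      if (!partition.equals(current)) {
        if (current != null) {
          completed.add(current); // "close" the file of the previous partition
        }
        if (completed.contains(partition)) {
          throw new IllegalStateException("Already closed files for partition: " + partition);
        }
        current = partition;
      }
      rows++;
    }

    int rows() {
      return rows;
    }
  }

  // Assumed fanout behaviour: one open writer per partition seen so far,
  // so rows may arrive in any order (at the cost of more open files).
  static class FanoutSketchWriter {
    private final Map<String, Integer> openWriters = new HashMap<>();

    void write(String partition, int id) {
      openWriters.merge(partition, 1, Integer::sum);
    }

    int openPartitions() {
      return openWriters.size();
    }
  }

  public static void main(String[] args) {
    String[] unsorted = {"a", "b", "a"}; // partition "a" comes back after "b"

    FanoutSketchWriter fanout = new FanoutSketchWriter();
    for (int i = 0; i < unsorted.length; i++) {
      fanout.write(unsorted[i], i);
    }
    System.out.println("fanout open partitions: " + fanout.openPartitions());

    ClusteredSketchWriter clustered = new ClusteredSketchWriter();
    try {
      for (int i = 0; i < unsorted.length; i++) {
        clustered.write(unsorted[i], i);
      }
    } catch (IllegalStateException e) {
      System.out.println("clustered writer rejected unsorted input: " + e.getMessage());
    }
  }
}
```

If that is the intended difference, dropping `.sort("data")` here would exercise the fanout path on unsorted input, which the existing `PartitionedWriter` test cannot do.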