openinx commented on a change in pull request #1774:
URL: https://github.com/apache/iceberg/pull/1774#discussion_r530731975
##########
File path: spark/src/test/java/org/apache/iceberg/spark/source/TestSparkDataWrite.java
##########
@@ -327,51 +328,12 @@ public void testUnpartitionedCreateWithTargetFileSizeViaTableProperties() throws
   @Test
   public void testPartitionedCreateWithTargetFileSizeViaOption() throws IOException {
-    File parent = temp.newFolder(format.toString());
-    File location = new File(parent, "test");
-
-    HadoopTables tables = new HadoopTables(CONF);
-    PartitionSpec spec = PartitionSpec.builderFor(SCHEMA).identity("data").build();
-    Table table = tables.create(SCHEMA, spec, location.toString());
-
-    List<SimpleRecord> expected = Lists.newArrayListWithCapacity(8000);
-    for (int i = 0; i < 2000; i++) {
-      expected.add(new SimpleRecord(i, "a"));
-      expected.add(new SimpleRecord(i, "b"));
-      expected.add(new SimpleRecord(i, "c"));
-      expected.add(new SimpleRecord(i, "d"));
-    }
-
-    Dataset<Row> df = spark.createDataFrame(expected, SimpleRecord.class);
-
-    df.select("id", "data").sort("data").write()
-        .format("iceberg")
-        .option("write-format", format.toString())
-        .mode("append")
-        .option("target-file-size-bytes", 4) // ~4 bytes; low enough to trigger
-        .save(location.toString());
-
-    table.refresh();
-
-    Dataset<Row> result = spark.read()
-        .format("iceberg")
-        .load(location.toString());
-
-    List<SimpleRecord> actual = result.orderBy("id").as(Encoders.bean(SimpleRecord.class)).collectAsList();
-    Assert.assertEquals("Number of rows should match", expected.size(), actual.size());
-    Assert.assertEquals("Result rows should match", expected, actual);
+    partitionedCreateWithTargetFileSizeViaOption(false);
+  }
-    List<DataFile> files = Lists.newArrayList();
-    for (ManifestFile manifest : table.currentSnapshot().allManifests()) {
-      for (DataFile file : ManifestFiles.read(manifest, table.io())) {
-        files.add(file);
-      }
-    }
-    // TODO: ORC file now not support target file size
-    if (!format.equals(FileFormat.ORC)) {
-      Assert.assertEquals("Should have 8 DataFiles", 8, files.size());
-      Assert.assertTrue("All DataFiles contain 1000 rows", files.stream().allMatch(d -> d.recordCount() == 1000));
-    }
+  @Test
+  public void testPartitionedFanoutCreateWithTargetFileSizeViaOption() throws IOException {
+    partitionedCreateWithTargetFileSizeViaOption(true);
Review comment:
   Do we need a unit test to cover Spark's `partitioned.fanout.enabled` write option? I saw there is a unit test that uses the table's `write.partitioned.fanout.enabled` property to define the `fanout` behavior.
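   For reference, a write-option-based variant of the test could route the fanout flag into the option map the shared helper passes to the DataFrame writer. The sketch below is a minimal, Spark-free illustration of that wiring only; the option key `fanout-enabled` and the helper shape are assumptions for illustration, not confirmed Iceberg API.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class FanoutOptionSketch {
       // Hypothetical helper: builds the write options a shared test helper
       // might pass to df.write().options(...). The "fanout-enabled" key is
       // an assumption; the real test would use the actual Spark write option.
       static Map<String, String> writeOptions(String format, boolean fanout) {
           Map<String, String> opts = new LinkedHashMap<>();
           opts.put("write-format", format);
           opts.put("target-file-size-bytes", "4");
           if (fanout) {
               opts.put("fanout-enabled", "true");
           }
           return opts;
       }

       public static void main(String[] args) {
           // → {write-format=parquet, target-file-size-bytes=4, fanout-enabled=true}
           System.out.println(writeOptions("parquet", true));
           // → {write-format=parquet, target-file-size-bytes=4}
           System.out.println(writeOptions("parquet", false));
       }
   }
   ```

   With this shape, the two `@Test` entry points in the diff only differ in the boolean they pass, and the option-vs-table-property coverage question reduces to whether a second helper variant sets the table property instead of the write option.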
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]