sudssf commented on a change in pull request #882: fix: Failed to get status issue because of s3 eventual consistency
URL: https://github.com/apache/incubator-iceberg/pull/882#discussion_r401960590
 
 

 ##########
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestDataSourceOptions.java
 ##########
 @@ -201,6 +201,36 @@ public void testSplitOptionsOverridesTableProperties() throws IOException {
     Assert.assertEquals("Spark partitions should match", 2, resultDf.javaRDD().getNumPartitions());
   }
 
+  @Test
+  public void testSplitOptionsOverridesTablePropertiesWithWriterLength() throws IOException {
+    String tableLocation = temp.newFolder("iceberg-table").toString();
+
+    HadoopTables tables = new HadoopTables(CONF);
+    PartitionSpec spec = PartitionSpec.unpartitioned();
+    Map<String, String> options = Maps.newHashMap();
+    options.put(TableProperties.SPLIT_SIZE, String.valueOf(128L * 1024 * 1024)); // 128Mb
+    tables.create(SCHEMA, spec, options, tableLocation);
+
+    List<SimpleRecord> expectedRecords = Lists.newArrayList(
+        new SimpleRecord(1, "a"),
+        new SimpleRecord(2, "b")
+    );
+    Dataset<Row> originalDf = spark.createDataFrame(expectedRecords, SimpleRecord.class);
+    originalDf.select("id", "data").write()
+        .format("iceberg")
+        .mode("append")
+        .option("use-writer-length-as-file-size", true)
+        .save(tableLocation);
+
+    Dataset<Row> resultDf = spark.read()
+        .format("iceberg")
+        .option("split-size", String.valueOf(611 + 103)) // 611 bytes is the size of SimpleRecord(1,"a")
 
 Review comment:
   I think this happens only for Parquet. See:
   https://github.com/apache/incubator-iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java#L142
 
   `writeStore` seems to return a nonzero value from `getBufferedSize` after close.
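
   To illustrate the suspicion, here is a minimal sketch. It assumes the reported length is computed as the written position plus the buffered size, which I have not verified against the real code. `FakeColumnStore` and `FakeFileWriter` are hypothetical stand-ins, not Parquet or Iceberg classes, and the byte counts are arbitrary.

```java
// Hypothetical stand-ins only: these are NOT the real Parquet/Iceberg types.
// They model a column store whose buffered size is not reset when the
// writer is closed, which is the behaviour suspected in this comment.
final class BufferedSizeSketch {

  static final class FakeColumnStore {
    private long buffered;

    void write(long bytes) {
      buffered += bytes;
    }

    long getBufferedSize() {
      return buffered;
    }
  }

  static final class FakeFileWriter {
    private final FakeColumnStore store = new FakeColumnStore();
    private long pos;
    private boolean closed;

    void add(long bytes) {
      store.write(bytes);
    }

    void close() {
      // Flush buffered bytes to the "file", but leave the store's counter
      // untouched (the assumed behaviour).
      pos += store.getBufferedSize();
      closed = true;
    }

    // Assumed naive formula: position plus whatever the store still reports.
    long naiveLength() {
      return pos + store.getBufferedSize();
    }

    // One possible guard: ignore the buffered size once the writer is closed.
    long guardedLength() {
      return closed ? pos : pos + store.getBufferedSize();
    }
  }
}
```

   With 100 bytes added and the writer closed, `naiveLength()` counts the flushed bytes twice, while `guardedLength()` reports only what was written. Whether a guard like this is the right fix depends on what the real `writeStore` does on close.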

