RussellSpitzer edited a comment on issue #2195:
URL: https://github.com/apache/iceberg/issues/2195#issuecomment-771811055


   I wrote a programmatic repro which fits into TestRewriteDataFilesAction 
(although I would like to shrink the test file size). Oddly, this is not 
deterministic: it does not fail every time.
   
   ```java
     @Test
     public void testRewriteDataFilesLargeFile() {
       PartitionSpec spec = PartitionSpec.unpartitioned();
       Map<String, String> options = Maps.newHashMap();
       Table table = TABLES.create(SCHEMA, spec, options, tableLocation);
   
       Assert.assertNull("Table must be empty", table.currentSnapshot());
   
       List<ThreeColumnRecord> records1 = Lists.newArrayList();
   
       IntStream.range(0, 2000000)
           .forEach(i -> records1.add(new ThreeColumnRecord(i, "foo" + i, "bar" + i)));
       Dataset<Row> df = spark.createDataFrame(records1, ThreeColumnRecord.class).repartition(1);
       writeDF(df);
   
       List<ThreeColumnRecord> records2 = Lists.newArrayList(
           new ThreeColumnRecord(2, "CCCCCCCCCC", "CCCC"),
           new ThreeColumnRecord(2, "DDDDDDDDDD", "DDDD")
       );
       writeRecords(records2);
   
       table.refresh();
   
       Actions actions = Actions.forTable(table);
   
       long originalNumRecords = spark.read().format("iceberg").load(tableLocation).count();
   
       table.updateProperties().set(TableProperties.SPLIT_SIZE, String.valueOf(1024 * 1024 * 10));
       table.refresh();
   
       actions.rewriteDataFiles()
           .targetSizeInBytes(1024 * 1024 * 10)
           .execute();
   
       long postRewriteNumRecords = spark.read().format("iceberg").load(tableLocation).count();
   
       Assert.assertEquals(originalNumRecords, postRewriteNumRecords);
     }
     ```
     This code occasionally drops all of the records from the first write.
     
     ```
     java.lang.AssertionError: expected:<2000002> but was:<2>
   ```
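
Because the failure is intermittent, it can help to run the test body in a loop and count how often the assertion trips. The following is only a minimal sketch: `FlakyRunner` and `countFailures` are hypothetical names, not part of Iceberg or JUnit, and the body used in `main` is a stand-in that fails on a fixed schedule rather than the real rewrite test.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Iceberg or JUnit): runs a check repeatedly
// and counts how many attempts end in an AssertionError.
final class FlakyRunner {

  private FlakyRunner() {
  }

  static int countFailures(int attempts, Runnable body) {
    int failures = 0;
    for (int i = 0; i < attempts; i++) {
      try {
        body.run();
      } catch (AssertionError e) {
        // Only assertion failures are counted; other exceptions propagate.
        failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    // Stand-in body for illustration: fails on every third call.
    // In practice this would invoke the test body shown above.
    AtomicInteger calls = new AtomicInteger();
    int failures = countFailures(9, () -> {
      if (calls.getAndIncrement() % 3 == 0) {
        throw new AssertionError("simulated record-count mismatch");
      }
    });
    System.out.println("failed " + failures + " of 9 attempts");
  }
}
```

Each attempt of the real test would need a fresh table location, otherwise state from an earlier attempt would carry over into the next one.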
   
   
![image](https://user-images.githubusercontent.com/413025/106638385-6d8e6a80-6549-11eb-9fff-2fc15421a428.png)
   




