openinx commented on pull request #3365: URL: https://github.com/apache/iceberg/pull/3365#issuecomment-951534337
> @openinx, why would a manifest merge be triggered? It seems like this should not happen, or should happen consistently. I don't understand why this could cause a test to be flaky.

I reconsidered this question and checked [the condition that triggers an actual manifest file merge](https://github.com/apache/iceberg/blob/90225d6c9413016d611e2ce5eff37db1bc1b4fc5/core/src/main/java/org/apache/iceberg/ManifestMergeManager.java#L134) in MergeAppend. The testHashDistributeMode unit tests produce at most 9 data files, so they are unlikely to trigger a manifest merge (the default threshold is 100).

```java
public static final String MANIFEST_MIN_MERGE_COUNT = "commit.manifest.min-count-to-merge";
public static final int MANIFEST_MIN_MERGE_COUNT_DEFAULT = 100;
```

I also wrote a small test case to verify this:

```java
@RunWith(Parameterized.class)
public class TestSimpleDataUtil extends TableTestBase {

  @Parameterized.Parameters(name = "formatVersion = {0}")
  public static Object[] parameters() {
    return new Object[] {1}; // We don't actually use the format version since everything is mock
  }

  public TestSimpleDataUtil(int formatVersion) {
    super(formatVersion);
  }

  @Test
  public void testDataFiles() throws IOException {
    table.newAppend()
        .appendFile(FILE_A)
        .commit();

    table.newAppend()
        .appendFile(FILE_B)
        .commit();

    table.newAppend()
        .appendFile(FILE_C)
        .commit();

    Map<Long, List<DataFile>> files = SimpleDataUtil.snapshotToDataFiles(table);
    Assert.assertEquals(3, files.size());
    for (Map.Entry<Long, List<DataFile>> entry : files.entrySet()) {
      Assert.assertEquals(1, entry.getValue().size());
    }
  }
}
```

So I think @rdblue is right: we still don't know the real root cause of the unit test failure. It will need more careful work to track down.
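
To make the threshold argument concrete, here is a rough, self-contained sketch. It is not the actual `ManifestMergeManager` logic, which has additional conditions; it only assumes that a group of manifests is left unmerged while its size is below `commit.manifest.min-count-to-merge`. The class `MergeThresholdSketch` and the helper `wouldMerge` are hypothetical names used purely for illustration.

```java
// Simplified model of the merge threshold; NOT the real Iceberg implementation.
class MergeThresholdSketch {

  // Mirrors the default value of "commit.manifest.min-count-to-merge" quoted above.
  static final int MANIFEST_MIN_MERGE_COUNT_DEFAULT = 100;

  // Hypothetical helper. Assumption: a merge is only considered once the number
  // of manifests in a group reaches the configured minimum count.
  static boolean wouldMerge(int manifestCount, int minCountToMerge) {
    return manifestCount >= minCountToMerge;
  }

  public static void main(String[] args) {
    // At most 9 data files means at most 9 manifests, far below the default of 100.
    System.out.println(wouldMerge(9, MANIFEST_MIN_MERGE_COUNT_DEFAULT));

    // With a much lower threshold, the same 9 manifests would qualify under this model.
    System.out.println(wouldMerge(9, 2));
  }
}
```

Under this simplified model, a test that writes at most 9 files stays well below the default threshold, which is consistent with the conclusion that manifest merging is probably not the cause of the flakiness.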
