keith-turner commented on code in PR #4162:
URL: https://github.com/apache/accumulo/pull/4162#discussion_r1456394445
##########
test/src/main/java/org/apache/accumulo/test/functional/BulkNewIT.java:
##########
@@ -621,6 +623,65 @@ public void testManyFiles() throws Exception {
}
}
+ @Test
+ public void testConcurrentCompactions() throws Exception {
+ // Tests compactions running concurrently with bulk import to ensure that data is not bulk
+ // imported twice. Doing a large number of bulk imports should naturally cause compactions to
+ // happen. This test ensures that compactions running concurrently with bulk import do not
+ // cause duplicate imports of a file. For example, if a file is imported into a tablet and then
+ // compacted away, then the file should not be imported again by the FATE operation doing the
+ // bulk import. The test is structured in such a way that duplicate imports would be detected.
+
+ try (AccumuloClient c = Accumulo.newClient().from(getClientProps()).build()) {
+ c.tableOperations().delete(tableName);
+ // Create table without versioning iterator. This is done to detect the same file being imported
+ // more than once.
+ c.tableOperations().create(tableName, new NewTableConfiguration().withoutDefaultIterators());
+
+ addSplits(c, tableName, "0999 1999 2999 3999 4999 5999 6999 7999 8999");
+
+ String dir = getDir("/testBulkFile-");
+
+ final int N = 100;
+
+ // Do N bulk imports of the exact same data.
+ for (int i = 0; i < N; i++) {
+ // Create 10 files for the bulk import.
+ for (int f = 0; f < 10; f++) {
+ writeData(dir + "/f" + f + ".", aconf, f * 1000, (f + 1) * 1000 - 1);
+ }
+
+ c.tableOperations().importDirectory(dir).to(tableName).tableTime(true).load();
+ getCluster().getFileSystem().delete(new Path(dir), true);
+ }
Review Comment:
@DomGarguilo I updated the test in 2798d19 to run the bulk imports both in
parallel and serially. I did not use the `parallel()` stream method because it
runs on a JVM-wide thread pool, and I am uncertain about the implications of
placing tasks that do I/O on that thread pool.
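
For illustration only, here is a minimal sketch of the general alternative to `parallel()`: submitting the work to a dedicated `ExecutorService` so that blocking tasks stay off the shared `ForkJoinPool.commonPool()`. This is not the code from 2798d19. The class name `DedicatedPoolSketch`, the helper `runOnDedicatedPool`, and the pool size are hypothetical, and a counter increment stands in for the real bulk import call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class DedicatedPoolSketch {

  // Hypothetical helper: runs numTasks copies of task on a private fixed-size
  // pool and returns how many of them completed without throwing.
  public static int runOnDedicatedPool(int numTasks, int numThreads, Runnable task)
      throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < numTasks; i++) {
        futures.add(executor.submit(task));
      }
      int completed = 0;
      for (Future<?> future : futures) {
        // get() blocks until the task finishes and rethrows any task failure
        // wrapped in an ExecutionException.
        future.get();
        completed++;
      }
      return completed;
    } finally {
      // Release the pool's threads whether or not a task failed.
      executor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the I/O-bound work; a real test would call the bulk import here.
    AtomicInteger counter = new AtomicInteger();
    int done = runOnDedicatedPool(100, 16, counter::incrementAndGet);
    System.out.println(done + " tasks completed, counter=" + counter.get());
  }
}
```

Because the pool is created and shut down inside the helper, slow or blocked tasks can only starve each other, not unrelated code that relies on the common pool.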