keith-turner commented on code in PR #4162:
URL: https://github.com/apache/accumulo/pull/4162#discussion_r1456102678


##########
test/src/main/java/org/apache/accumulo/test/functional/BulkNewIT.java:
##########
@@ -621,6 +623,65 @@ public void testManyFiles() throws Exception {
     }
   }
 
+  @Test
+  public void testConcurrentCompactions() throws Exception {
+    // Tests compactions running concurrently with bulk import to ensure that data is not bulk
+    // imported twice. Doing a large number of bulk imports should naturally cause compactions to
+    // happen. This test ensures that compactions running concurrently with bulk import do not
+    // cause duplicate imports of a file. For example, if a file is imported into a tablet and then
+    // compacted away, the file should not be imported again by the FATE operation doing the
+    // bulk import. The test is structured in such a way that duplicate imports would be detected.
+
+    try (AccumuloClient c = Accumulo.newClient().from(getClientProps()).build()) {
+      c.tableOperations().delete(tableName);
+      // Create table without versioning iterator. This is done to detect the same file being
+      // imported more than once.
+      c.tableOperations().create(tableName, new NewTableConfiguration().withoutDefaultIterators());
+
+      addSplits(c, tableName, "0999 1999 2999 3999 4999 5999 6999 7999 8999");
+
+      String dir = getDir("/testBulkFile-");
+
+      final int N = 100;
+
+      // Do N bulk imports of the exact same data.
+      for (int i = 0; i < N; i++) {
+        // Create 10 files for the bulk import.
+        for (int f = 0; f < 10; f++) {
+          writeData(dir + "/f" + f + ".", aconf, f * 1000, (f + 1) * 1000 - 1);
+        }
+        c.tableOperations().importDirectory(dir).to(tableName).tableTime(true).load();
+        getCluster().getFileSystem().delete(new Path(dir), true);
+      }

Review Comment:
   > Not sure if it's a good idea or not, but could do something like this to kick off the bulk imports in parallel.
   
   It's an interesting idea. Running the imports serially vs. in parallel can each expose different bugs. Going to experiment with doing both in two tests to see what that looks like for code sharing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
