keith-turner commented on a change in pull request #166:
URL: https://github.com/apache/accumulo-testing/pull/166#discussion_r742244701



##########
File path: src/main/java/org/apache/accumulo/testing/continuous/ContinuousIngest.java
##########
@@ -175,19 +191,47 @@ public static void main(String[] args) throws Exception {
 
         // generate subsequent sets of nodes that link to previous set of nodes
         for (int depth = 1; depth < maxDepth; depth++) {
+
+          // random chance that the entries will be deleted
+          boolean deletePrevious = deletesEnabled && r.nextInt(100) < deleteProbability;
+
+          // stack to hold mutations. stack ensures they are deleted in reverse order
+          Stack<Mutation> mutationStack = new Stack<>();
+
           for (int index = 0; index < flushInterval; index++) {
             long rowLong = genLong(rowMin, rowMax, r);
             byte[] prevRow = genRow(prevRows[index]);
             prevRows[index] = rowLong;
-            Mutation m = genMutation(rowLong, r.nextInt(maxColF), r.nextInt(maxColQ), cv,
-                ingestInstanceId, count, prevRow, checksum);
+            int cfInt = r.nextInt(maxColF);
+            int cqInt = r.nextInt(maxColQ);
+            Mutation m = genMutation(rowLong, cfInt, cqInt, cv, ingestInstanceId, count, prevRow,
+                checksum);
             count++;
             bw.addMutation(m);
+
+            // add a new delete mutation to the stack when applicable
+            if (deletePrevious) {
+              Mutation mutation = new Mutation(genRow(rowLong));
+              mutation.putDelete(genCol(cfInt), genCol(cqInt), cv);
+              mutationStack.add(mutation);
+            }
           }
 
           lastFlushTime = flush(bw, count, flushInterval, lastFlushTime);
           if (count >= numEntries)
             break out;
+
+          // delete last set of entries in reverse order
+          if (deletePrevious) {
+            log.info("Deleting previous set of entries");
+            while (!mutationStack.empty()) {
+              Mutation m = mutationStack.pop();
+              count--;
+              bw.addMutation(m);

Review comment:
   > We need to flush the batch writer between some of the mutations for safety.
   
   To expand on this: currently this PR gives the batch writer the delete mutations in reverse order. However, there must be a flush between nodes in the linked list that point to each other. This is because the batch writer may write mutations in a different order than it was given them, and if the process running the batch writer dies, that reordering could make data look lost instead of deleted.
   
   For example, assume we have the nodes NA->N9, N9->N4, and N4->N1. If the batch writer is asked to delete NA, N9, N4 in that order, it may actually delete them in another order like N9, NA, N4. If the process dies after only N9 is deleted, then the verification map reduce job will see NA->N9 but will not see N9, so it looks like data was lost. If we instead do delete(NA), flush(), delete(N9), flush(), delete(N4), flush(), then the flushes will make things happen in the order we want.
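   
   A minimal sketch (not from this PR) of that interleaved delete/flush pattern, assuming the `mutationStack`, `bw` (BatchWriter), and `count` variables from the diff above:
   
   ```java
   // Sketch only -- reuses mutationStack, bw, and count from the diff above.
   // Popping the stack deletes nodes in reverse link order; the flush() after
   // each delete makes that delete durable before the node pointing to it is
   // deleted, so a crash can never leave a link whose target is already gone.
   while (!mutationStack.empty()) {
     Mutation m = mutationStack.pop();
     count--;
     bw.addMutation(m);
     bw.flush(); // throws MutationsRejectedException, like addMutation
   }
   ```
   
   Flushing after every delete is the simplest way to get the ordering guarantee; it trades some throughput for never creating a dangling link.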



