keith-turner opened a new issue, #3006:
URL: https://github.com/apache/accumulo/issues/3006

   Saw the following while running bulk import RW test.
   
   The test added splits to the table
   
   ```
   2022-10-07T07:39:40,584 [randomwalk.bulk.Split] INFO : split for [r008c7, 
r00ee2, r011a5, r0531f, r0929e, r0942e, r0b580, r0d6d4, r10134, r13a86, r15eb5, 
r16920] finished
   ```
   
   Then later a bulk import failed complaining about one of the splits created 
above.  In the test logs below it was also running a concurrent merge, so I 
suspect that caused the problem.
   
   ```
   2022-10-07T07:40:56,728 [randomwalk.bulk.BulkPlusOne] DEBUG: preparing bulk 
files with start rows [r00000, r05042, r09cea, r13ea4, r180f5] last row r1869f 
marker 0000153
   2022-10-07T07:40:56,728 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_0.rf
   2022-10-07T07:40:56,902 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_1.rf
   2022-10-07T07:40:56,982 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_3.rf
   2022-10-07T07:40:57,079 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_4.rf
   2022-10-07T07:40:57,086 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_2.rf
   2022-10-07T07:40:57,173 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_5.rf
   2022-10-07T07:40:57,237 [randomwalk.bulk.BulkMinusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_766fe741-1437-4136-a3fa-135b1c503aa3/part_6.rf
   2022-10-07T07:40:57,370 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_3.rf
   2022-10-07T07:40:57,498 [randomwalk.bulk.BulkPlusOne] DEBUG: Creating 
hdfs://localhost:8020/tmp/bulk_b66dbfcc-6eea-44b0-a4e7-2b36c2a9babe/part_4.rf
   2022-10-07T07:40:57,570 [randomwalk.bulk.Verify] INFO : Waiting for 103 
nodes to complete
   2022-10-07T07:40:57,752 [randomwalk.bulk.Merge] INFO : merging (-inf -> 
r05e6e] complete
   2022-10-07T07:40:57,752 [randomwalk.bulk.ConsistencyCheck] INFO : Checking 
r06864
   2022-10-07T07:40:57,794 [randomwalk.bulk.BulkPlusOne] ERROR: 
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
org.apache.accumulo.core.metadata.schema.TabletDeletedException: Tablet 18;r
   0531f was deleted while iterating
   java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
org.apache.accumulo.core.metadata.schema.TabletDeletedException: Tablet 
18;r0531f was deleted while iterating
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.computeFileToTabletMappings(BulkImport.java:579)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.computeMappingFromFiles(BulkImport.java:487)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.load(BulkImport.java:152) 
~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.testing.randomwalk.bulk.BulkPlusOne.bulkLoadLots(BulkPlusOne.java:96)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.testing.randomwalk.bulk.BulkPlusOne.runLater(BulkPlusOne.java:105)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.testing.randomwalk.bulk.BulkTest.lambda$visit$0(BulkTest.java:31)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[accumulo-testing-shaded.jar:?]
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
           at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
           at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[accumulo-testing-shaded.jar:?]
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
           at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[accumulo-testing-shaded.jar:?]
           at java.lang.Thread.run(Thread.java:829) ~[?:?]
   Caused by: java.util.concurrent.ExecutionException: 
org.apache.accumulo.core.metadata.schema.TabletDeletedException: Tablet 
18;r0531f was deleted while iterating
           at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) 
~[?:?]
           at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.computeFileToTabletMappings(BulkImport.java:573)
 ~[accumulo-testing-shaded.jar:?]
           ... 13 more
   Caused by: org.apache.accumulo.core.metadata.schema.TabletDeletedException: 
Tablet 18;r0531f was deleted while iterating
           at 
org.apache.accumulo.core.metadata.schema.LinkingIterator.resetSource(LinkingIterator.java:127)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.metadata.schema.LinkingIterator.next(LinkingIterator.java:180)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.metadata.schema.LinkingIterator.next(LinkingIterator.java:50)
 ~[accumulo-testing-shaded.jar:?]
           at 
java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812) 
~[?:?]
           at 
java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
 ~[?:?]
           at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
 ~[?:?]
           at 
java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
 ~[?:?]
           at 
java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
 ~[?:?]
           at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681) 
~[?:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.ConcurrentKeyExtentCache.lookup(ConcurrentKeyExtentCache.java:124)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.findOverlappingTablets(BulkImport.java:331)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.findOverlappingTablets(BulkImport.java:356)
 ~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.core.clientImpl.bulk.BulkImport.lambda$computeFileToTabletMappings$7(BulkImport.java:549)
 ~[accumulo-testing-shaded.jar:?]
           at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
 ~[?:?]
           ... 5 more
   2022-10-07T07:40:57,804 [randomwalk.bulk.BulkMinusOne] INFO : Decrementing
   ```
   
   The test kept running after the failure above and failed later during verify.
   
   ```
   2022-10-07T07:43:28,304 [testing.randomwalk.Framework] ERROR: Error during 
random walk
   java.lang.Exception: Error running node bulk.Verify
           at 
org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:360) 
~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:54) 
~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:98) 
~[accumulo-testing-shaded.jar:?]
   Caused by: java.lang.Exception: Bad key at r00000 cf:000 [] 1665128566432 
false=-1
           at 
org.apache.accumulo.testing.randomwalk.bulk.Verify.visit(Verify.java:73) 
~[accumulo-testing-shaded.jar:?]
           at 
org.apache.accumulo.testing.randomwalk.Module.lambda$visit$0(Module.java:295) 
~[accumulo-testing-shaded.jar:?]
           at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
           at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[accumulo-testing-shaded.jar:?]
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
           at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
           at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[accumulo-testing-shaded.jar:?]
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]
           at 
org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
 ~[accumulo-testing-shaded.jar:?]
           at java.lang.Thread.run(Thread.java:829) ~[?:?]
   2022-10-07T07:43:28,305 [testing.randomwalk.Framework] INFO : Test finished
   ```
   
   I analyzed the markers in the the test table and found that marker 00153 was 
missing which corresponds to the bulk import that failed.
   
   I suspect the bulk import code need to catch TabletDeletedException and 
completely restart the operation.   That exception is thrown by the 
LinkingIterator that checks the consistency of the metadata table while 
scanning.  The LinkingIterator can handle concurrent splits, but there is no 
way for it to handle concurrent merges so it throws that exception.  Higher 
level operations need to start over when they see that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to