milleruntime opened a new issue #1844:
URL: https://github.com/apache/accumulo/issues/1844


   I saw 2 metadata tablets fail to recover during a long-running session of the Randomwalk test. The reported error was:
   "Error recovering tablet !0;51e7e10b1827d193;490d5011198f9047 from log files"
   Here is the stack trace from the tserver that was trying to load the tablet:
   ```
   2020-12-20 00:45:53,512 [tserver.TabletServer] WARN : exception trying to assign tablet !0;51e7e10b1827d193;490d5011198f9047 hdfs://muchoshacluster/accumulo/tables/!0/t-0000yt4
   java.lang.RuntimeException: Error recovering tablet !0;51e7e10b1827d193;490d5011198f9047 from log files
           at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:499)
           at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2413)
           at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
           at org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:64)
           at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.IOException: Unable to find recovery files for extent !0;51e7e10b1827d193;490d5011198f9047 logEntry: !0;51e7e10b1827d193; hdfs://muchoshacluster/accumulo/wal/worker1+9997/e41008f5-edcf-4e18-aa92-ba5ff021685c
           at org.apache.accumulo.tserver.TabletServer.recover(TabletServer.java:3311)
           at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:437)
           ... 8 more
   ```
   
   At some point before this, the Master transitioned from SAFE_MODE through UNLOAD_METADATA_TABLETS and UNLOAD_ROOT_TABLET to STOP so that it could shut down cleanly. I am not sure exactly why it did this (possibly memory), but after starting back up it shut down again a few more times. It was during this period that the tserver reported the above error.
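
   The order of the Master states mentioned above can be pictured as a simple linear progression, with the metadata tablets unloaded before the root tablet. The sketch below is hypothetical: the `ShutdownSequence` class and its `next`/`path` methods are illustrative names, not the Master's actual state-handling code, and they encode only the ordering reported in this issue.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical model of the shutdown order reported in this issue.
   // State names mirror the logged Master states; the transition logic is
   // an illustration only, NOT Accumulo's implementation.
   public class ShutdownSequence {

       enum State { SAFE_MODE, UNLOAD_METADATA_TABLETS, UNLOAD_ROOT_TABLET, STOP }

       // Returns the next state in the reported order; STOP is terminal.
       static State next(State s) {
           switch (s) {
               case SAFE_MODE:
                   return State.UNLOAD_METADATA_TABLETS;
               case UNLOAD_METADATA_TABLETS:
                   return State.UNLOAD_ROOT_TABLET;
               case UNLOAD_ROOT_TABLET:
                   return State.STOP;
               default:
                   return State.STOP;
           }
       }

       // Walks from the given state to STOP, recording every state visited.
       static List<State> path(State start) {
           List<State> visited = new ArrayList<>();
           State s = start;
           visited.add(s);
           while (s != State.STOP) {
               s = next(s);
               visited.add(s);
           }
           return visited;
       }

       public static void main(String[] args) {
           // Prints [SAFE_MODE, UNLOAD_METADATA_TABLETS, UNLOAD_ROOT_TABLET, STOP]
           System.out.println(path(State.SAFE_MODE));
       }
   }
   ```

   In the session described here, this whole sequence ran more than once, since the Master shut down repeatedly after restarting; the recovery error was reported during that window.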
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: 1.10.0, 1.10.1
    - Platform: 4-node Amazon EC2 cluster created using Muchos
    - OS: CentOS 7.5.1804
   

