milleruntime opened a new issue #1844:
URL: https://github.com/apache/accumulo/issues/1844
I saw two metadata tablets fail to recover during a long-running session of
the Randomwalk test. The reported error was:
"Error recovering tablet !0;51e7e10b1827d193;490d5011198f9047 from log files"
Here is the stacktrace from the tserver trying to load the tablet:
```
2020-12-20 00:45:53,512 [tserver.TabletServer] WARN : exception trying to assign tablet !0;51e7e10b1827d193;490d5011198f9047 hdfs://muchoshacluster/accumulo/tables/!0/t-0000yt4
java.lang.RuntimeException: Error recovering tablet !0;51e7e10b1827d193;490d5011198f9047 from log files
    at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:499)
    at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2413)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:64)
    at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Unable to find recovery files for extent !0;51e7e10b1827d193;490d5011198f9047 logEntry: !0;51e7e10b1827d193; hdfs://muchoshacluster/accumulo/wal/worker1+9997/e41008f5-edcf-4e18-aa92-ba5ff021685c
    at org.apache.accumulo.tserver.TabletServer.recover(TabletServer.java:3311)
    at org.apache.accumulo.tserver.tablet.Tablet.<init>(Tablet.java:437)
    ... 8 more
```
At some point before this, the Master stepped through the states SAFE_MODE,
UNLOAD_METADATA_TABLETS, UNLOAD_ROOT_TABLET, and STOP so that it could shut
down cleanly. I am not sure exactly why it did this (memory, perhaps), but it
shut down again a few times after starting back up. The tserver reported the
above error during this period.
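
For triage, the extent and the write-ahead log named in the `Caused by` message can be pulled out mechanically, which makes it easier to check in HDFS whether that WAL (or its sorted recovery output) still exists. The following is only a rough sketch: the helper name `parse_recovery_failure`, the `RecoveryFailure` type, and the regular expression are illustrative assumptions based on the single message shown above, not part of Accumulo.

```python
import re
from typing import NamedTuple, Optional


class RecoveryFailure(NamedTuple):
    """Fields extracted from an 'Unable to find recovery files' message."""
    extent: str
    wal_path: str
    wal_server: str
    wal_uuid: str


# Assumed message layout, inferred from the one log line in this report:
#   Unable to find recovery files for extent <extent> logEntry: <row> <wal-uri>
_PATTERN = re.compile(
    r"Unable to find recovery files for extent\s+(?P<extent>\S+)"
    r"\s+logEntry:\s+\S+\s+(?P<wal>\S+)"
)


def parse_recovery_failure(message: str) -> Optional[RecoveryFailure]:
    """Return the extent and WAL details, or None if the message does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    wal_path = match.group("wal")
    # The last two path components are taken to be <server>/<wal-uuid>.
    server, uuid = wal_path.rstrip("/").split("/")[-2:]
    return RecoveryFailure(match.group("extent"), wal_path, server, uuid)


message = (
    "java.io.IOException: Unable to find recovery files for extent "
    "!0;51e7e10b1827d193;490d5011198f9047 logEntry: !0;51e7e10b1827d193; "
    "hdfs://muchoshacluster/accumulo/wal/worker1+9997/"
    "e41008f5-edcf-4e18-aa92-ba5ff021685c"
)
failure = parse_recovery_failure(message)
print(failure.wal_server, failure.wal_uuid)
```

The extracted `wal_path` can then be passed to a command such as `hdfs dfs -ls` to confirm whether the file is still present.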
**Versions (OS, Maven, Java, and others, as appropriate):**
- Affected version(s) of this project: 1.10.0, 1.10.1
- Platform: 4 node Amazon EC2 Cluster created using Muchos
- OS: CentOS 7.5.1804