[ https://issues.apache.org/jira/browse/ACCUMULO-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295300#comment-13295300 ]
Keith Turner commented on ACCUMULO-623:
---------------------------------------

I tried another experiment. Instead of killing all java processes on my single node instance, I did the following:

* start HDFS and zookeeper
* init & start Accumulo
* create a table and insert some data
* kill the datanode
* kill all Accumulo processes
* restart the datanode
* restart Accumulo
* recovery fails

Under this scenario recovery fails differently: I get an NPE in HDFS client code. The following is from the tablet server logs.

{noformat}
14 16:26:17,253 [log.LogSorter] INFO : Zookeeper references 1 recoveries, attempting locks
14 16:26:17,254 [log.LogSorter] DEBUG: Attempting to lock b67eb806-6ef1-4ecc-b739-a4ee90e08086
14 16:26:17,262 [log.LogSorter] INFO : got lock for b67eb806-6ef1-4ecc-b739-a4ee90e08086
14 16:26:17,264 [log.LogSorter] INFO : Copying /accumulo/wal/127.0.0.1+40200/b67eb806-6ef1-4ecc-b739-a4ee90e08086 to /accumulo/recovery/b67eb806-6ef1-4ecc-b739-a4ee90e08086
14 16:26:17,300 [log.LogSorter] ERROR: Unexpected error
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
        at org.apache.accumulo.server.tabletserver.log.LogSorter.startSort(LogSorter.java:295)
        at org.apache.accumulo.server.tabletserver.log.LogSorter.attemptRecoveries(LogSorter.java:266)
        at org.apache.accumulo.server.tabletserver.log.LogSorter.access$200(LogSorter.java:60)
        at org.apache.accumulo.server.tabletserver.log.LogSorter$1.process(LogSorter.java:204)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{noformat}
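To check whether the NullPointerException in DFSClient$DFSInputStream.updateBlockInfo can be reproduced outside of Accumulo, something like the following standalone sketch could be run on the same node. The class name and setup are mine and the WAL path is copied from the log above; it just does what LogSorter.startSort does, a plain FileSystem.open followed by a read. If the NPE shows up here too after the datanode restart, the problem is in the HDFS client / datanode state rather than in the recovery code.

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical standalone check, not Accumulo code: open the WAL the same way
// LogSorter.startSort does and see if DFSInputStream.updateBlockInfo still NPEs.
public class OpenWalCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path wal = new Path("/accumulo/wal/127.0.0.1+40200/b67eb806-6ef1-4ecc-b739-a4ee90e08086");

    // Length as reported by the namenode; a stale or zero length here would
    // point at the namenode's view of the file rather than at LogSorter.
    System.out.println("reported length: " + fs.getFileStatus(wal).getLen());

    // Equivalent of the FileSystem.open(...) call in the stack trace above.
    FSDataInputStream in = fs.open(wal);
    byte[] buf = new byte[4096];
    System.out.println("first read returned: " + in.read(buf));
    in.close();
  }
}
{noformat}

It would need to run against the same Hadoop configuration the tablet server uses, e.g. invoked the same way LogReader is in the original report below.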
> Data lost with hdfs write ahead log
> -----------------------------------
>
>                 Key: ACCUMULO-623
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-623
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: MacOSX, Hadoop 1.0.3, zookeeper 3.3.3
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> I shut my machine down with Accumulo, Zookeeper, and HDFS running. When I restarted it, Accumulo failed to recover its write ahead log because it was zero length. I wondered if this was because I shut down HDFS, so I tried the following on my single node Accumulo instance.
>
> * start HDFS and zookeeper
> * init & start Accumulo
> * create a table and insert some data
> * pkill -f java
> * restart everything
> * Accumulo fails to start because the walog is zero length
>
> Saw exceptions like the following:
>
> {noformat}
> 06 18:58:44,581 [log.SortedLogRecovery] INFO : Looking at mutations from /accumulo/recovery/def72721-5c64-4755-87cc-2e8cfc3002b7 for !0;!0<<
> 06 18:58:44,590 [tabletserver.TabletServer] WARN : exception trying to assign tablet !0;!0<< /root_tablet
> java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1458)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1295)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1134)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1121)
>         at org.apache.accumulo.server.tabletserver.TabletServer$AssignmentHandler.run(TabletServer.java:2477)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:680)
> Caused by: java.io.IOException: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.recover(TabletServerLogger.java:428)
>         at org.apache.accumulo.server.tabletserver.TabletServer.recover(TabletServer.java:3206)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1426)
>         ... 6 more
> Caused by: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.log.SortedLogRecovery.findLastStartToFinish(SortedLogRecovery.java:125)
>         at org.apache.accumulo.server.tabletserver.log.SortedLogRecovery.recover(SortedLogRecovery.java:89)
>         at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.recover(TabletServerLogger.java:426)
>         ... 8 more
> {noformat}
>
> When trying to run LogReader on the files, it prints nothing.
>
> {noformat}
> $ ./bin/accumulo org.apache.accumulo.server.logger.LogReader /accumulo/recovery/def72721-5c64-4755-87cc-2e8cfc3002b7
> 06 19:04:37,147 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> $ ./bin/accumulo org.apache.accumulo.server.logger.LogReader /accumulo/wal/127.0.0.1+40200/def72721-5c64-4755-87cc-2e8cfc3002b7
> $
> {noformat}
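On the original zero-length walog above: one possible explanation worth ruling out (an assumption on my part, the logs do not confirm it) is that the WAL output stream was never sync()'d or closed before the pkill. In Hadoop 1.0.x an unclosed file whose writer dies can come back with a namenode-reported length of 0 until its lease is recovered, which would also be consistent with LogReader printing nothing. A rough sketch of that behavior, with a made-up test path:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch, not Accumulo code: write to HDFS without sync/close and kill
// the writer, mimicking the 'pkill -f java' scenario from the report above.
public class UnsyncedWriteSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/unsynced-write-test"); // made-up test path

    FSDataOutputStream out = fs.create(p);
    out.write(new byte[1024]);   // buffered by the client, not yet durable
    // out.sync();               // without a sync (or close) the namenode may
                                 // still report the file length as 0

    // Die without closing the stream, like the killed tablet server.
    Runtime.getRuntime().halt(1);
  }
}
{noformat}

After the process dies, `hadoop fs -ls /tmp/unsynced-write-test` from another client typically still shows length 0, similar to what the walog looked like here; whether the unsynced data can ever be recovered depends on how far it made it down the write pipeline before the kill.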