Keith Turner created ACCUMULO-2406:
--------------------------------------

             Summary: GarbageCollectorIT.dontGCRootLog() fails
                 Key: ACCUMULO-2406
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2406
             Project: Accumulo
          Issue Type: Bug
         Environment: 2ef2d88598f5e14f8f96b77fecca66dcd7196448
            Reporter: Keith Turner
            Assignee: Keith Turner
             Fix For: 1.6.0


Saw the following failure while running ITs.  The test is not using MiniDFS, 
which is what caused the problem.  Because minidfs was not used, recent walog 
updates were lost.  Below are my notes from investigating.  To fix this I am 
going to look into using RawLocalFS like I did w/ VolumeIT.

{noformat}
dontGCRootLog(org.apache.accumulo.test.functional.GarbageCollectorIT)  Time 
elapsed: 108.685 sec  <<< ERROR!
java.lang.RuntimeException: 
org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server 
rd6ul-14723v.infosec.tycho.ncsc.mil:34130
        at 
org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:187)
        at 
org.apache.accumulo.test.functional.GarbageCollectorIT.dontGCRootLog(GarbageCollectorIT.java:163)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error 
on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
        at 
org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
        at 
org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
        at 
org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
        ... 10 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
startScan
        at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
        at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:226)
        at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:202)
        at 
org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:399)
        at 
org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:277)
        ... 12 more
{noformat}

Looking in the tserver logs, I found this error was caused by a missing file.

{noformat}
TabletServer_1756174045.out:java.io.IOException: Failed to open 
V/accumulo/tables/!0/table_info/A0000004.rf
{noformat}

The file was garbage collected

{noformat}
SimpleGarbageCollector_1083069032.out:2014-02-25 12:48:12,897 
[gc.SimpleGarbageCollector] DEBUG: Deleting 
V/accumulo/tables/!0/table_info/A0000004.rf
{noformat}

The file was created on another tablet server and compacted away.  That tablet 
server was killed by the test code right after doing the compaction.  The 
tserver is killed shortly after compaction 4 into 5.

{noformat}
TabletServer_1242447446.out:2014-02-25 12:48:07,256 [tserver.Tablet] DEBUG: 
Starting MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000003.rf] --> 
V/accumulo/tables/!0/table_info/A0000004.rf_tmp  []
TabletServer_1242447446.out:2014-02-25 12:48:07,290 [tserver.Tablet] 
TABLET_HIST: !0;~< MajC [V/accumulo/tables/!0/table_info/A0000003.rf] --> 
V/accumulo/tables/!0/table_info/A0000004.rf
TabletServer_1242447446.out:2014-02-25 12:48:10,007 [tserver.Tablet] DEBUG: 
Starting MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000004.rf] --> 
V/accumulo/tables/!0/table_info/A0000005.rf_tmp  []
TabletServer_1242447446.out:2014-02-25 12:48:10,044 [tserver.Tablet] 
TABLET_HIST: !0;~< MajC [V/accumulo/tables/!0/table_info/A0000004.rf] --> 
V/accumulo/tables/!0/table_info/A0000005.rf
{noformat}

So when the root tablet recovered it brought back old data. Looking in the root 
tablet write ahead log, it only contained the following after the crash.  This 
is why A0000004.rf was brought back and A0000005.rf was forgotten.

{noformat}
COMPACTION_START 1 8 
file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/+r/root_tablet/F000004m.rf

COMPACTION_FINISH 1 9

DEFINE_TABLET 1 9 +r<<

MANY_MUTATIONS 1 9
1 mutations:
  
~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000002.rf
      : [system]:23 [] <deleted>

MANY_MUTATIONS 1 9
1 mutations:
  !0<
      srv:flush [system]:24 [] 4
      srv:lock [system]:24 [] 
tservers/host1:44035/zlock-0000000000$1446a28bfef0003

MANY_MUTATIONS 1 9
1 mutations:
  !0;~
      srv:flush [system]:25 [] 4
      srv:lock [system]:25 [] 
tservers/host1:57501/zlock-0000000000$1446a28bfef0001

MANY_MUTATIONS 1 9
1 mutations:
  !0<
      srv:compact [system]:26 [] 4
      srv:lock [system]:26 [] 
tservers/host1:44035/zlock-0000000000$1446a28bfef0003

MANY_MUTATIONS 1 9
1 mutations:
  
~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
      : [system]:27 [] 

MANY_MUTATIONS 1 9
1 mutations:
  !0;~
      
file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
 [system]:28 [] <deleted>
      
file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000004.rf
 [system]:28 [] 578,5
      srv:compact [system]:28 [] 4
      last:1446a28bfef0001 [system]:28 [] host1:57501
      srv:lock [system]:28 [] 
tservers/host1:57501/zlock-0000000000$1446a28bfef0001

{noformat}

below is some info about the root tablet recovery from the tserver logs

{noformat}
2014-02-25 12:49:26,353 [tserver.Tablet] INFO : Starting Write-Ahead Log 
recovery for +r<<
2014-02-25 12:49:26,356 [tserver.TabletServer] INFO : Looking for 
V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40/finished
2014-02-25 12:49:26,360 [log.SortedLogRecovery] INFO : Looking at mutations 
from V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40 for +r<<
2014-02-25 12:49:26,376 [log.SortedLogRecovery] DEBUG: Found tid, seq 1 1
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
V/accumulo/tables/+r/root_tablet/F000004g.rf finished, but was still in the 
METADATA
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
V/accumulo/tables/+r/root_tablet/F000004i.rf finished, but was still in the 
METADATA
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
V/accumulo/tables/+r/root_tablet/F000004k.rf finished, but was still in the 
METADATA
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
V/accumulo/tables/+r/root_tablet/F000004m.rf finished, but was still in the 
METADATA
2014-02-25 12:49:26,382 [log.SortedLogRecovery] INFO : Scanning for mutations 
starting at sequence number 8 for tid 1
2014-02-25 12:49:26,386 [log.SortedLogRecovery] INFO : Recovery complete for 
+r<< using V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
2014-02-25 12:49:26,387 [tserver.Tablet] INFO : Write-Ahead Log recovery 
complete for +r<< (6 mutations applied, 13 entries created)
2014-02-25 12:49:26,396 [tserver.Tablet] TABLET_HIST: +r<< opened
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to