[ 
https://issues.apache.org/jira/browse/ACCUMULO-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912064#comment-13912064
 ] 

ASF subversion and git services commented on ACCUMULO-2406:
-----------------------------------------------------------

Commit b23408faea79a9839cbb39932004254229beac79 in accumulo's branch 
refs/heads/1.6.0-SNAPSHOT from [~keith_turner]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=b23408f ]

ACCUMULO-2406 make GarbageCollectorIT use RawLocalFileSystem so walog writes 
are not lost


> GarbageCollectorIT.dontGCRootLog() fails
> ----------------------------------------
>
>                 Key: ACCUMULO-2406
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2406
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: 2ef2d88598f5e14f8f96b77fecca66dcd7196448
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.6.0
>
>
> Saw the following failure while running ITs.  The test is not using MiniDFS, 
> which is what caused the problem.  Because minidfs was not used, recent walog 
> updates were lost.  Below are my notes from investigating.  To fix this I am 
> going to look into using RawLocalFS like I did w/ VolumeIT.
> {noformat}
> dontGCRootLog(org.apache.accumulo.test.functional.GarbageCollectorIT)  Time 
> elapsed: 108.685 sec  <<< ERROR!
> java.lang.RuntimeException: 
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server 
> rd6ul-14723v.infosec.tycho.ncsc.mil:34130
>       at 
> org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:187)
>       at 
> org.apache.accumulo.test.functional.GarbageCollectorIT.dontGCRootLog(GarbageCollectorIT.java:163)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: 
> Error on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
>       at 
> org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
>       at 
> org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
>       at 
> org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
>       ... 10 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing 
> startScan
>       at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>       at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
>       at 
> org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:226)
>       at 
> org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:202)
>       at 
> org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:399)
>       at 
> org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:277)
>       ... 12 more
> {noformat}
> Looking in the tserver logs, I found this error was caused by a missing file.
> {noformat}
> TabletServer_1756174045.out:java.io.IOException: Failed to open 
> V/accumulo/tables/!0/table_info/A0000004.rf
> {noformat}
> The file was garbage collected
> {noformat}
> SimpleGarbageCollector_1083069032.out:2014-02-25 12:48:12,897 
> [gc.SimpleGarbageCollector] DEBUG: Deleting 
> V/accumulo/tables/!0/table_info/A0000004.rf
> {noformat}
> The file was created on another tablet server and compacted away.  That 
> tablet server was killed by the test code right after doing the compaction.  
> The tserver is killed shortly after compaction 4 into 5.
> {noformat}
> TabletServer_1242447446.out:2014-02-25 12:48:07,256 [tserver.Tablet] DEBUG: 
> Starting MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000003.rf] --> 
> V/accumulo/tables/!0/table_info/A0000004.rf_tmp  []
> TabletServer_1242447446.out:2014-02-25 12:48:07,290 [tserver.Tablet] 
> TABLET_HIST: !0;~< MajC [V/accumulo/tables/!0/table_info/A0000003.rf] --> 
> V/accumulo/tables/!0/table_info/A0000004.rf
> TabletServer_1242447446.out:2014-02-25 12:48:10,007 [tserver.Tablet] DEBUG: 
> Starting MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000004.rf] --> 
> V/accumulo/tables/!0/table_info/A0000005.rf_tmp  []
> TabletServer_1242447446.out:2014-02-25 12:48:10,044 [tserver.Tablet] 
> TABLET_HIST: !0;~< MajC [V/accumulo/tables/!0/table_info/A0000004.rf] --> 
> V/accumulo/tables/!0/table_info/A0000005.rf
> {noformat}
> So when the root tablet recovered it brought back old data. Looking in the 
> root tablet write ahead log, it only contained the following after the crash. 
>  This is why A0000004.rf was brought back and A0000005.rf was forgotten.
> {noformat}
> COMPACTION_START 1 8 
> file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/+r/root_tablet/F000004m.rf
> COMPACTION_FINISH 1 9
> DEFINE_TABLET 1 9 +r<<
> MANY_MUTATIONS 1 9
> 1 mutations:
>   
> ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000002.rf
>       : [system]:23 [] <deleted>
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0<
>       srv:flush [system]:24 [] 4
>       srv:lock [system]:24 [] 
> tservers/host1:44035/zlock-0000000000$1446a28bfef0003
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0;~
>       srv:flush [system]:25 [] 4
>       srv:lock [system]:25 [] 
> tservers/host1:57501/zlock-0000000000$1446a28bfef0001
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0<
>       srv:compact [system]:26 [] 4
>       srv:lock [system]:26 [] 
> tservers/host1:44035/zlock-0000000000$1446a28bfef0003
> MANY_MUTATIONS 1 9
> 1 mutations:
>   
> ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
>       : [system]:27 [] 
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0;~
>       
> file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
>  [system]:28 [] <deleted>
>       
> file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000004.rf
>  [system]:28 [] 578,5
>       srv:compact [system]:28 [] 4
>       last:1446a28bfef0001 [system]:28 [] host1:57501
>       srv:lock [system]:28 [] 
> tservers/host1:57501/zlock-0000000000$1446a28bfef0001
> {noformat}
> below is some info about the root tablet recovery from the tserver logs
> {noformat}
> 2014-02-25 12:49:26,353 [tserver.Tablet] INFO : Starting Write-Ahead Log 
> recovery for +r<<
> 2014-02-25 12:49:26,356 [tserver.TabletServer] INFO : Looking for 
> V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40/finished
> 2014-02-25 12:49:26,360 [log.SortedLogRecovery] INFO : Looking at mutations 
> from V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40 for +r<<
> 2014-02-25 12:49:26,376 [log.SortedLogRecovery] DEBUG: Found tid, seq 1 1
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
> V/accumulo/tables/+r/root_tablet/F000004g.rf finished, but was still in the 
> METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
> V/accumulo/tables/+r/root_tablet/F000004i.rf finished, but was still in the 
> METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
> V/accumulo/tables/+r/root_tablet/F000004k.rf finished, but was still in the 
> METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into 
> V/accumulo/tables/+r/root_tablet/F000004m.rf finished, but was still in the 
> METADATA
> 2014-02-25 12:49:26,382 [log.SortedLogRecovery] INFO : Scanning for mutations 
> starting at sequence number 8 for tid 1
> 2014-02-25 12:49:26,386 [log.SortedLogRecovery] INFO : Recovery complete for 
> +r<< using V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
> 2014-02-25 12:49:26,387 [tserver.Tablet] INFO : Write-Ahead Log recovery 
> complete for +r<< (6 mutations applied, 13 entries created)
> 2014-02-25 12:49:26,396 [tserver.Tablet] TABLET_HIST: +r<< opened
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to