[ https://issues.apache.org/jira/browse/ACCUMULO-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912064#comment-13912064 ]
ASF subversion and git services commented on ACCUMULO-2406:
-----------------------------------------------------------
Commit b23408faea79a9839cbb39932004254229beac79 in accumulo's branch
refs/heads/1.6.0-SNAPSHOT from [~keith_turner]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=b23408f ]
ACCUMULO-2406 make GarbageCollectorIT use RawLocalFileSystem so walog writes
are not lost
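Below is a toy model of why the choice of local file system matters. It assumes, as the commit message and the investigation notes suggest, that with the default checksummed local file system a walog sync could return before the most recent entries reached disk, so those entries were lost when the test killed the tserver. It also assumes that with RawLocalFileSystem a sync does reach disk. `ToyLog`, its buffer capacity, and the entry strings are hypothetical; this is not Hadoop or Accumulo code.

```java
import java.util.ArrayList;
import java.util.List;

public class WalogDurabilityDemo {

  /** Hypothetical toy log; not Hadoop's LocalFileSystem or RawLocalFileSystem. */
  static class ToyLog {
    private final boolean syncReachesDisk; // true ~ "raw", false ~ "buffering wrapper"
    private final int bufferCapacity;      // entries held before a forced write-out
    private final List<String> buffer = new ArrayList<>();
    private final List<String> disk = new ArrayList<>();

    ToyLog(boolean syncReachesDisk, int bufferCapacity) {
      this.syncReachesDisk = syncReachesDisk;
      this.bufferCapacity = bufferCapacity;
    }

    void append(String entry) {
      buffer.add(entry);
      if (buffer.size() >= bufferCapacity) {
        writeOut(); // a full buffer is written out regardless of sync()
      }
    }

    void sync() {
      if (syncReachesDisk) {
        writeOut();
      }
      // In buffering mode sync() returns without making buffered entries durable.
    }

    private void writeOut() {
      disk.addAll(buffer);
      buffer.clear();
    }

    /** Simulates killing the process: whatever is still buffered is lost. */
    List<String> crash() {
      buffer.clear();
      return new ArrayList<>(disk);
    }
  }

  static List<String> simulate(boolean syncReachesDisk) {
    ToyLog log = new ToyLog(syncReachesDisk, 2);
    log.append("DEFINE_TABLET +r<<");
    log.append("update !0;~ files: A0000003.rf -> A0000004.rf");
    log.sync();
    log.append("update !0;~ files: A0000004.rf -> A0000005.rf");
    log.sync();
    return log.crash();
  }

  public static void main(String[] args) {
    System.out.println("sync reaches disk: " + simulate(true));
    System.out.println("sync is buffered:  " + simulate(false));
  }
}
```

When sync reaches disk, all three entries survive the crash. In buffered mode only the entries written out when the buffer filled survive, so the last surviving update is the A0000003.rf to A0000004.rf swap, which mirrors the state the root tablet recovered to in this report.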
> GarbageCollectorIT.dontGCRootLog() fails
> ----------------------------------------
>
> Key: ACCUMULO-2406
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2406
> Project: Accumulo
> Issue Type: Bug
> Environment: 2ef2d88598f5e14f8f96b77fecca66dcd7196448
> Reporter: Keith Turner
> Assignee: Keith Turner
> Fix For: 1.6.0
>
>
> Saw the following failure while running ITs. The test is not using MiniDFS,
> which is what caused the problem: because MiniDFS was not used, recent walog
> updates were lost. Below are my notes from investigating. To fix this, I am
> going to look into using RawLocalFS, as I did with VolumeIT.
> {noformat}
> dontGCRootLog(org.apache.accumulo.test.functional.GarbageCollectorIT) Time elapsed: 108.685 sec <<< ERROR!
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
> at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:187)
> at org.apache.accumulo.test.functional.GarbageCollectorIT.dontGCRootLog(GarbageCollectorIT.java:163)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
> at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
> at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
> at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
> ... 10 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing startScan
> at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:226)
> at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:202)
> at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:399)
> at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:277)
> ... 12 more
> {noformat}
> Looking in the tserver logs, I found that this error was caused by a missing file:
> {noformat}
> TabletServer_1756174045.out:java.io.IOException: Failed to open V/accumulo/tables/!0/table_info/A0000004.rf
> {noformat}
> The file had been garbage collected:
> {noformat}
> SimpleGarbageCollector_1083069032.out:2014-02-25 12:48:12,897 [gc.SimpleGarbageCollector] DEBUG: Deleting V/accumulo/tables/!0/table_info/A0000004.rf
> {noformat}
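The garbage collector's decision here can be approximated as a set difference: a file that has a deletion marker is removed once no tablet references it. The `filesToDelete` helper below is a hypothetical, simplified sketch and not the actual SimpleGarbageCollector logic; the file names come from the logs in this report.

```java
import java.util.Set;
import java.util.TreeSet;

public class GcCandidateDemo {

  // Hypothetical simplification, not SimpleGarbageCollector's code: a file with a
  // deletion marker is deleted once no tablet's metadata references it.
  static Set<String> filesToDelete(Set<String> deletionMarkers, Set<String> referencedFiles) {
    Set<String> result = new TreeSet<>(deletionMarkers);
    result.removeAll(referencedFiles);
    return result;
  }

  public static void main(String[] args) {
    // State just after A0000004.rf was compacted into A0000005.rf.
    Set<String> markers = Set.of("A0000004.rf");
    Set<String> referenced = Set.of("A0000005.rf");
    System.out.println(filesToDelete(markers, referenced)); // prints: [A0000004.rf]

    // A file that is still referenced is kept even if it has a marker.
    System.out.println(filesToDelete(markers, Set.of("A0000004.rf"))); // prints: []
  }
}
```

Under this model, deleting A0000004.rf was correct at the time, because the metadata then referenced A0000005.rf instead; the problem only appeared when the root tablet's recovery later restored a reference to the deleted file.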
> The file was created on another tablet server and then compacted away. The
> test code killed that tablet server shortly after it compacted A0000004.rf
> into A0000005.rf:
> {noformat}
> TabletServer_1242447446.out:2014-02-25 12:48:07,256 [tserver.Tablet] DEBUG: Starting MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000003.rf] --> V/accumulo/tables/!0/table_info/A0000004.rf_tmp []
> TabletServer_1242447446.out:2014-02-25 12:48:07,290 [tserver.Tablet] TABLET_HIST: !0;~< MajC [V/accumulo/tables/!0/table_info/A0000003.rf] --> V/accumulo/tables/!0/table_info/A0000004.rf
> TabletServer_1242447446.out:2014-02-25 12:48:10,007 [tserver.Tablet] DEBUG: Starting MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000004.rf] --> V/accumulo/tables/!0/table_info/A0000005.rf_tmp []
> TabletServer_1242447446.out:2014-02-25 12:48:10,044 [tserver.Tablet] TABLET_HIST: !0;~< MajC [V/accumulo/tables/!0/table_info/A0000004.rf] --> V/accumulo/tables/!0/table_info/A0000005.rf
> {noformat}
> So when the root tablet recovered, it brought back old data. After the crash,
> the root tablet's write-ahead log contained only the following, which is why
> A0000004.rf was brought back and A0000005.rf was forgotten.
> {noformat}
> COMPACTION_START 1 8 file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/+r/root_tablet/F000004m.rf
> COMPACTION_FINISH 1 9
> DEFINE_TABLET 1 9 +r<<
> MANY_MUTATIONS 1 9
> 1 mutations:
> ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000002.rf
> : [system]:23 [] <deleted>
> MANY_MUTATIONS 1 9
> 1 mutations:
> !0<
> srv:flush [system]:24 [] 4
> srv:lock [system]:24 [] tservers/host1:44035/zlock-0000000000$1446a28bfef0003
> MANY_MUTATIONS 1 9
> 1 mutations:
> !0;~
> srv:flush [system]:25 [] 4
> srv:lock [system]:25 [] tservers/host1:57501/zlock-0000000000$1446a28bfef0001
> MANY_MUTATIONS 1 9
> 1 mutations:
> !0<
> srv:compact [system]:26 [] 4
> srv:lock [system]:26 [] tservers/host1:44035/zlock-0000000000$1446a28bfef0003
> MANY_MUTATIONS 1 9
> 1 mutations:
> ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
> : [system]:27 []
> MANY_MUTATIONS 1 9
> 1 mutations:
> !0;~
> file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf [system]:28 [] <deleted>
> file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000004.rf [system]:28 [] 578,5
> srv:compact [system]:28 [] 4
> last:1446a28bfef0001 [system]:28 [] host1:57501
> srv:lock [system]:28 [] tservers/host1:57501/zlock-0000000000$1446a28bfef0001
> {noformat}
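To see what replaying this log contributes, the six MANY_MUTATIONS entries above can be modeled as rows with column updates. This is a hypothetical model: the `Update`, `Mutation`, and `replay` names are illustrative, paths are shortened to file names, and only `file:` columns are tracked (data already stored in the root tablet's files is ignored). Counting column updates gives 6 mutations and 13 entries, which agrees with the "(6 mutations applied, 13 entries created)" line in the recovery log, and the only `file:` entry left for `!0;~` is A0000004.rf.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class RootLogReplayDemo {

  // Hypothetical model of one column update; delete == true stands for "<deleted>".
  record Update(String column, boolean delete) {}

  // Hypothetical model of one logged mutation: a row plus its column updates.
  record Mutation(String row, List<Update> updates) {}

  record Result(int mutations, int entries, Map<String, TreeSet<String>> filesByRow) {}

  static Update put(String column) {
    return new Update(column, false);
  }

  static Update del(String column) {
    return new Update(column, true);
  }

  // Replays mutations in order, counting column updates and tracking "file:" columns per row.
  static Result replay(List<Mutation> log) {
    Map<String, TreeSet<String>> filesByRow = new TreeMap<>();
    int entries = 0;
    for (Mutation m : log) {
      for (Update u : m.updates()) {
        entries++;
        if (u.column().startsWith("file:")) {
          String file = u.column().substring("file:".length());
          TreeSet<String> files = filesByRow.computeIfAbsent(m.row(), k -> new TreeSet<>());
          if (u.delete()) {
            files.remove(file);
          } else {
            files.add(file);
          }
        }
      }
    }
    return new Result(log.size(), entries, filesByRow);
  }

  // The six MANY_MUTATIONS entries from the walog above; paths shortened to file names.
  static List<Mutation> rootLog() {
    return List.of(
        new Mutation("~delA0000002.rf", List.of(del(":"))),
        new Mutation("!0<", List.of(put("srv:flush"), put("srv:lock"))),
        new Mutation("!0;~", List.of(put("srv:flush"), put("srv:lock"))),
        new Mutation("!0<", List.of(put("srv:compact"), put("srv:lock"))),
        new Mutation("~delA0000003.rf", List.of(put(":"))),
        new Mutation("!0;~", List.of(
            del("file:A0000003.rf"),
            put("file:A0000004.rf"),
            put("srv:compact"),
            put("last:1446a28bfef0001"),
            put("srv:lock"))));
  }

  public static void main(String[] args) {
    Result r = replay(rootLog());
    // prints: 6 mutations applied, 13 entries created
    System.out.println(r.mutations() + " mutations applied, " + r.entries() + " entries created");
    // prints: !0;~ files: [A0000004.rf]
    System.out.println("!0;~ files: " + r.filesByRow().get("!0;~"));
  }
}
```

Because the mutation that swapped A0000004.rf for A0000005.rf never reached the log, nothing in the replay removes A0000004.rf from `!0;~`.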
> Below is some information about the root tablet recovery, from the tserver logs:
> {noformat}
> 2014-02-25 12:49:26,353 [tserver.Tablet] INFO : Starting Write-Ahead Log recovery for +r<<
> 2014-02-25 12:49:26,356 [tserver.TabletServer] INFO : Looking for V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40/finished
> 2014-02-25 12:49:26,360 [log.SortedLogRecovery] INFO : Looking at mutations from V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40 for +r<<
> 2014-02-25 12:49:26,376 [log.SortedLogRecovery] DEBUG: Found tid, seq 1 1
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004g.rf finished, but was still in the METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004i.rf finished, but was still in the METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004k.rf finished, but was still in the METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004m.rf finished, but was still in the METADATA
> 2014-02-25 12:49:26,382 [log.SortedLogRecovery] INFO : Scanning for mutations starting at sequence number 8 for tid 1
> 2014-02-25 12:49:26,386 [log.SortedLogRecovery] INFO : Recovery complete for +r<< using V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
> 2014-02-25 12:49:26,387 [tserver.Tablet] INFO : Write-Ahead Log recovery complete for +r<< (6 mutations applied, 13 entries created)
> 2014-02-25 12:49:26,396 [tserver.Tablet] TABLET_HIST: +r<< opened
> {noformat}
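The recovery log shows replay starting at sequence number 8 for tid 1, which is the sequence number of the COMPACTION_START entry (into F000004m.rf) that is followed by a COMPACTION_FINISH in the root tablet's walog. A toy version of that selection is sketched below. It assumes a simplified rule (start from the last compaction start that was followed by a finish, then replay mutations at or past it) and is not SortedLogRecovery's actual code; the `Event` type and the sequence-7 mutation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryStartDemo {

  enum Type { DEFINE_TABLET, COMPACTION_START, COMPACTION_FINISH, MUTATION }

  // Hypothetical model of one sorted walog entry: type, tablet id, sequence number.
  record Event(Type type, int tid, long seq) {}

  // Simplified rule (an assumption, not SortedLogRecovery's code): start at the
  // sequence number of the last COMPACTION_START that is followed by a COMPACTION_FINISH.
  static long recoveryStartSeq(List<Event> log, int tid) {
    long pendingStart = -1;
    long lastFinishedStart = 0; // no finished compaction: replay everything
    for (Event e : log) {
      if (e.tid() != tid) {
        continue;
      }
      if (e.type() == Type.COMPACTION_START) {
        pendingStart = e.seq();
      } else if (e.type() == Type.COMPACTION_FINISH && pendingStart >= 0) {
        lastFinishedStart = pendingStart;
        pendingStart = -1;
      }
    }
    return lastFinishedStart;
  }

  // Mutations for the tablet at or past the start point; earlier ones are assumed
  // to be persisted already in the minor compaction's output file.
  static List<Event> mutationsToReplay(List<Event> log, int tid) {
    long start = recoveryStartSeq(log, tid);
    List<Event> replay = new ArrayList<>();
    for (Event e : log) {
      if (e.tid() == tid && e.type() == Type.MUTATION && e.seq() >= start) {
        replay.add(e);
      }
    }
    return replay;
  }

  static List<Event> rootLog() {
    List<Event> log = new ArrayList<>();
    log.add(new Event(Type.MUTATION, 1, 7)); // hypothetical older mutation, not in the dump
    log.add(new Event(Type.COMPACTION_START, 1, 8)); // into F000004m.rf
    log.add(new Event(Type.COMPACTION_FINISH, 1, 9));
    log.add(new Event(Type.DEFINE_TABLET, 1, 9));
    for (int i = 0; i < 6; i++) {
      log.add(new Event(Type.MUTATION, 1, 9)); // the six MANY_MUTATIONS entries
    }
    return log;
  }

  public static void main(String[] args) {
    List<Event> log = rootLog();
    System.out.println("start seq: " + recoveryStartSeq(log, 1)); // prints: start seq: 8
    System.out.println("replayed: " + mutationsToReplay(log, 1).size()); // prints: replayed: 6
  }
}
```

The replay itself behaves as intended here; the data loss comes from the later metadata mutation never having been written durably to the walog.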
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)