[ https://issues.apache.org/jira/browse/ACCUMULO-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825352#comment-13825352 ]
Eric Newton commented on ACCUMULO-1831:
---------------------------------------
This should be governed by {{master.recovery.max.age}}. That is, we don't
really GC recovery files, we just remove them when they have been sitting there
for an hour (by default). Did you set this to some very low value?
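
For illustration, the age-based removal described above can be sketched as follows. This is a minimal sketch, not Accumulo's actual code: the class and method names are hypothetical, and the only value taken from this comment is the one-hour default.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Accumulo.
final class RecoveryAgeCheck {
    // Default of one hour, as stated in the comment above.
    static final Duration DEFAULT_MAX_AGE = Duration.ofHours(1);

    private RecoveryAgeCheck() {}

    /** Returns the names whose last-modified time is more than maxAge before now. */
    public static List<String> expired(Map<String, Instant> lastModified, Instant now, Duration maxAge) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastModified.entrySet()) {
            Duration age = Duration.between(e.getValue(), now);
            if (age.compareTo(maxAge) > 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Map<String, Instant> recoveryDirs = new LinkedHashMap<>();
        // Illustrative entries only: one old directory, one just written.
        recoveryDirs.put("old-recovery", now.minus(Duration.ofMinutes(90)));
        recoveryDirs.put("fresh-recovery", now.minus(Duration.ofSeconds(4)));
        System.out.println(expired(recoveryDirs, now, DEFAULT_MAX_AGE));
    }
}
```

With a one-hour threshold, a recovery directory written a few seconds earlier would not be selected, which is why a very low setting is the first thing to rule out.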
> Write ahead logs from upgrade prematurely GCed
> ----------------------------------------------
>
> Key: ACCUMULO-1831
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1831
> Project: Accumulo
> Issue Type: Sub-task
> Components: master, tserver
> Reporter: Keith Turner
> Assignee: Eric Newton
> Priority: Blocker
> Fix For: 1.6.0
>
>
> I was running {{test/system/upgrade_test.sh dirty}} and the test hung. Upon
> inspection, the wals from 1.5 were deleted before all tablets were recovered.
>
> Some tablets from 1.5 recovered fine.
> {noformat}
> 2013-10-29 20:29:26,475 [log.SortedLogRecovery] INFO : Recovery complete for !!R<< using hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
> {noformat}
> Then the GC kicked in and deleted files before tablets were finished
> recovering.
> {noformat}
> 2013-10-29 20:29:30,421 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7
> 2013-10-29 20:29:30,428 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing sorted WAL hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
> {noformat}
> Tablet failed to recover.
> {noformat}
> 2013-10-29 20:29:30,858 [tabletserver.TabletServer] WARN : exception trying to assign tablet 1<;row_0000180000 /default_tablet
> java.lang.RuntimeException: java.io.IOException: Unable to find recovery files for extent 1<;row_0000180000 logEntry: 1<; 754f171b-c260-42dd-b17e-bd15064608c7 (19)
>     at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1398)
>     at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1233)
>     at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1088)
>     at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1076)
> {noformat}
> I had set my GC delay to 30 secs while testing another issue, and that's why I
> ran into this issue.
> Looking at the code, I do not think it's properly converting relative paths
> from 1.5 to absolute paths. I think the code should convert everything to
> relative paths (just UUIDs) to avoid problems caused by differing
> configurations.
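
To make the "just UUIDs" suggestion concrete, here is a minimal sketch that reduces a WAL reference to its final path component so that absolute and relative forms compare equal. The class and method names are hypothetical, and the relative form used in `main` is an assumption about what a 1.5-style entry might look like, not something taken from the Accumulo source.

```java
import java.util.UUID;

// Hypothetical helper, not part of Accumulo.
final class WalIds {
    private WalIds() {}

    /**
     * Returns the last path component of a WAL reference.
     * Throws IllegalArgumentException if that component is not a UUID.
     */
    public static String walId(String logRef) {
        String trimmed = logRef.trim();
        int slash = trimmed.lastIndexOf('/');
        String name = slash >= 0 ? trimmed.substring(slash + 1) : trimmed;
        UUID.fromString(name); // validation only; the parsed value is discarded
        return name;
    }

    public static void main(String[] args) {
        // Absolute path as it appears in the GC log above.
        String absolute = "hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7";
        // Assumed relative form, for illustration only.
        String relative = "127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7";
        System.out.println(walId(absolute).equals(walId(relative)));
    }
}
```

Comparing on the UUID alone means a match does not depend on the namenode address or base directory configured on either side.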