[
https://issues.apache.org/jira/browse/ACCUMULO-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803336#comment-14803336
]
Eric Newton commented on ACCUMULO-4000:
---------------------------------------
New theory: blocks that were open for writing were not re-replicated during the
decommissioning, and nothing detected it.
* are blocks open for writing re-replicated?
* if a WAL isn't in use, and all the datanodes in the pipeline go away, will the
writer get an error?
* will a datanode decommission if there are still open writers?
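The first and third questions can be probed from the HDFS side. A minimal sketch, assuming a reachable cluster; the WAL path below is purely illustrative. `hdfs fsck -openforwrite` lists files still open for write along with their block locations, and `hdfs debug recoverLease` forces lease recovery on a stuck file (that subcommand postdates Hadoop 2.5, so on this cluster the equivalent would be the `DistributedFileSystem.recoverLease()` API):

```shell
# Illustrative only: the WAL path is hypothetical and a live HDFS cluster is required.
WAL=/accumulo/wal/worker.example.com+9997/some-wal-file

# List files still open for write under the WAL directory, with block
# locations, to check whether any block lives only on decommissioned nodes.
hdfs fsck /accumulo/wal -files -blocks -locations -openforwrite

# Force lease recovery on the stuck WAL ("hdfs debug recoverLease" was
# added after Hadoop 2.5; on 2.5 use DistributedFileSystem.recoverLease()).
hdfs debug recoverLease -path "$WAL" -retries 3
```

If fsck reports the file as OPENFORWRITE with no live replica locations, that would support the theory that the open block was never re-replicated.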
> log recovery failed after hard reset
> ------------------------------------
>
> Key: ACCUMULO-4000
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4000
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.6.2
> Environment: very large cluster, accumulo 1.6.2, hadoop 2.5.0 (cdh
> 5.3)
> Reporter: Eric Newton
> Assignee: Eric Newton
>
> Had a hardware failure on a single node within a large cluster. Tablets were
> migrated away, but one tablet would not recover. The Closer run by the
> master to release the write lease on the WAL failed repeatedly.
> Afterwards, it was determined the file was small, probably just opened and
> used at the moment the machine failed. The block could not be recovered from
> any replicas.
> One question raised: does the write pipeline acknowledge a sync before the
> pipeline completes?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)