[ https://issues.apache.org/jira/browse/ACCUMULO-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796461#comment-15796461 ]
Christopher Tubbs commented on ACCUMULO-4542:
---------------------------------------------
This seems really hard to reproduce. [~kturner] tells me he believes there is a
final check before the file is moved, and that it might do a copy instead of a
move if assignment failed for some tablets but not others (when the file
overlaps several tablets). If he's right, then it's possible there was a
failure reading the metadata table during that confirmation, and the system
treated this failure to validate as a false-positive failure to assign. I'm not
sure there's a sane way to handle that case that is better than the result you
saw.
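A minimal Java sketch of the final check described above, assuming
(hypothetically) that it can look up a per-tablet load status for the file.
The class, enums, and `decide` method are invented for illustration and are
not Accumulo's actual bulk-import code; the point is the third status,
`UNKNOWN`, which keeps a failed metadata read from being counted as a failed
assignment.

```java
import java.util.Map;

// Hypothetical sketch: none of these names come from Accumulo's code base.
class FailedFileDisposition {

    /** Outcome of checking one tablet's metadata entry for the bulk-loaded file. */
    enum LoadStatus { LOADED, NOT_LOADED, UNKNOWN }

    /** What the final check could do with a file whose assignment RPCs reported failure. */
    enum Action { LEAVE_IN_PLACE, COPY_TO_FAILURES, MOVE_TO_FAILURES, RECHECK_LATER }

    /**
     * Decides what to do with one file, given the metadata status of every tablet it overlaps.
     * UNKNOWN (e.g. the metadata scan itself timed out) is never treated as NOT_LOADED.
     */
    static Action decide(Map<String, LoadStatus> statusByTablet) {
        boolean anyLoaded = false;
        boolean anyNotLoaded = false;
        for (LoadStatus status : statusByTablet.values()) {
            switch (status) {
                case UNKNOWN:
                    // A failed read is not evidence of a failed assignment.
                    return Action.RECHECK_LATER;
                case LOADED:
                    anyLoaded = true;
                    break;
                case NOT_LOADED:
                    anyNotLoaded = true;
                    break;
            }
        }
        if (!anyNotLoaded) {
            // Every overlapping tablet already references the file; leave it alone.
            return Action.LEAVE_IN_PLACE;
        }
        if (anyLoaded) {
            // Some tablets still reference the original, so only a copy is safe.
            return Action.COPY_TO_FAILURES;
        }
        // No tablet references the file, so moving it cannot break any reader.
        return Action.MOVE_TO_FAILURES;
    }
}
```

For the scenario in the description, both tablets had loaded the file on the
first attempt, so a successful metadata read would give `LOADED` for both and
`LEAVE_IN_PLACE`; a metadata read that itself failed would give
`RECHECK_LATER` rather than a move.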
> Tablet left in bad state after bulk import timeout
> --------------------------------------------------
>
> Key: ACCUMULO-4542
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4542
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.7.2
> Reporter: John Vines
>
> On a cluster we saw a large number of network issues at one point. The cause
> still has not been pinpointed, but it did result in us seeing a lot of RPC
> exceptions and the like.
> While these network issues were occurring, a bulk import was kicked off for a
> single file. This single file was assigned to two tablets (which both
> happened to be on the same server). Unfortunately, in the three attempts bulk
> import made to assign this file to this tablet, there were three RPC
> exceptions caused by socket timeouts. After the three failures, the bulk
> import went ahead, moved this file to the failures directory, and carried on.
> Unfortunately, this file had actually been assigned to the tablet successfully
> on the first attempt. The following two attempts logged that the server had
> already been assigned this file. Shortly afterward, a query came in (and later
> major compactions ran), and both complained that the file could not be found,
> because the bulk import had moved it to the failures directory.
> I think in this event we need some sort of final validation that the record
> didn't end up in the metadata table before we move the file to the failures
> directory.
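One way to sketch the proposed final validation, assuming a hypothetical
caller-supplied metadata lookup (`metadataLookup` below, not a real Accumulo
API): re-read the metadata a bounded number of times, treat exceptions from
the lookup as `UNKNOWN`, and allow the move to the failures directory only on
a definitive `NOT_LOADED`.

```java
import java.util.function.Supplier;

// Hypothetical sketch: these names are illustrative, not Accumulo APIs.
class FinalValidation {

    enum LoadStatus { LOADED, NOT_LOADED, UNKNOWN }

    /**
     * Re-reads the file's metadata status until it is definitive or attempts run out.
     * Exceptions from the lookup (e.g. RPC timeouts) count as UNKNOWN.
     */
    static LoadStatus confirm(Supplier<LoadStatus> metadataLookup, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            LoadStatus status;
            try {
                status = metadataLookup.get();
            } catch (RuntimeException e) {
                status = LoadStatus.UNKNOWN;
            }
            if (status != LoadStatus.UNKNOWN) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    // Linear backoff before the next metadata read.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return LoadStatus.UNKNOWN;
                }
            }
        }
        // Still undecided: the caller should leave the file where it is.
        return LoadStatus.UNKNOWN;
    }

    /** Only a definitive NOT_LOADED makes it safe to move the file to the failures directory. */
    static boolean safeToMoveToFailures(LoadStatus status) {
        return status == LoadStatus.NOT_LOADED;
    }
}
```

Under these assumptions, the file in this report would presumably have been
confirmed as `LOADED` once a metadata read succeeded, and it would have stayed
where the tablets expected it; if no read succeeded, the result would be
`UNKNOWN` and the file would still not be moved.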
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)