[ 
https://issues.apache.org/jira/browse/ACCUMULO-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs resolved ACCUMULO-4542.
-----------------------------------------
    Resolution: Cannot Reproduce

Can't reproduce, and this is OBE (overtaken by events) with the new 2.0 bulk import API.

> Tablet left in bad state after bulk import timeout
> --------------------------------------------------
>
>                 Key: ACCUMULO-4542
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4542
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.7.2
>            Reporter: John Vines
>            Priority: Major
>
> At one point we saw a large number of network issues on a cluster. The cause
> still has not been pinpointed, but it did result in a lot of RPC exceptions
> and the like.
> While these network issues were occurring, a bulk import was kicked off for a
> single file. This file was assigned to two tablets (which both happened to be
> on the same server). Unfortunately, all 3 attempts the bulk import made to
> assign this file to this tablet failed with an RPC exception caused by a
> socket timeout. After the third failure, the bulk import moved this file to
> the failures directory and carried on.
> However, this file had actually been assigned to the tablet successfully on
> the first attempt; the following 2 attempts logged that the server had already
> been assigned this file. Shortly afterward, a query came in (and later, major
> compactions), which complained that the file could not be found because the
> bulk import had moved it to the failures directory.
> I think in this event we need some sort of final validation that the record
> didn't end up in the metadata table before we move the file to the failures
> directory.
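
The final validation proposed above can be sketched in Java as follows. This is a hypothetical illustration, not Accumulo's actual 1.7 bulk-import code: `TabletServerClient`, `MetadataReader`, `assign`, and the tablet/file names are invented stand-ins for the tablet-server RPC and the metadata-table lookup. What it shows is that a socket timeout is ambiguous (the server may have applied the request), so once the retries are exhausted the importer checks the metadata one more time before treating the file as failed.

```java
import java.net.SocketTimeoutException;
import java.util.HashSet;
import java.util.Set;

public class BulkImportRetrySketch {

    // Hypothetical stand-in for the RPC to a tablet server (NOT an Accumulo API).
    // It may throw on timeout even when the server did apply the assignment.
    interface TabletServerClient {
        void assignFile(String tablet, String file) throws SocketTimeoutException;
    }

    // Hypothetical stand-in for a metadata-table lookup (NOT an Accumulo API).
    interface MetadataReader {
        boolean isFileRecorded(String tablet, String file);
    }

    enum Outcome { LOADED, FAILED }

    static Outcome assign(TabletServerClient client, MetadataReader metadata,
                          String tablet, String file, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                client.assignFile(tablet, file);
                return Outcome.LOADED;
            } catch (SocketTimeoutException e) {
                // A timeout is ambiguous: the server may or may not have applied it.
            }
        }
        // Final validation before the caller moves the file to the failures
        // directory: did an earlier, "timed out" attempt actually succeed?
        return metadata.isFileRecorded(tablet, file) ? Outcome.LOADED : Outcome.FAILED;
    }

    public static void main(String[] args) {
        // Simulated metadata table holding "tablet|file" entries.
        Set<String> recorded = new HashSet<>();
        MetadataReader metadata = (t, f) -> recorded.contains(t + "|" + f);

        // Mimics the reported scenario: the assignment is applied on the
        // server, but every reply is lost to a socket timeout.
        TabletServerClient flaky = (t, f) -> {
            recorded.add(t + "|" + f);
            throw new SocketTimeoutException("simulated timeout");
        };
        System.out.println(assign(flaky, metadata, "tablet-a", "f1.rf", 3)); // LOADED

        // A server that never applies the request still yields FAILED.
        TabletServerClient dead = (t, f) -> {
            throw new SocketTimeoutException("simulated outage");
        };
        System.out.println(assign(dead, metadata, "tablet-b", "f2.rf", 3)); // FAILED
    }
}
```

Checking the metadata only after the retries are exhausted leaves the normal path untouched; treating the "already been assigned" reply seen in the logs as success would be another option, under the same assumptions.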



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
