Bulk import failing when tablet server dies
-------------------------------------------

                 Key: ACCUMULO-422
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-422
             Project: Accumulo
          Issue Type: Bug
         Environment: 10 node cluster running 1.4.0-SNAPSHOT
            Reporter: Keith Turner
             Fix For: 1.4.0


Saw this issue while running the random walk test with agitation.  The bulk 
import code picks random tablet servers and asks them to bulk load files.  If a 
tablet server dies, it takes 30 seconds for the master to see that the 
ZooKeeper lock was lost.  During this 30 second period the bulk import code 
will still try to use the dead tserver and fail.  After it fails three times it 
will mark the file as a failure.  All three attempts happen within a second, so 
the retries are used up long before the master notices the dead server.

The bulk import code should probably catch TTransportException and blacklist 
the tablet server for the rest of that bulk import transaction.
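
A rough sketch of the suggested fix, not the actual Accumulo code: keep a 
per-transaction set of servers that failed at the transport level and leave 
them out when picking the next random server.  All names here are hypothetical 
(BulkImportRetrySketch, LoadClient, loadFile), and TransportFailure is only a 
stand-in for Thrift's TTransportException so the example compiles without 
Thrift on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class BulkImportRetrySketch {

    /** Stand-in for Thrift's TTransportException (assumption, see above). */
    static class TransportFailure extends Exception {
        TransportFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical client that asks one tablet server to bulk load one file. */
    interface LoadClient {
        void bulkLoad(String server, String file) throws TransportFailure;
    }

    /**
     * Tries to load one file on randomly chosen servers, skipping any server
     * already blacklisted in this transaction. A server that fails with a
     * transport error is added to the blacklist, which the caller shares
     * across all files of the same bulk import.
     *
     * @return the server that accepted the file, or null if every attempt failed
     */
    static String loadFile(String file, List<String> servers, Set<String> blacklist,
                           LoadClient client, Random random, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Only consider servers that have not failed in this transaction.
            List<String> candidates = new ArrayList<>(servers);
            candidates.removeAll(blacklist);
            if (candidates.isEmpty()) {
                return null;
            }
            String server = candidates.get(random.nextInt(candidates.size()));
            try {
                client.bulkLoad(server, file);
                return server;
            } catch (TransportFailure e) {
                // Connection-level failure: the server is probably dead, so do
                // not spend further attempts on it.
                blacklist.add(server);
            }
        }
        return null;
    }
}
```

Because the blacklist is shared across files in the same transaction, a dead 
server costs at most one failed attempt in total rather than up to three per 
file.  The blacklist should be discarded when the transaction ends, since the 
master will have noticed the lost lock by then.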


        
