[ 
https://issues.apache.org/jira/browse/ACCUMULO-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878772#comment-13878772
 ] 

Sean Busbey commented on ACCUMULO-2227:
---------------------------------------

I think it'd be fine to put a note in the randomwalk README saying that if 
you're going to test HA failover, you should expect some failures like this 
when using an HDFS version prior to 2.1.0. The note should also include a link 
to HADOOP-9792 for more details.

Since our recommended version for Hadoop 2 is 2.2.0, I think a README note is 
enough. AFAIK, our non-test code already has retries built in for when we need 
to delete something (e.g. in the GC).
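
If we did want the test itself to ride out a failover, a retry around the 
delete might look roughly like the sketch below. This is only an illustration 
under assumptions: the Fs interface, the RetryingDelete class, and the 
deleteWithRetry name are hypothetical stand-ins, not existing Accumulo or 
Hadoop code. The idea is that a failed delete may still have taken effect, so 
each attempt first checks whether the path is already gone and treats that as 
success.

```java
import java.io.IOException;

class RetryingDelete {

    /** Hypothetical stand-in for the two filesystem calls the test needs. */
    interface Fs {
        boolean exists(String path) throws IOException;
        boolean delete(String path) throws IOException;
    }

    /**
     * Tries to delete a path up to maxAttempts times. An IOException is not
     * treated as fatal, because the delete may have been applied before the
     * connection dropped; the next attempt checks for that case.
     */
    static boolean deleteWithRetry(Fs fs, String path, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (!fs.exists(path)) {
                    return true; // nothing left to delete, possibly from an earlier attempt
                }
                if (fs.delete(path)) {
                    return true;
                }
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis); // give failover time to complete
            }
        }
        if (last != null) {
            throw last; // every attempt failed; surface the most recent error
        }
        return false; // delete kept returning false without throwing
    }
}
```

In the randomwalk test, an adapter over the real filesystem object would be 
passed in as Fs, with the attempt count and sleep chosen to cover the expected 
failover time.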

> Concurrent randomwalk fails when namenode dies after bulk import step
> ---------------------------------------------------------------------
>
>                 Key: ACCUMULO-2227
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2227
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.4
>            Reporter: Bill Havanki
>              Labels: ha, randomwalk, test
>
> Running Concurrent randomwalk under HDFS HA, if the active namenode is killed:
> {noformat}
> 20 12:27:51,119 [retry.RetryInvocationHandler] WARN : Exception while 
> invoking class 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete. 
> Not retrying because the invoked method is not idempotent, and unable to 
> determine whether it was invoked
> java.io.IOException: Failed on local exception: java.io.IOException: Response 
> is null.; Host Details : local host is: "slave.domain.com/10.20.200.113"; 
> destination host is: "namenode.domain.com":8020;
> ...
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
> at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
> at org.apache.accumulo.server.test.randomwalk.concurrent.BulkImport.visit(BulkImport.java:140)
> ...
> Caused by: java.io.IOException: Response is null.
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:952)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:847)
> {noformat}
> This arises from an HDFS path delete call that cleans up from the bulk 
> import. The test should be resilient here (and when the paths are made 
> earlier in the test) so that the test can continue once failover has 
> completed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
