[
https://issues.apache.org/jira/browse/ACCUMULO-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878758#comment-13878758
]
Bill Havanki commented on ACCUMULO-2227:
----------------------------------------
The failure here is ultimately due to running the test under Hadoop 2.0.0. In
that version, client-to-namenode calls like {{delete}}, annotated as
{{AtMostOnce}} in {{org.apache.hadoop.hdfs.protocol.ClientProtocol}}, are not
retried; only operations marked {{Idempotent}} are. Starting with the
implementation of HADOOP-9792 in Hadoop 2.1.0, {{AtMostOnce}}-annotated
operations are also retried. So, I expect that upgrading my cluster to Hadoop
2.1.0 or higher would resolve this issue.
The {{mkdirs}} call is annotated as {{Idempotent}} so it should not cause this
problem, even under Hadoop 2.0.0.
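To illustrate the distinction, here is a minimal sketch of an annotation-driven retry decision. It is not Hadoop's code: the annotations, the {{Protocol}} interface, and the {{retried}} helper are local, hypothetical stand-ins, and the {{retryAtMostOnce}} flag only models the behavior change described above.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

public class RetryDecisionSketch {
    // Local stand-ins for the Hadoop annotations (hypothetical, not the real classes).
    @Retention(RetentionPolicy.RUNTIME) @interface Idempotent {}
    @Retention(RetentionPolicy.RUNTIME) @interface AtMostOnce {}

    // Hypothetical slice of a client protocol, annotated as described in the comment.
    interface Protocol {
        @Idempotent boolean mkdirs(String src);
        @AtMostOnce boolean delete(String src, boolean recursive);
    }

    /**
     * Returns whether a call to the named method would be retried.
     * retryAtMostOnce=false models Hadoop 2.0.0; true models 2.1.0 and later.
     */
    public static boolean retried(String methodName, boolean retryAtMostOnce) {
        for (Method m : Protocol.class.getMethods()) {
            if (m.getName().equals(methodName)) {
                // Idempotent operations are retried in both models.
                if (m.isAnnotationPresent(Idempotent.class)) return true;
                // AtMostOnce operations are retried only when the flag is set.
                return retryAtMostOnce && m.isAnnotationPresent(AtMostOnce.class);
            }
        }
        throw new IllegalArgumentException("unknown method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println("2.0.0-like: mkdirs=" + retried("mkdirs", false)
                + " delete=" + retried("delete", false));
        System.out.println("2.1.0-like: mkdirs=" + retried("mkdirs", true)
                + " delete=" + retried("delete", true));
    }
}
```

In this model {{mkdirs}} is retried in both cases, while {{delete}} is retried only when the flag is set.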
I'm not sure that adding an _ad hoc_ retry here is the best way to resolve
this, so any opinions are welcome.
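For discussion, an _ad hoc_ retry could look roughly like the sketch below. The class name, the {{retry}} helper, and its parameters are hypothetical and not part of Accumulo or Hadoop; in the test, the wrapped operation would presumably be the HDFS delete call. One caveat with this approach: if the first attempt actually reached the namenode before it died, a retried delete may find the path already gone, so the caller would need to treat that outcome as success.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class FailoverRetry {
    /**
     * Calls op, retrying on IOException for up to maxAttempts attempts in
     * total and sleeping sleepMillis between attempts. Rethrows the last
     * IOException if every attempt fails.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                // Give the failover time to complete before the next attempt.
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: fails twice, as if the namenode were failing over,
        // then succeeds. In the test this would wrap the real delete call.
        boolean deleted = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Response is null.");
            }
            return true;
        }, 5, 10L);
        System.out.println("deleted=" + deleted + " after " + calls[0] + " attempts");
    }
}
```

The sleep interval and attempt limit would have to be tuned to how long failover takes on the cluster under test.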
> Concurrent randomwalk fails when namenode dies after bulk import step
> ---------------------------------------------------------------------
>
> Key: ACCUMULO-2227
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2227
> Project: Accumulo
> Issue Type: Bug
> Components: test
> Affects Versions: 1.4.4
> Reporter: Bill Havanki
> Labels: ha, randomwalk, test
>
> Running Concurrent randomwalk under HDFS HA, if the active namenode is killed:
> {noformat}
> 20 12:27:51,119 [retry.RetryInvocationHandler] WARN : Exception while
> invoking class
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete.
> Not retrying because the invoked method is not idempotent, and unable to
> determine whether it was invoked
> java.io.IOException: Failed on local exception: java.io.IOException: Response
> is null.; Host Details : local host is: "slave.domain.com/10.20.200.113";
> destination host is: "namenode.domain.com":8020;
> ...
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
> at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
> at org.apache.accumulo.server.test.randomwalk.concurrent.BulkImport.visit(BulkImport.java:140)
> ...
> Caused by: java.io.IOException: Response is null.
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:952)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:847)
> {noformat}
> This arises from an HDFS path delete call that cleans up after the bulk
> import. The test should be resilient here (and where the paths are created
> earlier in the test) so that the test can continue once failover has
> completed.