[
https://issues.apache.org/jira/browse/HDFS-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611039#comment-16611039
]
Chao Sun commented on HDFS-13749:
---------------------------------
The test failure is because in {{testMultiObserver}}, we shut down an observer
and then restart it, and we expect RPCs to go to the observer again once it is
restarted.
Interestingly, after the observer is restarted, the {{getServiceStatus}} call
fails with an EOF exception. I tried wrapping the proxy with a RetryPolicy as
follows:
{code}
public static HAServiceProtocol createNonHAProxyWithHAServiceProtocol(
    InetSocketAddress address, Configuration conf) throws IOException {
  // Retry up to 5 times with exponential backoff (base sleep time 200 ms).
  RetryPolicy timeoutPolicy = RetryPolicies.exponentialBackoffRetry(5, 200,
      TimeUnit.MILLISECONDS);
  // Plain (non-HA) protobuf proxy to a single NameNode, 30000 ms RPC timeout.
  HAServiceProtocol proxy =
      new HAServiceProtocolClientSideTranslatorPB(
          address, conf, NetUtils.getDefaultSocketFactory(conf),
          30000);
  // No per-method overrides: every method falls back to timeoutPolicy.
  Map<String, RetryPolicy> methodNameToPolicyMap = new HashMap<>();
  return (HAServiceProtocol) RetryProxy.create(
      HAServiceProtocol.class,
      new DefaultFailoverProxyProvider<>(HAServiceProtocol.class, proxy),
      methodNameToPolicyMap,
      timeoutPolicy);
}
{code}
but it still failed after multiple retries, with a connection refused exception.
However, if I add a simple loop in {{refreshCachedState}}, it always
succeeds on the second try:
{code}
public void refreshCachedState() {
  // Try up to 3 times; the first call after an observer restart may fail.
  for (int i = 0; i < 3; i++) {
    try {
      cachedState = serviceProxy.getServiceStatus().getState();
      LOG.info("Successfully set cache state to {}", cachedState.name());
      return;
    } catch (IOException e) {
      // Treat the node as STANDBY (not an observer) while it is unreachable.
      LOG.warn("Failed to connect to {}. Setting cached state to Standby",
          address, e);
      cachedState = HAServiceState.STANDBY;
    }
  }
}
{code}
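The loop in {{refreshCachedState}} is an instance of a generic "retry a bounded number of times, then fall back" pattern. Below is a minimal, self-contained sketch of that pattern under stated assumptions: it uses no Hadoop classes, and the names {{RetryWithFallback}}, {{IOCall}} and {{callWithRetries}} are hypothetical. Unlike the loop above, which retries immediately, it can optionally sleep between attempts.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryWithFallback {

    /** Hypothetical functional interface for a call that may throw IOException. */
    @FunctionalInterface
    public interface IOCall<T> {
        T call() throws IOException;
    }

    /**
     * Hypothetical helper: tries {@code call} up to {@code attempts} times,
     * sleeping {@code sleepMillis} between failed attempts (0 = no sleep),
     * and returns {@code fallback} if every attempt throws IOException.
     */
    public static <T> T callWithRetries(IOCall<T> call, int attempts,
                                        long sleepMillis, T fallback) {
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (IOException e) {
                System.err.println("Attempt " + (i + 1) + " failed: " + e.getMessage());
                boolean moreAttempts = i + 1 < attempts;
                if (moreAttempts && sleepMillis > 0) {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        return fallback;
                    }
                }
            }
        }
        return fallback; // every attempt failed
    }

    public static void main(String[] args) {
        // Simulates an RPC that fails once (e.g. a stale connection) and then succeeds.
        AtomicInteger calls = new AtomicInteger();
        String state = callWithRetries(() -> {
            if (calls.incrementAndGet() == 1) {
                throw new EOFException("simulated EOF");
            }
            return "OBSERVER";
        }, 3, 0, "STANDBY");
        System.out.println(state + " after " + calls.get() + " calls");
    }
}
```

Running {{main}} prints {{OBSERVER after 2 calls}} to stdout, mirroring the "succeeds on the second try" behavior described above.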
> Use getServiceStatus to discover observer namenodes
> ---------------------------------------------------
>
> Key: HDFS-13749
> URL: https://issues.apache.org/jira/browse/HDFS-13749
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Attachments: HDFS-13749-HDFS-12943.000.patch,
> HDFS-13749-HDFS-12943.001.patch, HDFS-13749-HDFS-12943.002.patch
>
>
> In HDFS-12976, we currently discover the NameNode state by calling
> {{reportBadBlocks}} as a temporary solution. Here, we'll implement this
> properly by using {{HAServiceProtocol#getServiceStatus}}.
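As a rough illustration of the approach described in the issue, here is a self-contained sketch that asks each NameNode for its HA state and collects the observers, treating an unreachable node as STANDBY (the same fallback {{refreshCachedState}} uses). It is not Hadoop's API: the {{State}} enum, the {{StatusSource}} interface, {{refreshStates}}, {{findObservers}} and the {{nnX:8020}} addresses are hypothetical stand-ins. In real code, {{StatusSource#getState}} would wrap something like {{getServiceStatus().getState()}} on an {{HAServiceProtocol}} proxy.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ObserverDiscovery {

    /** Hypothetical stand-in for the HA service states. */
    public enum State { ACTIVE, STANDBY, OBSERVER }

    /** Hypothetical stand-in for a per-NameNode status RPC. */
    @FunctionalInterface
    public interface StatusSource {
        State getState() throws IOException;
    }

    /** Asks each NameNode for its state; unreachable nodes are treated as STANDBY. */
    public static Map<String, State> refreshStates(Map<String, StatusSource> nameNodes) {
        Map<String, State> states = new LinkedHashMap<>();
        for (Map.Entry<String, StatusSource> e : nameNodes.entrySet()) {
            State state;
            try {
                state = e.getValue().getState();
            } catch (IOException ex) {
                state = State.STANDBY; // same fallback as refreshCachedState
            }
            states.put(e.getKey(), state);
        }
        return states;
    }

    /** Returns the addresses whose state is OBSERVER, in input order. */
    public static List<String> findObservers(Map<String, State> states) {
        List<String> observers = new ArrayList<>();
        for (Map.Entry<String, State> e : states.entrySet()) {
            if (e.getValue() == State.OBSERVER) {
                observers.add(e.getKey());
            }
        }
        return observers;
    }

    public static void main(String[] args) {
        Map<String, StatusSource> nns = new LinkedHashMap<>();
        nns.put("nn1:8020", () -> State.ACTIVE);
        nns.put("nn2:8020", () -> State.OBSERVER);
        nns.put("nn3:8020", () -> { throw new IOException("Connection refused"); });
        Map<String, State> states = refreshStates(nns);
        System.out.println(states);                // {nn1:8020=ACTIVE, nn2:8020=OBSERVER, nn3:8020=STANDBY}
        System.out.println(findObservers(states)); // [nn2:8020]
    }
}
```

The fallback to STANDBY is a conservative choice: a node that cannot answer is simply not used as an observer until a later refresh succeeds.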