[ 
https://issues.apache.org/jira/browse/HDFS-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303572#comment-17303572
 ] 

Akira Ajisaka edited comment on HDFS-15900 at 3/17/21, 4:53 PM:
----------------------------------------------------------------

Thanks [~hdaikoku] for your report and analysis. [~hdaikoku] and I did more 
analysis:

In 
https://github.com/apache/hadoop/blob/rel/release-3.3.0/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MembershipStoreImpl.java#L219-L221,
 we expect the {{FederationNamespaceInfo}} of both AVAILABLE and UNAVAILABLE
NameNodes to be added to {{activeNamespaces}}. However, the hashCode method of
{{FederationNamespaceInfo}} is not correctly overridden, so
{{activeNamespaces}} actually contains only the first record of each
nameservice and the TreeSet ignores the later records. Therefore, if the first
record belongs to an UNAVAILABLE NameNode, only that UNAVAILABLE NameNode
exists in {{activeNamespaces}} and the block pool id becomes empty.
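
To illustrate the TreeSet behavior described above, here is a minimal,
self-contained sketch. It does not use the Hadoop classes: {{NamespaceRecord}}
is a hypothetical stand-in for {{FederationNamespaceInfo}}, "BP-example" is a
made-up block pool id, and the ordering that looks only at the nameservice id
is an assumption made for the demonstration. Note that java.util.TreeSet
detects duplicates through compareTo (or a Comparator), so that is the method
the sketch defines.

```java
import java.util.TreeSet;

public class TreeSetDedupSketch {

  // Hypothetical stand-in for FederationNamespaceInfo; not the Hadoop class.
  static final class NamespaceRecord implements Comparable<NamespaceRecord> {
    final String nameserviceId;
    final String blockPoolId;

    NamespaceRecord(String nameserviceId, String blockPoolId) {
      this.nameserviceId = nameserviceId;
      this.blockPoolId = blockPoolId;
    }

    // Assumption: two records compare as equal when the nameservice id matches.
    @Override
    public int compareTo(NamespaceRecord other) {
      return nameserviceId.compareTo(other.nameserviceId);
    }
  }

  // Adds the records in order; TreeSet keeps the first of any "equal" pair.
  static TreeSet<NamespaceRecord> collect(NamespaceRecord... records) {
    TreeSet<NamespaceRecord> active = new TreeSet<>();
    for (NamespaceRecord r : records) {
      active.add(r); // returns false and changes nothing if compareTo() == 0
    }
    return active;
  }

  public static void main(String[] args) {
    TreeSet<NamespaceRecord> active = collect(
        new NamespaceRecord("ns0", ""),            // UNAVAILABLE NameNode first
        new NamespaceRecord("ns0", "BP-example")); // AVAILABLE NameNode second
    // Prints: 1 record(s), block pool id = ''
    System.out.println(active.size() + " record(s), block pool id = '"
        + active.first().blockPoolId + "'");
  }
}
```

With the order of the two records swapped, the record that carries the block
pool id is the one that survives, which matches the observation that the
result depends on which record comes first.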

Possible fixes:

1. As [~hdaikoku] commented, ignore UNAVAILABLE NameNode registrations.
2. Add the missing hashCode and equals methods (following the well-known best
practice).

Either of the two is sufficient to fix this issue. However, I think it's better 
to do both.
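
A rough sketch of how the two fixes could fit together is shown below. It is
not the actual patch: {{State}}, {{Registration}}, {{NamespaceInfo}} and
{{buildActiveNamespaces}} are hypothetical names, and "BP-example" is a made-up
block pool id. The sketch also assumes that compareTo is kept consistent with
equals, since TreeSet consults compareTo when it looks for duplicates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;

public class ActiveNamespacesFixSketch {

  // Hypothetical states, named after the wording used in this comment.
  enum State { AVAILABLE, UNAVAILABLE }

  // Hypothetical stand-in for a NameNode registration record.
  static final class Registration {
    final String nameserviceId;
    final String blockPoolId;
    final State state;

    Registration(String nameserviceId, String blockPoolId, State state) {
      this.nameserviceId = nameserviceId;
      this.blockPoolId = blockPoolId;
      this.state = state;
    }
  }

  // Hypothetical stand-in for FederationNamespaceInfo.
  static final class NamespaceInfo implements Comparable<NamespaceInfo> {
    final String nameserviceId;
    final String blockPoolId;

    NamespaceInfo(String nameserviceId, String blockPoolId) {
      this.nameserviceId = nameserviceId;
      this.blockPoolId = blockPoolId;
    }

    // Fix 2: equals and hashCode cover the same fields.
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof NamespaceInfo)) {
        return false;
      }
      NamespaceInfo other = (NamespaceInfo) o;
      return nameserviceId.equals(other.nameserviceId)
          && blockPoolId.equals(other.blockPoolId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(nameserviceId, blockPoolId);
    }

    // Assumption: ordering is kept consistent with equals.
    @Override
    public int compareTo(NamespaceInfo other) {
      int c = nameserviceId.compareTo(other.nameserviceId);
      return c != 0 ? c : blockPoolId.compareTo(other.blockPoolId);
    }
  }

  // Fix 1: skip UNAVAILABLE registrations while building the cache.
  static TreeSet<NamespaceInfo> buildActiveNamespaces(List<Registration> regs) {
    TreeSet<NamespaceInfo> active = new TreeSet<>();
    for (Registration r : regs) {
      if (r.state == State.UNAVAILABLE) {
        continue;
      }
      active.add(new NamespaceInfo(r.nameserviceId, r.blockPoolId));
    }
    return active;
  }

  public static void main(String[] args) {
    TreeSet<NamespaceInfo> active = buildActiveNamespaces(Arrays.asList(
        new Registration("ns0", "", State.UNAVAILABLE),
        new Registration("ns0", "BP-example", State.AVAILABLE)));
    // Prints: BP-example
    System.out.println(active.first().blockPoolId);
  }
}
```

In this sketch the UNAVAILABLE record never reaches the set, so the order of
the registrations no longer decides which block pool id is cached.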


> RBF: empty blockpool id on dfsrouter caused by UNAVAILABLE NameNode
> -------------------------------------------------------------------
>
>                 Key: HDFS-15900
>                 URL: https://issues.apache.org/jira/browse/HDFS-15900
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>    Affects Versions: 3.3.0
>            Reporter: Harunobu Daikoku
>            Priority: Major
>         Attachments: image.png
>
>
> We observed that when a NameNode becomes UNAVAILABLE, the corresponding
> blockpool id in MembershipStoreImpl#activeNamespaces on dfsrouter
> is unintentionally set to empty, its initial value.
>  !image.png|height=250!
> As a result, concat operations through dfsrouter fail with the
> following error because the router cannot resolve the block pool id in the
> recognized active namespaces.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): 
> Cannot locate a nameservice for block pool BP-...
> {noformat}
> A possible fix is to ignore UNAVAILABLE NameNode registrations, and set 
> proper namespace information obtained from available NameNode registrations 
> when constructing the cache of active namespaces.
>  
> [https://github.com/apache/hadoop/blob/rel/release-3.3.0/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MembershipStoreImpl.java#L207-L221]


