[ https://issues.apache.org/jira/browse/HDFS-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhongkun Wu updated HDFS-17602:
-------------------------------
    Description:
We have seventy thousand nodes in our production cluster, and we use RBF for namespace federation. When a mount point uses the SPACE order and a client writes to it, the write fails now and then and leaves an empty file behind.
----
We dug into the code and found the root cause: when a file is created through RBF, the create RPC is invoked first, and then the addBlock RPC is invoked until the write is done. Every now and then the RBF space resolver chooses an irrelevant namespace, and the client writes data to the wrong location.
----
These are the code fragments:

!image-2024-08-12-10-08-58-271.png!
In MultipleDestinationMountTableResolver.java we invoke
orderedResolver.getFirstNamespace(path, mountTableResult);
It then invokes this function in RouterResolver.java
!image-2024-08-12-10-12-48-428.png!
and now we are in the chooseFirstNamespace function in AvailableSpaceResolver.java
!image-2024-08-12-10-14-20-580.png!

The path parameter is the destination where we want to create a file.
The loc parameter is the mount point we set.

This function chooses the namespace with the most available space among all the namespaces we have in the StateStore, which is not the same as the set of namespaces we configured on the mount point for our destination.

As a result, we get a namespace irrelevant to the namespaces we set for the destination path.

!image-2024-08-12-10-25-42-863.png!
In the log above, we get a namespace that we did not set for our destination path, so it chooses the first namespace it sees, which is not really the most available namespace among the namespaces we set for our destination.

  was:
We have seventy thousand nodes in our production cluster, we use rbf to do namespace federation. when uses space order in rbf, and writes to the mount point, it fails to write and generated an empty file.
----
we diged in the code and find : when create a file in rbf, after create rpc was invoked,and then the addblock rpc was invoked and rbf space resolver would choose an irrelevant namespace every now and then. it was the root cause of the problem.
----
these are the code fragments:

!image-2024-08-12-10-08-58-271.png!
In the MultipleDestinationMountTableResolver.java we invoke
orderedResolver.getFirstNamespace(path, mountTableResult);
It will then invoke this function in RouterResolver.java
!image-2024-08-12-10-12-48-428.png!
and now we are in
chooseFirstNamespace function in AvailableSpaceResolver.java
!image-2024-08-12-10-14-20-580.png!

The path parameter is the destination where we want to create a file
the loc parameter is the mount point we set

this function will choose the most available namespace in all the namespace we have in StateStore, which is not the same as the mount point we set for our destination.

As a result we will get a namespace irrelevant to the namespaces we set for the destination path

!image-2024-08-12-10-25-42-863.png!
in the log above:
we get the namespace we don't set with our destination path, So the it will choose the first namespace it sees and it's not really the most available namespace among the namespaces we set for our destination


> RBF mount point with SPACE order can not find the most available namespace,
> it will choose an irrelevant namespace
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17602
>                 URL: https://issues.apache.org/jira/browse/HDFS-17602
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: router
>            Reporter: Zhongkun Wu
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: image-2024-08-12-10-08-54-031.png,
> image-2024-08-12-10-08-58-271.png, image-2024-08-12-10-12-48-428.png,
> image-2024-08-12-10-14-20-580.png, image-2024-08-12-10-25-26-003.png,
> image-2024-08-12-10-25-42-863.png
>
>
> We have seventy thousand nodes in our production cluster, and we use RBF for
> namespace federation. When a mount point uses the SPACE order and a client
> writes to it, the write fails now and then and leaves an empty file behind.
> ----
> We dug into the code and found the root cause: when a file is created
> through RBF, the create RPC is invoked first, and then the addBlock RPC is
> invoked until the write is done. Every now and then the RBF space resolver
> chooses an irrelevant namespace, and the client writes data to the wrong
> location.
> ----
> These are the code fragments:
>
> !image-2024-08-12-10-08-58-271.png!
> In MultipleDestinationMountTableResolver.java we invoke
> orderedResolver.getFirstNamespace(path, mountTableResult);
> It then invokes this function in RouterResolver.java
> !image-2024-08-12-10-12-48-428.png!
> and now we are in the chooseFirstNamespace function in
> AvailableSpaceResolver.java
> !image-2024-08-12-10-14-20-580.png!
>
> The path parameter is the destination where we want to create a file.
> The loc parameter is the mount point we set.
>
> This function chooses the namespace with the most available space among all
> the namespaces we have in the StateStore, which is not the same as the set
> of namespaces we configured on the mount point for our destination.
>
> As a result, we get a namespace irrelevant to the namespaces we set for
> the destination path.
>
> !image-2024-08-12-10-25-42-863.png!
> In the log above, we get a namespace that we did not set for our
> destination path, so it chooses the first namespace it sees, which is not
> really the most available namespace among the namespaces we set for our
> destination.
>
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
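
The selection problem described in this report can be illustrated with a small, self-contained Java sketch. It is not the Hadoop RBF code: the class SpaceOrderSketch, both method names, the namespace ids (ns1, ns2, ns3), and the space figures are hypothetical and exist only for illustration. The first method ranks every namespace that has reported space, as the report describes; the second ranks only the namespaces configured as destinations of the mount point, which is the behaviour the reporter expects.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these names are hypothetical and are not the
// actual Hadoop RBF classes or method signatures.
class SpaceOrderSketch {

  // Behaviour described in the report: rank every namespace that has
  // reported space, regardless of the mount point being resolved.
  static String chooseFromAllNamespaces(Map<String, Long> availableSpace) {
    return availableSpace.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElse(null);
  }

  // Expected behaviour: rank only the namespaces configured as
  // destinations of the mount point. Returns null if none has reported space.
  static String chooseFromMountPoint(Map<String, Long> availableSpace,
                                     List<String> mountDestinations) {
    return mountDestinations.stream()
        .filter(availableSpace::containsKey)
        .max(Comparator.comparingLong((String ns) -> availableSpace.get(ns)))
        .orElse(null);
  }

  public static void main(String[] args) {
    // Hypothetical available space per namespace (arbitrary units).
    Map<String, Long> availableSpace =
        Map.of("ns1", 100L, "ns2", 300L, "ns3", 900L);
    // Hypothetical mount point that only targets ns1 and ns2.
    List<String> mountDestinations = List.of("ns1", "ns2");

    System.out.println(chooseFromAllNamespaces(availableSpace));
    System.out.println(chooseFromMountPoint(availableSpace, mountDestinations));
  }
}
```

With these sample values the first call returns ns3, which is not a destination of the mount point, while the second returns ns2, the configured namespace with the most available space.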