[ 
https://issues.apache.org/jira/browse/HDFS-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongkun Wu updated HDFS-17602:
-------------------------------
    Description: 
We have seventy thousand nodes in our production Hadoop cluster and use RBF 
(Router-Based Federation) for namespace federation. When a mount point uses the 
SPACE order, writes to that mount point intermittently fail and leave an empty 
file behind.
----
We dug into the code and found the root cause. When a file is created through 
RBF, the create RPC is invoked first, and addBlock RPCs are then invoked until 
the write is done. The RBF space resolver intermittently chooses an irrelevant 
namespace, so the client writes data to the wrong location.
----
These are the code fragments:

 

!image-2024-08-12-10-08-58-271.png!

In MultipleDestinationMountTableResolver.java we invoke 
orderedResolver.getFirstNamespace(path, mountTableResult);
which then invokes this function in RouterResolver.java:
!image-2024-08-12-10-12-48-428.png!
and now we are in the chooseFirstNamespace function in 
AvailableSpaceResolver.java:
!image-2024-08-12-10-14-20-580.png!
 
The path parameter is the destination where we want to create the file; the 
loc parameter is the mount point we set.
 
This function chooses the most available namespace among all the namespaces we 
have in the StateStore, not among the namespaces configured for the mount point 
of our destination.
 
As a result, we get a namespace that is irrelevant to the namespaces we set for 
the destination path.
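
To make the difference concrete, here is a minimal sketch of the two selection 
strategies. It is an illustration only, not the Hadoop code: the class 
SpaceOrderSketch, its methods, and the namespace names and sizes are all 
hypothetical. It only contrasts ranking every known namespace with ranking the 
destinations of one mount point.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; none of these names are real Hadoop RBF classes.
class SpaceOrderSketch {

  /** Picks the namespace with the most available space among ALL known namespaces. */
  static Optional<String> mostAvailableGlobally(Map<String, Long> availableSpace) {
    return availableSpace.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey);
  }

  /** Picks the namespace with the most available space among the mount point's destinations only. */
  static Optional<String> mostAvailableAmong(
      Map<String, Long> availableSpace, List<String> destinations) {
    return destinations.stream()
        .filter(availableSpace::containsKey)
        .max(Comparator.comparingLong((String ns) -> availableSpace.get(ns)));
  }

  public static void main(String[] args) {
    // Assumed available space per namespace (made-up numbers).
    Map<String, Long> space = new LinkedHashMap<>();
    space.put("ns1", 100L);
    space.put("ns2", 300L);
    space.put("ns3", 900L); // known to the cluster, but NOT a destination of the mount point

    List<String> destinations = List.of("ns1", "ns2");

    System.out.println(mostAvailableGlobally(space).orElse("none"));
    System.out.println(mostAvailableAmong(space, destinations).orElse("none"));
  }
}
```

With these made-up numbers, the global ranking selects ns3, which the mount 
point does not list, while the restricted ranking selects ns2.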

 

!image-2024-08-12-10-25-42-863.png!

In the log above:

We get a namespace that is not configured for our destination path, so the 
resolver chooses the first namespace it sees, which is not really the most 
available namespace among the namespaces we set for our destination.
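
The fallback described above can be modelled with a short sketch. This is an 
assumption about the observable behaviour reported here, not a copy of the 
router code; FirstNamespaceFallbackSketch, resolveFirst, and the namespace 
names are hypothetical.

```java
import java.util.List;

// Hypothetical model of the reported behaviour; not the real Hadoop RBF code.
class FirstNamespaceFallbackSketch {

  /**
   * Returns the preferred namespace when the mount point lists it as a
   * destination; otherwise falls back to the first configured destination.
   */
  static String resolveFirst(String preferred, List<String> destinations) {
    if (destinations.isEmpty()) {
      throw new IllegalArgumentException("mount point has no destinations");
    }
    return destinations.contains(preferred) ? preferred : destinations.get(0);
  }

  public static void main(String[] args) {
    // Assume ns2 has the most free space among the destinations, and ns3 is
    // the winner of a ranking over every namespace in the cluster.
    List<String> destinations = List.of("ns1", "ns2");

    // The preferred namespace is not a destination, so the first one is used.
    System.out.println(resolveFirst("ns3", destinations));

    // A preferred namespace that is a destination is honoured.
    System.out.println(resolveFirst("ns2", destinations));
  }
}
```

In this model, a preferred namespace outside the destination list always 
yields ns1, regardless of which destination actually has the most space.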

 

 
 
 


> RBF mount point with SPACE order can not find the most available namespace, 
> it will choose an irrelevant namespace
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17602
>                 URL: https://issues.apache.org/jira/browse/HDFS-17602
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: router
>            Reporter: Zhongkun Wu
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: image-2024-08-12-10-08-54-031.png, 
> image-2024-08-12-10-08-58-271.png, image-2024-08-12-10-12-48-428.png, 
> image-2024-08-12-10-14-20-580.png, image-2024-08-12-10-25-26-003.png, 
> image-2024-08-12-10-25-42-863.png
>
>


