[ 
https://issues.apache.org/jira/browse/HDFS-16485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038604#comment-18038604
 ] 

ASF GitHub Bot commented on HDFS-16485:
---------------------------------------

github-actions[bot] closed pull request #4033: HDFS-16485. [SPS]: allow 
re-satisfy path after restarting sps process
URL: https://github.com/apache/hadoop/pull/4033




> [SPS]: allow re-satisfy path after restarting sps process
> ---------------------------------------------------------
>
>                 Key: HDFS-16485
>                 URL: https://issues.apache.org/jira/browse/HDFS-16485
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When the SPSPathIdProcessor thread calls getNextSPSPath(), it gets the pathId
> from the namenode, and the namenode also removes this pathId from the
> pathsToBeTraveresed queue.
> {code:java}
> public Long getNextPathId() {
>   synchronized (pathsToBeTraveresed) {
>     return pathsToBeTraveresed.poll();
>   }
> }
> {code}
> If the SPS process restarts, the move operation for this path does not resume
> until the namenode restarts.
> So we want to provide a way for the SPS to continue performing the move
> operation after an SPS restart.
> First solution:
> 1) When the SPSPathIdProcessor thread calls getNextSPSPath(), the namenode
> returns the pathId and then moves it to a pathsBeingTraveresed queue;
> 2) After the SPS finishes a path movement operation, it calls an RPC on the
> namenode to remove this pathId from the pathsBeingTraveresed queue;
> 3) If the SPS restarts, the SPSPathIdProcessor thread calls an RPC on the
> namenode to get all pathIds from the pathsBeingTraveresed queue.
> Second solution:
> We add timeout detection in the application layer: if a path does not
> complete the movement within the specified time, we can re-satisfy this path
> even though it already has the "hdfs.sps" xattr.
> We chose the second solution because the first solution adds more RPC
> operations and may affect namenode performance.
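
For illustration only, the three steps of the first solution could be sketched
roughly as below. This is not the HDFS implementation: the class
InFlightPathTracker and the methods addPathId, markPathDone and
getInFlightPathIds are hypothetical names, and the RPC layer is left out. Only
pathsToBeTraveresed, pathsBeingTraveresed and getNextPathId() come from the
description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of the first solution; not the actual HDFS code. */
final class InFlightPathTracker {
  // Paths waiting to be handed to the SPS (spelling kept from the issue).
  private final Queue<Long> pathsToBeTraveresed = new LinkedList<>();
  // Paths handed out to the SPS but not yet reported as finished.
  private final Set<Long> pathsBeingTraveresed = new LinkedHashSet<>();

  /** Assumed helper: queue a path for satisfaction. */
  synchronized void addPathId(long pathId) {
    pathsToBeTraveresed.add(pathId);
  }

  /** Step 1: hand out the next path and remember it as in flight. */
  synchronized Long getNextPathId() {
    Long pathId = pathsToBeTraveresed.poll();
    if (pathId != null) {
      pathsBeingTraveresed.add(pathId);
    }
    return pathId; // null when nothing is queued
  }

  /** Step 2: the SPS reports that the movement for this path finished. */
  synchronized void markPathDone(long pathId) {
    pathsBeingTraveresed.remove(Long.valueOf(pathId));
  }

  /** Step 3: a restarted SPS asks for every path still in flight. */
  synchronized List<Long> getInFlightPathIds() {
    return new ArrayList<>(pathsBeingTraveresed);
  }
}
```

Steps 2 and 3 each correspond to an extra namenode call in this sketch, which
is the cost the description cites as the reason for not choosing this approach.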

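The timeout check of the second solution might look something like the sketch
below. It is an assumption-laden illustration, not the code from the pull
request: SatisfyTimeoutTracker, trySatisfy and markDone are hypothetical names,
and the in-memory map merely stands in for wherever the real implementation
records when a path was last submitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of timeout-based re-satisfy; not the actual HDFS code. */
final class SatisfyTimeoutTracker {
  private final long timeoutMs;
  // pathId -> time (ms) at which the current satisfy request was accepted.
  private final Map<Long, Long> startedAtMs = new HashMap<>();

  SatisfyTimeoutTracker(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /**
   * Returns true if the request should be queued: either the path has no
   * pending request, or the pending one is at least timeoutMs old.
   */
  synchronized boolean trySatisfy(long pathId, long nowMs) {
    Long started = startedAtMs.get(pathId);
    if (started != null && nowMs - started < timeoutMs) {
      return false; // still inside the window; treat as in progress
    }
    startedAtMs.put(pathId, nowMs); // first request, or re-satisfy after timeout
    return true;
  }

  /** Called when the movement for this path completes. */
  synchronized void markDone(long pathId) {
    startedAtMs.remove(pathId);
  }
}
```

With a timeout of 1000 ms, a request for the same path is rejected at 500 ms
and accepted again at 1000 ms, without any extra call from the SPS to the
namenode.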


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

