[
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=755602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755602
]
ASF GitHub Bot logged work on HDFS-16484:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Apr/22 05:09
Start Date: 12/Apr/22 05:09
Worklog Time Spent: 10m
Work Description: tasanuma commented on code in PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#discussion_r847973070
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/BlockStorageMovementNeeded.java:
##########
@@ -248,13 +251,22 @@ public void run() {
pendingWorkForDirectory.get(startINode);
if (dirPendingWorkInfo != null
&& dirPendingWorkInfo.isDirWorkDone()) {
- ctxt.removeSPSHint(startINode);
+ try {
+ ctxt.removeSPSHint(startINode);
+ } catch (FileNotFoundException e) {
+ // ignore if the file doesn't already exist
+ startINode = null;
+ }
pendingWorkForDirectory.remove(startINode);
}
}
startINode = null; // Current inode successfully scanned.
}
} catch (Throwable t) {
+ retryCount++;
+ if (retryCount >= 3) {
+ startINode = null;
+ }
Review Comment:
@liubingxing
- Let's define a constant for the max retry count (`private static final
int MAX_RETRY_COUNT = 3;`) in `SPSPathIdProcessor`.
- How about logging a message when skipping the inode?
```suggestion
retryCount++;
if (retryCount >= MAX_RETRY_COUNT) {
LOG.warn("Skipping this inode {} due to too many retries.",
startINode);
startINode = null;
}
```
- And I think it's better to move the retry logic to the end of the catch
block.
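
Taken together, the three points above could look roughly like the following self-contained sketch. It is not the actual `SPSPathIdProcessor` code: the class `BoundedRetrySketch`, the `InodeScanner` interface and the `drain` method are hypothetical stand-ins, `System.err` replaces the real logger, and resetting `retryCount` when a new inode id is fetched is an assumption rather than something taken from the PR.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BoundedRetrySketch {
  // Constant name taken from the review suggestion above.
  private static final int MAX_RETRY_COUNT = 3;

  /** Hypothetical stand-in for the scan step; not the real SPS Context API. */
  public interface InodeScanner {
    void scan(long inodeId) throws Exception;
  }

  /**
   * Processes every id in the queue, retrying a failing id at most
   * MAX_RETRY_COUNT times before skipping it. Returns the skipped ids.
   */
  public static List<Long> drain(Deque<Long> queue, InodeScanner scanner) {
    List<Long> skipped = new ArrayList<>();
    Long startINode = null;
    int retryCount = 0;
    while (true) {
      if (startINode == null) {
        startINode = queue.poll();
        retryCount = 0; // assumption: each inode gets a fresh retry budget
      } // else the same id will be retried
      if (startINode == null) {
        break; // the real thread would sleep and poll again instead
      }
      try {
        scanner.scan(startINode);
        startINode = null; // current inode successfully scanned
      } catch (Exception e) {
        // Logging of the exception and any back-off sleep would go here.
        // The retry check comes last in the catch block, as suggested.
        retryCount++;
        if (retryCount >= MAX_RETRY_COUNT) {
          System.err.println("Skipping this inode " + startINode
              + " due to too many retries.");
          skipped.add(startINode);
          startINode = null;
        }
      }
    }
    return skipped;
  }
}
```

With this shape, an id whose scan always fails is attempted `MAX_RETRY_COUNT` times and then dropped, so the loop moves on to the next id instead of holding the failing one forever.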
Issue Time Tracking
-------------------
Worklog Id: (was: 755602)
Time Spent: 3h 10m (was: 3h)
> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
> -------------------------------------------------------------
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log shown below. The
> SPSPathIdProcessor thread enters an infinite loop and prints the same log
> message over and over.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist,
> the thread enters an infinite loop and can no longer work normally.
> The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does
> not exist. The exception thrown while processing it skips the
> "startINode = null" assignment, so the thread holds this inodeId forever and
> retries it on every iteration.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> }
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)