[
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=754738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-754738
]
ASF GitHub Bot logged work on HDFS-16484:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Apr/22 17:40
Start Date: 08/Apr/22 17:40
Worklog Time Spent: 10m
Work Description: tasanuma commented on code in PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#discussion_r846350026
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/sps/BlockStorageMovementNeeded.java:
##########
@@ -232,6 +233,7 @@ public synchronized void clearQueuesWithNotification() {
public void run() {
LOG.info("Starting SPSPathIdProcessor!.");
Long startINode = null;
+ int retryCount = 0;
Review Comment:
@liubingxing Thanks for the explanation. I got it.
Issue Time Tracking
-------------------
Worklog Id: (was: 754738)
Time Spent: 2h (was: 1h 50m)
> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
> -------------------------------------------------------------
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The
> SPSPathIdProcessor thread enters an infinite loop and prints the same log
> message repeatedly.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist,
> it enters an infinite loop and can no longer work normally.
> The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does
> not exist. On the failure path startINode is never reset to null, so the
> thread holds on to this inodeId and retries it forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> }
> {code}
>
>
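The loop above keeps startINode set whenever the scan throws, which is why the same id is retried without end. One way to bound this, in the spirit of the retryCount variable added in the reviewed diff, is to count consecutive failures and give up on the id after a limit. The following is only a minimal sketch under stated assumptions: SketchContext, SpsRetrySketch, drain and MAX_RETRY_COUNT are hypothetical names introduced for illustration, the cap of 3 is assumed, and none of this is taken from the actual patch in PR #4032.

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the SPS context; not the real Hadoop interface. */
interface SketchContext {
  Long getNextSPSPath(); // next inode id, or null if nothing is queued
  void scanAndCollectFiles(long inodeId) throws Exception;
}

class SpsRetrySketch {
  // Assumed retry cap; the limit used by the real patch is not shown in this message.
  static final int MAX_RETRY_COUNT = 3;

  /** Drains the queue once and returns the inode ids dropped after repeated failures. */
  static List<Long> drain(SketchContext ctxt) {
    List<Long> dropped = new ArrayList<>();
    Long startINode = null;
    int retryCount = 0;
    while (true) {
      try {
        if (startINode == null) {
          startINode = ctxt.getNextSPSPath();
        } // else the same id is retried
        if (startINode == null) {
          break; // queue is empty; the real thread would sleep and poll again
        }
        ctxt.scanAndCollectFiles(startINode);
        startINode = null; // current inode scanned successfully
        retryCount = 0;
      } catch (Exception e) {
        if (startINode == null) {
          // The failure came from fetching the next id, not from scanning it.
          throw new IllegalStateException(e);
        }
        retryCount++;
        if (retryCount >= MAX_RETRY_COUNT) {
          dropped.add(startINode); // give up on this id instead of looping forever
          startINode = null;
          retryCount = 0;
        }
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    Queue<Long> queue = new ArrayDeque<>(Arrays.asList(1L, 2L, 3L));
    List<Long> scanned = new ArrayList<>();
    SketchContext ctxt = new SketchContext() {
      @Override
      public Long getNextSPSPath() {
        return queue.poll();
      }

      @Override
      public void scanAndCollectFiles(long inodeId) throws Exception {
        if (inodeId == 2L) { // simulate an inode whose path no longer exists
          throw new FileNotFoundException("Path does not exist for inode " + inodeId);
        }
        scanned.add(inodeId);
      }
    };
    System.out.println("dropped=" + drain(ctxt) + " scanned=" + scanned);
  }
}
```

In this sketch inode 2 is attempted three times and then dropped, while inodes 1 and 3 are scanned normally. Whether a dropped id should also have its SPS hint removed, as removeSPSHint does for completed directories, is a design decision this sketch leaves open.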
--
This message was sent by Atlassian Jira
(v8.20.1#820001)