[
https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
qinyuren updated HDFS-16484:
----------------------------
Description:
We ran SPS in our cluster and found the log below. The SPSPathIdProcessor
thread enters an infinite loop and prints the same log message repeatedly.
!image-2022-02-25-14-35-42-255.png|width=682,height=195!
In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not
exist, the thread enters an infinite loop and can no longer work normally.
The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does not
exist. When processing that inodeId fails, the catch block does not reset
startINode to null, so the thread holds on to the same inodeId and retries it
forever.
{code:java}
public void run() {
  LOG.info("Starting SPSPathIdProcessor!.");
  Long startINode = null;
  while (ctxt.isRunning()) {
    try {
      if (!ctxt.isInSafeMode()) {
        if (startINode == null) {
          startINode = ctxt.getNextSPSPath();
        } // else same id will be retried
        if (startINode == null) {
          // Waiting for SPS path
          Thread.sleep(3000);
        } else {
          ctxt.scanAndCollectFiles(startINode);
          // check if directory was empty and no child added to queue
          DirPendingWorkInfo dirPendingWorkInfo =
              pendingWorkForDirectory.get(startINode);
          if (dirPendingWorkInfo != null
              && dirPendingWorkInfo.isDirWorkDone()) {
            ctxt.removeSPSHint(startINode);
            pendingWorkForDirectory.remove(startINode);
          }
        }
        startINode = null; // Current inode successfully scanned.
      }
    } catch (Throwable t) {
      String reClass = t.getClass().getName();
      if (InterruptedException.class.getName().equals(reClass)) {
        LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
        break;
      }
      LOG.warn("Exception while scanning file inodes to satisfy the policy",
          t);
      try {
        Thread.sleep(3000);
      } catch (InterruptedException e) {
        LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
        break;
      }
    }
  }
}
{code}
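One possible way to break the loop is to bound the number of retries per inodeId and drop the id once the limit is reached. The following is only a minimal, self-contained sketch of that idea, not the actual patch for this issue: the PathSource interface is a hypothetical stand-in for the SPS context, MAX_RETRY_COUNT and its value are assumptions, and the sleeps between attempts are omitted so the example terminates quickly.

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class BoundedRetrySketch {

  /** Hypothetical stand-in for the SPS context; not the real HDFS interface. */
  interface PathSource {
    Long getNextSPSPath();
    void scanAndCollectFiles(long inodeId) throws Exception;
  }

  /** Assumed retry limit; the value is illustrative only. */
  static final int MAX_RETRY_COUNT = 3;

  /** Processes ids until the source is empty and returns the ids that were dropped. */
  static List<Long> drain(PathSource ctxt) {
    List<Long> dropped = new ArrayList<>();
    Long startINode = null;
    int retryCount = 0;
    while (true) {
      if (startINode == null) {
        startINode = ctxt.getNextSPSPath();
        retryCount = 0; // a fresh id starts with a fresh retry budget
      }
      if (startINode == null) {
        return dropped; // a real thread would sleep and poll again here
      }
      try {
        ctxt.scanAndCollectFiles(startINode);
        startINode = null; // current inode successfully scanned
      } catch (Exception e) {
        retryCount++;
        if (retryCount >= MAX_RETRY_COUNT) {
          // Give up on this id so that later ids can still be processed.
          dropped.add(startINode);
          startINode = null;
        } // else the same id will be retried
      }
    }
  }

  /** Runs the sketch on three ids, where the path of id 2 no longer exists. */
  static String demo() {
    final Deque<Long> queue = new ArrayDeque<>(Arrays.asList(1L, 2L, 3L));
    final List<Long> scanned = new ArrayList<>();
    PathSource ctxt = new PathSource() {
      @Override
      public Long getNextSPSPath() {
        return queue.poll();
      }

      @Override
      public void scanAndCollectFiles(long inodeId) throws Exception {
        if (inodeId == 2L) {
          throw new FileNotFoundException("Path does not exist for inode " + inodeId);
        }
        scanned.add(inodeId);
      }
    };
    List<Long> dropped = drain(ctxt);
    return "scanned=" + scanned + " dropped=" + dropped;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints "scanned=[1, 3] dropped=[2]"
  }
}
```

With an unbounded retry, the id 2 would be retried forever and id 3 would never be scanned. Whether a dropped id should also have its SPS hint removed is a separate design decision that this sketch does not address.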
was:
In SPSPathIdProcessor thread, if it get a inodeId which path does not exist,
then the SPSPathIdProcessor thread entry infinite loop and can't work normally.
!image-2022-02-25-14-35-42-255.png|width=682,height=195!
> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
> -------------------------------------------------------------
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)