[ 
https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16484.
-------------------------------------
    Fix Version/s: 3.4.0
                   3.2.4
                   3.3.4
       Resolution: Fixed

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -------------------------------------------------------------
>
>                 Key: HDFS-16484
>                 URL: https://issues.apache.org/jira/browse/HDFS-16484
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.4, 3.3.4
>
>         Attachments: image-2022-02-25-14-35-42-255.png
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log 
> message continuously.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, 
> it enters an infinite loop and can no longer work normally. 
> The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. When scanning that inode throws an exception, the 
> "startINode = null" reset at the end of the try block is skipped, so the 
> thread holds on to this inodeId and retries it forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> } {code}
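
One way to break the loop described above is to cap the number of retries per inode and drop the id once the cap is reached. The following is only a minimal, self-contained sketch of that idea and is not claimed to be the patch committed for this issue. The class SpsRetrySketch, the Context interface, the MAX_RETRY_COUNT value, and the process() and demo() helpers are all hypothetical stand-ins rather than Hadoop APIs, and sleeping, safe-mode checks, and the pendingWorkForDirectory bookkeeping are omitted.

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class SpsRetrySketch {
  /** Hypothetical stand-in for the SPS context; not the real Hadoop interface. */
  interface Context {
    Long getNextSPSPath();
    void scanAndCollectFiles(long inodeId) throws Exception;
  }

  // Assumed retry limit, chosen only for illustration.
  static final int MAX_RETRY_COUNT = 3;

  /** Runs the processing loop for at most maxIterations; returns the scanned ids. */
  static List<Long> process(Context ctxt, int maxIterations) {
    List<Long> scanned = new ArrayList<>();
    Long startINode = null;
    int retryCount = 0;
    for (int i = 0; i < maxIterations; i++) {
      try {
        if (startINode == null) {
          retryCount = 0; // fresh id, fresh retry budget
          startINode = ctxt.getNextSPSPath();
        } // else the same id is retried
        if (startINode == null) {
          break; // queue drained; the real thread would sleep and poll again
        }
        ctxt.scanAndCollectFiles(startINode);
        scanned.add(startINode);
        startINode = null; // current inode successfully scanned
      } catch (Exception e) {
        retryCount++;
        if (retryCount >= MAX_RETRY_COUNT) {
          // Give up on this id so the loop can move on to the next one.
          startINode = null;
        }
      }
    }
    return scanned;
  }

  /** Simulates a queue in which inode 2 has no existing path. */
  static List<Long> demo() {
    final Queue<Long> queue = new ArrayDeque<>(Arrays.asList(1L, 2L, 3L));
    Context ctxt = new Context() {
      @Override
      public Long getNextSPSPath() {
        return queue.poll();
      }

      @Override
      public void scanAndCollectFiles(long inodeId) throws Exception {
        if (inodeId == 2L) {
          throw new FileNotFoundException("Path does not exist for inode " + inodeId);
        }
      }
    };
    return process(ctxt, 100);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [1, 3]
  }
}
```

In this simulation inode 2 is attempted three times and then dropped, so inode 3 is still processed. Without the reset in the catch block, the loop would keep retrying inode 2 until the iteration limit is hit.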



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
