[ 
https://issues.apache.org/jira/browse/HDFS-16484?focusedWorklogId=756148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-756148
 ]

ASF GitHub Bot logged work on HDFS-16484:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Apr/22 03:22
            Start Date: 13/Apr/22 03:22
    Worklog Time Spent: 10m 
      Work Description: liubingxing commented on PR #4032:
URL: https://github.com/apache/hadoop/pull/4032#issuecomment-1097512478

   @tasanuma Thanks for your review and merge




Issue Time Tracking
-------------------

    Worklog Id:     (was: 756148)
    Time Spent: 4h 10m  (was: 4h)

> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -------------------------------------------------------------
>
>                 Key: HDFS-16484
>                 URL: https://issues.apache.org/jira/browse/HDFS-16484
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: qinyuren
>            Assignee: qinyuren
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.4, 3.3.4
>
>         Attachments: image-2022-02-25-14-35-42-255.png
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> We ran SPS in our cluster and found the log below. The 
> SPSPathIdProcessor thread enters an infinite loop and prints the same log 
> message repeatedly.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, 
> it enters an infinite loop and can no longer work normally. 
> The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. startINode is then never reset to null (the reset on the success 
> path is skipped when an exception is thrown), so the thread holds this 
> inodeId forever and retries it on every iteration.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
>     try {
>       if (!ctxt.isInSafeMode()) {
>         if (startINode == null) {
>           startINode = ctxt.getNextSPSPath();
>         } // else same id will be retried
>         if (startINode == null) {
>           // Waiting for SPS path
>           Thread.sleep(3000);
>         } else {
>           ctxt.scanAndCollectFiles(startINode);
>           // check if directory was empty and no child added to queue
>           DirPendingWorkInfo dirPendingWorkInfo =
>               pendingWorkForDirectory.get(startINode);
>           if (dirPendingWorkInfo != null
>               && dirPendingWorkInfo.isDirWorkDone()) {
>             ctxt.removeSPSHint(startINode);
>             pendingWorkForDirectory.remove(startINode);
>           }
>         }
>         startINode = null; // Current inode successfully scanned.
>       }
>     } catch (Throwable t) {
>       String reClass = t.getClass().getName();
>       if (InterruptedException.class.getName().equals(reClass)) {
>         LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
>         break;
>       }
>       LOG.warn("Exception while scanning file inodes to satisfy the policy",
>           t);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException e) {
>         LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
>         break;
>       }
>     }
>   }
> }
> {code}
>  
>  
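One way to break the loop described above is to cap the number of attempts per inode id and drop the id once the cap is reached. The following is only a minimal, self-contained sketch of that idea and is not the patch merged in the PR: the class SpsRetrySketch, the MAX_RETRIES constant, and the scan() and drain() helpers are hypothetical stand-ins, and a plain queue replaces ctxt.getNextSPSPath().

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for the SPS path-id loop; not the Hadoop class. */
class SpsRetrySketch {
  /** Hypothetical cap on scan attempts per inode id. */
  static final int MAX_RETRIES = 3;

  /** Hypothetical scan: fails when the path of the inode no longer exists. */
  static void scan(long inodeId, Set<Long> missing)
      throws FileNotFoundException {
    if (missing.contains(inodeId)) {
      throw new FileNotFoundException(
          "Path does not exist for inode " + inodeId);
    }
  }

  /** Drains the queue and returns the ids that were scanned successfully. */
  static List<Long> drain(Queue<Long> pending, Set<Long> missing) {
    List<Long> scanned = new ArrayList<>();
    Long startINode = null;
    int attempts = 0;
    while (true) {
      if (startINode == null) {
        startINode = pending.poll();
        attempts = 0;
        if (startINode == null) {
          break; // queue is empty; the real thread would sleep and poll again
        }
      } // else the same id is retried
      try {
        scan(startINode, missing);
        scanned.add(startINode);
        startINode = null; // current inode successfully scanned
      } catch (FileNotFoundException e) {
        attempts++;
        if (attempts >= MAX_RETRIES) {
          // Give up on this id so the next one can be processed.
          startINode = null;
        }
      }
    }
    return scanned;
  }
}
```

With pending ids 1, 2, 3 and id 2 marked as missing, drain() retries id 2 up to MAX_RETRIES times, then drops it and continues with id 3 instead of holding id 2 forever.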



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

