[ https://issues.apache.org/jira/browse/HDFS-16623?focusedWorklogId=779260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779260 ]

ASF GitHub Bot logged work on HDFS-16623:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jun/22 20:58
            Start Date: 07/Jun/22 20:58
    Worklog Time Spent: 10m 
      Work Description: cnauroth commented on PR #4409:
URL: https://github.com/apache/hadoop/pull/4409#issuecomment-1149161121

   @ZanderXu, yes, I was thinking of just testing that `getLifelineWaitTime()`
only returns non-negative numbers. There is a similar test in
`TestBpServiceActorScheduler#testScheduleLifeline`, but it doesn't yet cover
the case that would produce a negative value.
   
   I think testing for the LifelineSender thread exiting would be more complete,
but also a lot more complex. Testing directly against the `getLifelineWaitTime()`
return values is a good compromise.
   
   Thanks!
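A minimal sketch of the check described above, assuming a simplified scheduler with an injectable clock. `LifelineScheduler`, `LifelineSchedulerTest`, and their fields are hypothetical stand-ins, not the real `BPServiceActor.Scheduler` or `TestBpServiceActorScheduler` code. The sketch only illustrates the idea: advance time past the lifeline due time (as a long GC pause would) and assert that the wait time is clamped to zero rather than going negative.

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified model of a lifeline scheduler (not the Hadoop class). */
class LifelineScheduler {
    private final LongSupplier clock;   // monotonic time source in ms, injected for testing
    private final long nextLifelineTime;

    LifelineScheduler(LongSupplier clock, long lifelineIntervalMs) {
        this.clock = clock;
        this.nextLifelineTime = clock.getAsLong() + lifelineIntervalMs;
    }

    /** Milliseconds until the next lifeline is due; never negative. */
    long getLifelineWaitTime() {
        // If the thread was paused (e.g. by GC) past the due time, the raw
        // difference is negative; clamp it so Thread.sleep() never sees it.
        return Math.max(0L, nextLifelineTime - clock.getAsLong());
    }
}

class LifelineSchedulerTest {
    public static void main(String[] args) {
        long[] now = {1_000L};  // fake clock that the test advances manually
        LifelineScheduler s = new LifelineScheduler(() -> now[0], 3_000L);

        // Before the due time, the wait is the remaining part of the interval.
        now[0] += 1_000L;
        check(s.getLifelineWaitTime() == 2_000L);

        // Simulate a long GC pause that overshoots the due time.
        now[0] += 60_000L;
        check(s.getLifelineWaitTime() == 0L);
        System.out.println("ok");
    }

    static void check(boolean cond) {
        if (!cond) {
            throw new AssertionError("lifeline wait time check failed");
        }
    }
}
```

Injecting the clock keeps the test deterministic: no real sleeping or GC is needed to reach the "already overdue" case.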




Issue Time Tracking
-------------------

    Worklog Id:     (was: 779260)
    Time Spent: 40m  (was: 0.5h)

> IllegalArgumentException in LifelineSender
> ------------------------------------------
>
>                 Key: HDFS-16623
>                 URL: https://issues.apache.org/jira/browse/HDFS-16623
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> In our production environment, an IllegalArgumentException occurred in the
> LifelineSender on a DataNode that was undergoing GC at the time.
> The faulty code is at line 1060 of BPServiceActor.java: the sleep time
> passed to Thread.sleep() is negative.
> {code:java}
> while (shouldRun()) {
>   try {
>     if (lifelineNamenode == null) {
>       lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr);
>     }
>     sendLifelineIfDue();
>     // Thread.sleep() throws IllegalArgumentException if the wait time is negative
>     Thread.sleep(scheduler.getLifelineWaitTime());
>   } catch (InterruptedException e) {
>     Thread.currentThread().interrupt();
>   } catch (IOException e) {
>     LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, e);
>   }
> }
> {code}
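`Thread.sleep(long)` throws `IllegalArgumentException` when its argument is negative. Because the loop above only catches `InterruptedException` and `IOException`, that exception escapes the `while` loop and ends the LifelineSender thread. The standalone sketch below reproduces the failure and shows one possible caller-side guard; `NegativeSleepDemo` and `sleepNonNegative` are hypothetical names for illustration, not part of Hadoop, and clamping inside `getLifelineWaitTime()` itself is an alternative.

```java
/** Hypothetical demo, not Hadoop code: shows the negative-sleep failure and a guard. */
class NegativeSleepDemo {
    /** Sleeps for waitMs, treating a negative value as "already due" (no sleep). */
    static void sleepNonNegative(long waitMs) throws InterruptedException {
        Thread.sleep(Math.max(0L, waitMs));
    }

    public static void main(String[] args) throws InterruptedException {
        long waitMs = -5L; // e.g. the due time already passed during a GC pause
        try {
            Thread.sleep(waitMs); // rejected: negative timeout
        } catch (IllegalArgumentException e) {
            System.out.println("Thread.sleep(" + waitMs + ") threw IllegalArgumentException");
        }
        sleepNonNegative(waitMs); // sleeps 0 ms and returns normally
        System.out.println("guarded sleep returned normally");
    }
}
```

Either placement of the clamp keeps the loop alive; putting it in the scheduler also makes the non-negative guarantee directly testable.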



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
