[ 
https://issues.apache.org/jira/browse/DL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769322#comment-15769322
 ] 

Liang Xie commented on DL-145:
------------------------------

ping [~si...@apache.org] :)

> Fix the flaky testServiceTimeout
> --------------------------------
>
>                 Key: DL-145
>                 URL: https://issues.apache.org/jira/browse/DL-145
>             Project: DistributedLog
>          Issue Type: Test
>          Components: distributedlog-service
>    Affects Versions: 0.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> The TestDistributedLogService#testServiceTimeout case is not stable, e.g. 
> https://builds.apache.org/job/distributedlog-precommit-pullrequest/22/com.twitter$distributedlog-service/testReport/com.twitter.distributedlog.service/TestDistributedLogService/testServiceTimeout/
> I can reproduce it occasionally on my box. The failure becomes consistent 
> if I lower ServiceTimeoutMs from 200 to 150, and the test always passes with 
> a larger value, e.g. 1000 (btw, my disk is an SSD).
> After digging into it, the failure appears to be related to a corner case 
> in starting a new log segment.
> In the good case, once the service timeout occurs, the stream status goes 
> ERROR -> CLOSING -> CLOSED, and calling Abortables.asyncAbort aborts the 
> cached log segment, so the writeOp fails with an exception, e.g. a write 
> cancel exception.
> In the bad case, no log records have been written before, so a new log 
> segment is started asynchronously. When the timeout occurs, the segment 
> start has not finished yet, so nothing is cached and asyncAbort has no 
> chance to abort that segment.
> I think changing the test timeout to a larger value should be fine for this 
> special corner case.
> I will attach a minor patch later. Any suggestions are welcome.
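
For illustration, the good and bad orderings described above can be modeled with a small, deterministic sketch. This is not DistributedLog code: TimeoutRaceSketch, Stream, Segment, abortOnTimeout and run are hypothetical names, and the sketch only assumes that an abort on timeout can reach a log segment once the asynchronous segment start has completed and the segment has been cached.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class TimeoutRaceSketch {

    /** Hypothetical stand-in for a log segment with one pending write. */
    static final class Segment {
        final CompletableFuture<Void> pendingWrite = new CompletableFuture<>();

        void abort() {
            // Fail the pending write, similar in spirit to a write cancel exception.
            pendingWrite.completeExceptionally(new IllegalStateException("write cancelled"));
        }
    }

    /** Hypothetical stand-in for a stream; it only models the segment cache. */
    static final class Stream {
        final AtomicReference<Segment> cached = new AtomicReference<>();
        final CompletableFuture<Segment> segmentStart = new CompletableFuture<>();

        Stream() {
            // The segment is cached only once the asynchronous start completes.
            segmentStart.thenAccept(cached::set);
        }

        /** Called when the service timeout fires; aborts the cached segment, if any. */
        void abortOnTimeout() {
            Segment segment = cached.get();
            if (segment != null) {
                segment.abort();
            }
        }
    }

    /** Returns true if the pending write was failed by the timeout abort. */
    public static boolean run(boolean segmentReadyBeforeTimeout) {
        Stream stream = new Stream();
        Segment segment = new Segment();
        if (segmentReadyBeforeTimeout) {
            stream.segmentStart.complete(segment); // good case: start finishes first
        }
        stream.abortOnTimeout();                   // service timeout fires
        if (!segmentReadyBeforeTimeout) {
            stream.segmentStart.complete(segment); // bad case: start finishes too late
        }
        return segment.pendingWrite.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println("good case, write failed: " + run(true));  // true
        System.out.println("bad case, write failed: " + run(false)); // false
    }
}
```

In the bad ordering the pending write is never failed, which would match a test that waits for the write to be cancelled and does not see it. A larger timeout makes the good ordering far more likely but does not remove the window.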



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
