[ https://issues.apache.org/jira/browse/DL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769322#comment-15769322 ]
Liang Xie commented on DL-145:
------------------------------

ping [~si...@apache.org] :)

> Fix the flaky testServiceTimeout
> --------------------------------
>
>                 Key: DL-145
>                 URL: https://issues.apache.org/jira/browse/DL-145
>             Project: DistributedLog
>          Issue Type: Test
>          Components: distributedlog-service
>    Affects Versions: 0.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> The TestDistributedLogService#testServiceTimeout case is not stable, e.g.
> https://builds.apache.org/job/distributedlog-precommit-pullrequest/22/com.twitter$distributedlog-service/testReport/com.twitter.distributedlog.service/TestDistributedLogService/testServiceTimeout/
> It can be reproduced on my box occasionally. It failed consistently when I
> tuned ServiceTimeoutMs from 200 down to 150, and always passed when I tuned
> it to a larger value, e.g. 1000 (btw, my disk is an SSD).
> After digging into it, the failure turns out to be related to a corner case
> in starting a new log segment.
> In the good case, once the service timeout occurs, the stream status goes
> ERROR -> CLOSING -> CLOSED. Calling Abortables.asyncAbort causes the cached
> log segment to be aborted, and the writeOp then fails with an exception,
> e.g. a write cancel exception.
> In the bad case, no log records have been written before, so a new log
> segment is started asynchronously. When the timeout occurs, the segment
> start has still not completed, so nothing is cached, and asyncAbort has no
> chance to abort that segment.
> I think changing the test timeout to a larger value should be fine for this
> special test corner case.
> I will attach a minor patch later. Any suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
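
The good and bad cases described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyStream, finishSegmentStart, abortOnTimeout and writeFailedOnAbort are hypothetical names invented for illustration, not DistributedLog classes or methods, and the real code paths are asynchronous rather than sequential as modelled here.

```java
import java.util.concurrent.CompletableFuture;

// Toy model of the race described in the issue.
// None of these names are DistributedLog APIs; they are assumptions for illustration.
class SegmentAbortRaceSketch {

    // Hypothetical stand-in for a stream that may or may not have a cached log segment.
    static final class ToyStream {
        // The pending write; in the good case the abort fails it with an exception.
        private final CompletableFuture<Void> pendingWrite = new CompletableFuture<>();
        // Non-null only after the "start new log segment" step has completed.
        private CompletableFuture<Void> cachedSegmentWrite;

        // Models completion of the asynchronous segment start: the segment is now cached.
        void finishSegmentStart() {
            cachedSegmentWrite = pendingWrite;
        }

        // Models the abort triggered by the service timeout.
        // It can only reach a segment that is already cached.
        void abortOnTimeout() {
            if (cachedSegmentWrite != null) {
                cachedSegmentWrite.completeExceptionally(
                        new IllegalStateException("write cancelled"));
            }
        }

        boolean writeFailed() {
            return pendingWrite.isCompletedExceptionally();
        }
    }

    // Returns whether the pending write was failed by the abort.
    static boolean writeFailedOnAbort(boolean segmentReadyBeforeTimeout) {
        ToyStream stream = new ToyStream();
        if (segmentReadyBeforeTimeout) {
            stream.finishSegmentStart();
        }
        stream.abortOnTimeout();
        return stream.writeFailed();
    }

    public static void main(String[] args) {
        System.out.println("segment cached before timeout: write failed = "
                + writeFailedOnAbort(true));
        System.out.println("segment still starting at timeout: write failed = "
                + writeFailedOnAbort(false));
    }
}
```

In this model, writeFailedOnAbort(true) corresponds to the good case (the write receives the exception), and writeFailedOnAbort(false) corresponds to the bad case, where the abort finds nothing cached and the write is left pending. A larger timeout makes the first ordering far more likely, which matches the observation that the test always passed with 1000 ms.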