[ https://issues.apache.org/jira/browse/DL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763511#comment-15763511 ]
ASF GitHub Bot commented on DL-145:
-----------------------------------

GitHub user xieliang opened a pull request:

    https://github.com/apache/incubator-distributedlog/pull/78

    DL-145 : the write requests should be error out immediately even if the rolling writer still be creating

    Passed all test cases locally; the TestDistributedLogService#testServiceTimeout case is now stable on my box.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xieliang/incubator-distributedlog DL-145

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-distributedlog/pull/78.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #78

----
commit 6be8ca4d01c4c40947d4b901f0299c8dcc97c509
Author: xieliang <xieliang...@gmail.com>
Date:   2016-12-20T07:19:38Z

    the write requests should be error out immediately even if the rolling writer still be creating

----

> Fix the flaky testServiceTimeout
> --------------------------------
>
>                 Key: DL-145
>                 URL: https://issues.apache.org/jira/browse/DL-145
>             Project: DistributedLog
>          Issue Type: Test
>          Components: distributedlog-service
>    Affects Versions: 0.4.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> The TestDistributedLogService#testServiceTimeout case is not stable, e.g.
> https://builds.apache.org/job/distributedlog-precommit-pullrequest/22/com.twitter$distributedlog-service/testReport/com.twitter.distributedlog.service/TestDistributedLogService/testServiceTimeout/
> It could be reproduced on my box occasionally: the failure became consistent
> when I tuned ServiceTimeoutMs from 200 down to 150, and the test always passed
> when I tuned it to a larger value, e.g. 1000 (btw, my disk is an SSD).
> After digging into it, the failure turns out to be related to a corner case
> around starting a new log segment.
> In the good case, once a service timeout occurs, the stream status goes
> ERROR -> CLOSING -> CLOSED; calling Abortables.asyncAbort aborts the cached
> log segment, and the pending writeOp is then completed with an exception,
> e.g. a write-cancelled exception.
> In the bad case, no log records have been written yet, so a new log segment
> is being started asynchronously. When the timeout occurs, the segment start
> has not completed, so there is no cached segment and asyncAbort has no chance
> to abort it.
> I think changing the test timeout to a larger value should be fine for this
> special corner case.
> Will attach a minor patch later. Any suggestions are welcome.
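
For readers who want to see the timing interaction concretely, below is a minimal,
self-contained sketch of the race described above. All names here (SegmentWriter,
cachedSegment, the millisecond values) are hypothetical illustrations, not the
DistributedLog API; the sketch only mimics how a service timeout that fires before
the asynchronous segment start completes finds nothing in the cache to abort.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins only -- not the DistributedLog API.
public class SegmentAbortRaceSketch {

    // Stand-in for the writer of a freshly rolled log segment.
    static final class SegmentWriter {
        void abort() {
            System.out.println("cached segment aborted; pending write op gets a cancel exception");
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // The "cache": only populated once the async segment creation has finished.
        AtomicReference<SegmentWriter> cachedSegment = new AtomicReference<>();

        long segmentStartMillis = 300;   // how long starting the new segment takes
        long serviceTimeoutMillis = 150; // timeout fires before the segment exists -> bad case

        // Asynchronous "start new log segment" triggered by the first write.
        CompletableFuture<Void> segmentStart = CompletableFuture.runAsync(() -> {
            sleep(segmentStartMillis);
            cachedSegment.set(new SegmentWriter());
        }, executor);

        // Service timeout path: stream goes ERROR -> CLOSING -> CLOSED and the
        // abort can only act on whatever segment is already cached.
        sleep(serviceTimeoutMillis);
        SegmentWriter segment = cachedSegment.get();
        if (segment != null) {
            segment.abort(); // good case
        } else {
            System.out.println("no cached segment yet; nothing to abort, write op is left hanging");
        }

        segmentStart.join();
        executor.shutdown();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}

With serviceTimeoutMillis raised well above segmentStartMillis (e.g. 1000 instead of 150),
the good-case branch runs, which mirrors the suggestion in the issue to use a larger test
timeout; the pull request above goes further and errors out pending write requests even
while the rolling writer is still being created.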