On Fri, Aug 3, 2018 at 3:07 PM Karthik Subrahmanya <[email protected]> wrote:
>
> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya <[email protected]>
> wrote:
>
>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya <[email protected]>
>> wrote:
>>
>>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee <[email protected]>
>>> wrote:
>>>
>>>> I just went through the nightly regression report of the brick-mux
>>>> runs, and here's what I can summarize.
>>>>
>>>> =========================================================================================================================================================================
>>>> Fails only with brick-mux
>>>> =========================================================================================================================================================================
>>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even after
>>>> 400 secs. Refer to
>>>> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>>>> specifically the latest report
>>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText.
>>>> It wasn't timing out as frequently as it was till 12 July, but since 27
>>>> July it has timed out twice. I am beginning to believe commit
>>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and 400
>>>> secs is no longer sufficient (Mohit?).
>>>>
>>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>>> (Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>>> - The test fails only in brick-mux mode. AI on Atin to look at it and
>>>> get back.
>>>>
>>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>>>> (https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>>> - Seems to have failed just twice in the last 30 days as per
>>>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>> Need help from the AFR team.
>>>>
>>>> tests/bugs/quota/bug-1293601.t
>>>> (https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>>> - Hasn't failed after 26 July; earlier it was failing regularly. Did we
>>>> fix this test through any patch (Mohit?)
>>>>
>>>> tests/bitrot/bug-1373520.t
>>>> (https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>>> - Hasn't failed after 27 July; earlier it was failing regularly. Did we
>>>> fix this test through any patch (Mohit?)
>>>>
>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a core.
>>>> Not sure whether it is related to brick mux, so not sure whether brick
>>>> mux is the culprit here. Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console.
>>>> Seems to be a glustershd crash. Need help from the AFR folks.
>>>>
>>>> =========================================================================================================================================================================
>>>> Fails for the non-brick-mux case too
>>>> =========================================================================================================================================================================
>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my setup
>>>> very often, without brick mux as well. Refer to
>>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText.
>>>> There's an email on gluster-devel and a BZ 1610240 for the same.
>>>>
>>>> tests/bugs/bug-1368312.t
>>>> (https://build.gluster.org/job/regression-test-with-multiplex/815/console)
>>>> - Seems to be a new failure; however, it has been seen for a
>>>> non-brick-mux case too:
>>>> https://build.gluster.org/job/regression-test-burn-in/4039/consoleText.
>>>> Need some eyes from the AFR folks.
>>>>
>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to
>>>> brick mux; I have seen it failing in multiple default regression runs.
>>>> Refer to
>>>> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>> We need help from the geo-rep devs to root-cause this sooner rather
>>>> than later.
>>>>
>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to
>>>> brick mux; I have seen it failing in multiple default regression runs.
>>>> Refer to
>>>> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>> We need help from the geo-rep devs to root-cause this sooner rather
>>>> than later.
>>>>
>>>> tests/bugs/glusterd/validating-server-quorum.t
>>>> (https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>>>> - Fails for non-brick-mux cases too:
>>>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>> Atin has a patch, https://review.gluster.org/20584, which resolves it,
>>>> but the patch is failing regression for a different test which is
>>>> unrelated.
>>>>
>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>>>> (Ref -
>>>> https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>>>> - Fails for the non-brick-mux case too:
>>>> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText.
>>>> Need some eyes from the AFR folks.
>>>>
>>> I am looking at this. It is not reproducible locally; trying to do this
>>> on softserve.
>>>
>> On the softserve machine it is also not failing where the regression had
>> failed, but I found another problem in the script. I will fix that and
>> add some extra logs so that it is easier to debug when it fails next
>> time.
>>
>
> RCA for the
> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> failure:
> This test case completely fills 2 out of the 3 bricks and provisions one
> brick with some extra space, so that entry creation succeeds on only one
> brick and fails on the others.
> Since 2 of the bricks are full, only the entry creation succeeds on those
> bricks; the creation of the gfid hard link inside ".glusterfs" fails.
> This is a bug in the "posix" code with entry transactions.
> If the gfid link creation fails, we just log an error message and
> continue. Since we depend on that gfid, the entry should be deleted when
> this fails.
> When shd tries to heal those files, it sees that the gfid link is not
> present for them, and it fails to heal.
>
> I will send a fix for this, which deletes the entry if it fails to
> create the link inside .glusterfs.
>
Patches posted for the
bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t failure:
https://review.gluster.org/#/c/20630/
https://review.gluster.org/#/c/20631/

> Regards,
> Karthik
>
>>> Regards,
>>> Karthik
>>>
>>>> _______________________________________________
>>>> Gluster-devel mailing list
>>>> [email protected]
>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
maintainers mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/maintainers
