On Fri, Aug 3, 2018 at 4:01 PM, Kotresh Hiremath Ravishankar <[email protected]> wrote:
> Hi Du/Poornima,
>
> I was analysing the bitrot and geo-rep failures and I suspect there is a
> bug in some perf xlator that was one of the causes. I was seeing the
> following behaviour in a few runs.
>
> 1. Geo-rep synced data to the slave. It creates an empty file and then
>    rsync syncs the data. But the test does "stat --format "%F" <file>" to
>    confirm: if the file is empty, it returns "regular empty file", else
>    "regular file". I believe it kept getting "regular empty file" instead
>    of "regular file" until the timeout. https://review.gluster.org/20549
>    might be relevant.
>
> 2. The other behaviour is with bitrot, with brick-mux. If a file is
>    deleted on the backend on one brick and a lookup is done, which
>    performance xlators need to be disabled so that the lookup/revalidate
>    happens on the brick where the file was deleted? Earlier, only
>    md-cache was disabled and it used to work. Now it's failing
>    intermittently.

You need to disable readdirplus in the entire stack. Refer to
https://lists.gluster.org/pipermail/gluster-users/2017-March/030148.html

> Are there any pending patches around these areas that need to be merged?
> If there are, they could be affecting other tests as well.
>
> Thanks,
> Kotresh HR
>
> On Fri, Aug 3, 2018 at 3:07 PM, Karthik Subrahmanya <[email protected]>
> wrote:
>
>> On Fri, Aug 3, 2018 at 2:12 PM Karthik Subrahmanya <[email protected]>
>> wrote:
>>
>>> On Thu, Aug 2, 2018 at 11:00 PM Karthik Subrahmanya <[email protected]>
>>> wrote:
>>>
>>>> On Tue 31 Jul, 2018, 10:17 PM Atin Mukherjee, <[email protected]>
>>>> wrote:
>>>>
>>>>> I just went through the nightly regression report of brick-mux runs
>>>>> and here's what I can summarize.
>>>>>
>>>>> =================================================================
>>>>> Fails only with brick-mux
>>>>> =================================================================
>>>>>
>>>>> tests/bugs/core/bug-1432542-mpx-restart-crash.t - Times out even
>>>>> after 400 secs. Refer
>>>>> https://fstat.gluster.org/failure/209?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all,
>>>>> specifically the latest report
>>>>> https://build.gluster.org/job/regression-test-burn-in/4051/consoleText .
>>>>> It wasn't timing out as frequently as it was till 12 July, but since
>>>>> 27 July it has timed out twice. Beginning to believe commit
>>>>> 9400b6f2c8aa219a493961e0ab9770b7f12e80d2 has added the delay and now
>>>>> 400 secs isn't sufficient (Mohit?).
>>>>>
>>>>> tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>>>> (Ref - https://build.gluster.org/job/regression-test-with-multiplex/814/console)
>>>>> - Fails only in brick-mux mode. AI on Atin to look at and get back.
>>>>>
>>>>> tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/813/console)
>>>>> - Seems to have failed just twice in the last 30 days as per
>>>>> https://fstat.gluster.org/failure/251?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all.
>>>>> Need help from the AFR team.
>>>>>
>>>>> tests/bugs/quota/bug-1293601.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/812/console)
>>>>> - Hasn't failed after 26 July; earlier it was failing regularly. Did
>>>>> we fix this test through any patch (Mohit?)
>>>>>
>>>>> tests/bitrot/bug-1373520.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/811/console)
>>>>> - Hasn't failed after 27 July; earlier it was failing regularly. Did
>>>>> we fix this test through any patch (Mohit?)
>>>>>
>>>>> tests/bugs/glusterd/remove-brick-testcases.t - Failed once with a
>>>>> core; not sure if brick mux is the culprit here or not. Ref -
>>>>> https://build.gluster.org/job/regression-test-with-multiplex/806/console .
>>>>> Seems to be a glustershd crash. Need help from AFR folks.
>>>>>
>>>>> =================================================================
>>>>> Fails for the non-brick-mux case too
>>>>> =================================================================
>>>>>
>>>>> tests/bugs/distribute/bug-1122443.t - Seems to be failing on my
>>>>> setup very often, without brick mux as well. Refer
>>>>> https://build.gluster.org/job/regression-test-burn-in/4050/consoleText .
>>>>> There's an email on gluster-devel and BZ 1610240 for the same.
>>>>>
>>>>> tests/bugs/bug-1368312.t - Seems to be a new failure
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/815/console);
>>>>> however, it has been seen for a non-brick-mux case too -
>>>>> https://build.gluster.org/job/regression-test-burn-in/4039/consoleText .
>>>>> Need some eyes from AFR folks.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-tarssh.t - This isn't specific to
>>>>> brick mux; have seen this failing in multiple default regression
>>>>> runs. Refer
>>>>> https://fstat.gluster.org/failure/392?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all .
>>>>> We need help from geo-rep devs to root cause this sooner rather than
>>>>> later.
>>>>>
>>>>> tests/00-geo-rep/georep-basic-dr-rsync.t - This isn't specific to
>>>>> brick mux; have seen this failing in multiple default regression
>>>>> runs. Refer
>>>>> https://fstat.gluster.org/failure/393?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all .
>>>>> We need help from geo-rep devs to root cause this sooner rather than
>>>>> later.
>>>>>
>>>>> tests/bugs/glusterd/validating-server-quorum.t
>>>>> (https://build.gluster.org/job/regression-test-with-multiplex/810/console)
>>>>> - Fails for non-brick-mux cases too:
>>>>> https://fstat.gluster.org/failure/580?state=2&start_date=2018-06-30&end_date=2018-07-31&branch=all .
>>>>> Atin has a patch https://review.gluster.org/20584 which resolves it,
>>>>> but the patch is failing regression for a different, unrelated test.
>>>>>
>>>>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>>>>> (Ref - https://build.gluster.org/job/regression-test-with-multiplex/809/console)
>>>>> - Fails for the non-brick-mux case too -
>>>>> https://build.gluster.org/job/regression-test-burn-in/4049/consoleText .
>>>>> Need some eyes from AFR folks.
>>>>
>>>> I am looking at this. It is not reproducible locally. Trying to do
>>>> this on softserve.
>>>
>>> On the softserve machine also it is not failing where the regression
>>> failed. But I found another problem in the script. I will fix that and
>>> add some extra logs so that it is easier to debug when it fails next
>>> time.
>>
>> RCA for
>> tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
>> failure:
>> This test case completely fills 2 out of 3 bricks and provisions one
>> brick with some extra space, so that entry creation succeeds only on
>> one brick and fails on the other bricks.
>> Since 2 of the bricks get filled, only the entry creation succeeds on
>> those bricks; the creation of the gfid hard link inside ".glusterfs"
>> fails. This is a bug in the "posix" code with entry transactions: if
>> the gfid link creation fails, we just log an error message and
>> continue. Since we depend on that gfid, the entry should be deleted
>> when this fails. When the shd tries to heal those files, it sees that
>> the gfid link is not present for them and fails to heal.
>>
>> I will send a fix for this, which deletes the entry if it fails to
>> create the link inside .glusterfs.
>>
>> Regards,
>> Karthik
>
>>>> Regards,
>>>> Karthik
>>>>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> [email protected]
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
> --
> Thanks and Regards,
> Kotresh H R
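[Editor's note] Karthik's RCA describes a create-then-link sequence that should roll back on failure: the entry is created, then the gfid hard link under ".glusterfs" is attempted, and if the link fails the entry must be removed rather than left behind (otherwise shd cannot heal it). The real fix lives in the posix xlator's entry-transaction code in C; this is only a minimal Python sketch of the rollback pattern, with hypothetical paths and helper names:

```python
import os
import tempfile

def create_with_gfid_link(entry_path, gfid_link_path):
    """Create an entry; if the gfid hard link cannot be created,
    roll back by deleting the entry instead of merely logging."""
    fd = os.open(entry_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        os.link(entry_path, gfid_link_path)  # analogous to the link under .glusterfs/
    except OSError:
        os.unlink(entry_path)  # roll back: heal would fail without the gfid link
        raise

workdir = tempfile.mkdtemp()

ok_entry = os.path.join(workdir, "file1")
create_with_gfid_link(ok_entry, os.path.join(workdir, "gfid1"))
print(os.path.exists(ok_entry))  # True: both entry and gfid link exist

bad_entry = os.path.join(workdir, "file2")
try:
    # Linking into a missing directory simulates the failure on a full brick.
    create_with_gfid_link(bad_entry, os.path.join(workdir, "missing", "gfid2"))
except OSError:
    pass
print(os.path.exists(bad_entry))  # False: the entry was rolled back
```

Without the rollback (the current behaviour described above), the second case would leave "file2" on disk with no gfid link, which is exactly the state shd cannot heal.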
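[Editor's note] On Kotresh's first point, the geo-rep test distinguishes a file that has only been created from one whose data has been synced via `stat --format "%F"`. A rough Python equivalent of that check (a hypothetical helper, not the test's actual code, which is a shell script):

```python
import os
import stat
import tempfile

def file_format(path):
    """Rough equivalent of `stat --format "%F"` for regular files."""
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        return "regular empty file" if st.st_size == 0 else "regular file"
    return "other"

tmpdir = tempfile.mkdtemp()

empty = os.path.join(tmpdir, "empty")
open(empty, "w").close()                 # entry created, no data yet
print(file_format(empty))                # regular empty file

data = os.path.join(tmpdir, "data")
with open(data, "w") as f:
    f.write("synced by rsync")           # data has arrived
print(file_format(data))                 # regular file
```

The suspicion above is that a caching perf xlator kept reporting a stale zero size, so the check saw "regular empty file" until the test timed out even though rsync had synced the data.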
_______________________________________________
maintainers mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/maintainers
