Hi, I’m working on getting a test run without the bitmap. To make things simple for myself: would it help if I just use “mdadm --grow --bitmap=none” to disable it, or is that futile?
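For reference, a minimal sketch of how the bitmap state could be checked before and after such a test. It assumes the usual /proc/mdstat layout, where an array with a bitmap has a "bitmap:" line below its header. The device names and the sample text are made up for illustration and are not taken from the affected machine, and the helper names are hypothetical. It does not answer whether disabling the bitmap at runtime is equivalent to bypassing it in the kernel.

```python
import re


def bitmap_status(mdstat_text):
    """Map each md array found in /proc/mdstat-style text to True if a
    'bitmap:' line follows its header line, else False."""
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            status[current] = False
        elif current and line.strip().startswith("bitmap:"):
            status[current] = True
        elif not line.strip():
            # A blank line ends the block belonging to the current array.
            current = None
    return status


def disable_bitmap_command(device):
    """Build (but do not run) the mdadm call from the question, with the
    array device spelled out. The device path is supplied by the caller."""
    return ["mdadm", "--grow", device, "--bitmap=none"]


# Illustrative sample only: md0 has a bitmap, md1 does not.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd[3] sdc[2] sdb[1] sda[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid6 sdh[3] sdg[2] sdf[1] sde[0]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(bitmap_status(SAMPLE))
    print(" ".join(disable_bitmap_command("/dev/md0")))
```

On a real system the text would come from reading /proc/mdstat, and the command list could be handed to subprocess.run as root once the right array is confirmed.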
Christian

> On 23. Oct 2024, at 03:13, Yu Kuai <yuku...@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/10/22 23:02, Christian Theune wrote:
>> Hi,
>> I had to put this issue aside and as Yu indicated he was busy I didn’t
>> follow up yet.
>> @Yu: I don’t have new insights, but I have a basically identical machine
>> that I will start adding new data with a similar structure to soon.
>> I couldn’t directly reproduce the issue there - likely because the network
>> is a bit slower as it’s connected from a remote site and has only 1G instead
>> of 10G, due to the long distances.
>> Let me know if you’re interested in following up here and I’ll try to make
>> room on my side to get you more input as needed.
>
> Yes, sorry that I was totally busy with other things. :(
>
> BTW, what is the result after bypassing the bitmap (disabling the bitmap by
> kernel hacking)?
>
> Thanks,
> Kuai
>
>> Christian
>>> On 15. Aug 2024, at 13:14, Yu Kuai <yuku...@huaweicloud.com> wrote:
>>>
>>> Hi,
>>>
>>> On 2024/08/15 18:03, Christian Theune wrote:
>>>> Hi,
>>>> small insight: even given my dataset that can reliably trigger this (after
>>>> around 1.5 hours of rsyncing) it does not trigger on a specific set of
>>>> files. I’ve deleted the data and started the rsync on a fresh directory
>>>> (not a fresh filesystem, I can’t delete that as it carries important data)
>>>> but it doesn’t always get stuck on the same files, even though rsync
>>>> processes them in a repeatable order.
>>>> I’m wondering how to generate more insights from that. Maybe keeping a
>>>> blktrace log might help?
>>>> It sounds like the specific pattern relies on XFS doing a specific thing
>>>> there …
>>>> Wild idea: maybe running the xfstest suite on an in-memory raid 6 setup
>>>> could reproduce this?
>>>> I’m guessing that the xfs people do not regularly run their test suite on
>>>> a layered setup like mine with encryption and software raid?
>>>
>>> That sounds great.
>>>> Christian
>>>>> On 15. Aug 2024, at 08:19, Christian Theune <c...@flyingcircus.io> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> On 14. Aug 2024, at 10:53, Christian Theune <c...@flyingcircus.io> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>> On 12. Aug 2024, at 20:37, John Stoffel <j...@stoffel.org> wrote:
>>>>>>>
>>>>>>> I'd probably just do the RAID6 tests first, get them out of the way.
>>>>>>
>>>>>> Alright, those are running right now - I’ll let you know what happens.
>>>>>
>>>>> I’m not making progress here. I can’t reproduce those on in-memory
>>>>> loopback raid 6. However: I can’t fully reproduce the rsync. For me this
>>>>> only triggered after around 1.5 hours of progress on the NVMe which resulted
>>>>> in the hangup. I can only create around 20 GiB worth of raid 6 volume on
>>>>> this machine. I’ve tried running rsync until it exhausts the space,
>>>>> deleting the content and running rsync again, but I feel like this isn’t
>>>>> sufficient to trigger the issue. :(
>>>>>
>>>>> I’m trying to find whether any specific pattern in the files around the
>>>>> time it locks up might be relevant here and try to run the rsync over that
>>>>> portion.
>>>>>
>>>>> On the plus side, I have a script now that can create the various
>>>>> loopback settings quickly, so I can try out things as needed. Not that
>>>>> valuable without a reproducer, yet, though.
>>>>>
>>>>> @Yu: you mentioned that you might be able to provide me a kernel that
>>>>> produces more error logging to diagnose this? Any chance we could try
>>>>> that route?
>>>
>>> Yes, however, I still need some time to sort out the internal process of
>>> raid5. I'm quite busy with some other work stuff and I'm familiar with
>>> raid1/10, but not so much with raid5. :(
>>>
>>> The main idea is to figure out why IOs are not dispatched to the underlying
>>> disks.
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>>
>>>>> Christian
>>>>>
>>>>> --
>>>>> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
>>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
>>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
>>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian
>>>>> Zagrodnick
>>>> Kind regards,
>>>> Christian Theune
>> Kind regards,
>> Christian Theune

Kind regards,
Christian Theune