Hi,

A small insight: even with my dataset that reliably triggers this (after 
around 1.5 hours of rsyncing), the hang does not occur on a specific set of 
files. I've deleted the data and restarted the rsync into a fresh directory 
(not a fresh filesystem; I can't delete that, as it carries important data), 
but it doesn't always get stuck on the same files, even though rsync processes 
them in a repeatable order.

I’m wondering how to generate more insight from this. Would keeping a blktrace 
log of the run help?
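If it would, this is roughly how I’d capture one (the device name and paths 
below are just placeholders for my actual setup):

  # trace block-layer events on the RAID device while the rsync runs
  mkdir -p /var/tmp/blktrace && cd /var/tmp/blktrace
  blktrace -d /dev/md0 &
  # ... let the rsync run until it hangs, then stop the trace ...
  kill %1
  # merge the per-CPU trace files into a readable log
  blkparse -i md0 > md0-parsed.txt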

It sounds like triggering this relies on XFS doing something specific at that 
point …

Wild idea: maybe running the xfstests suite on an in-memory RAID 6 setup could 
reproduce this?

I’m guessing that the XFS people do not regularly run their test suite on a 
layered setup like mine, with encryption and software RAID?
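For reference, this is roughly the layering my loopback script sets up (sizes, 
device names and paths are examples, not the exact values from the script):

  # create four sparse backing files on tmpfs and attach them to loop devices
  for i in 0 1 2 3; do
    truncate -s 8G /dev/shm/raid$i.img
    losetup -f --show /dev/shm/raid$i.img   # prints e.g. /dev/loop0
  done
  # RAID 6 across the four loop devices, LUKS on top, XFS on top of that
  mdadm --create /dev/md100 --level=6 --raid-devices=4 \
      /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
  cryptsetup luksFormat /dev/md100
  cryptsetup open /dev/md100 testcrypt
  mkfs.xfs /dev/mapper/testcrypt
  mount /dev/mapper/testcrypt /mnt/test

xfstests would then only need a local.config pointing TEST_DEV/TEST_DIR (plus a 
second identical stack for SCRATCH_DEV/SCRATCH_MNT) at this, and a 
./check -g auto run.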
 
Christian

> On 15. Aug 2024, at 08:19, Christian Theune <c...@flyingcircus.io> wrote:
> 
> Hi,
> 
>> On 14. Aug 2024, at 10:53, Christian Theune <c...@flyingcircus.io> wrote:
>> 
>> Hi,
>> 
>>> On 12. Aug 2024, at 20:37, John Stoffel <j...@stoffel.org> wrote:
>>> 
>>> I'd probably just do the RAID6 tests first, get them out of the way.  
>> 
>> Alright, those are running right now - I’ll let you know what happens.
> 
> I’m not making progress here. I can’t reproduce this on an in-memory loopback 
> RAID 6. However, I also can’t fully reproduce the rsync workload: for me this 
> only triggered after around 1.5 hours of progress on the NVMe, which resulted 
> in the hangup. I can only create around 20 GiB worth of RAID 6 volume on this 
> machine. I’ve tried running rsync until it exhausts the space, deleting the 
> content and running rsync again, but I feel like this isn’t sufficient to 
> trigger the issue. :(
> 
> I’m trying to find out whether any specific pattern in the files around the 
> time it locks up might be relevant here, and then run the rsync over just 
> that portion.
> 
> On the plus side, I now have a script that can create the various loopback 
> setups quickly, so I can try things out as needed. Not that valuable without 
> a reproducer yet, though.
> 
> @Yu: you mentioned that you might be able to provide me a kernel that 
> produces more error logging to diagnose this? Any chance we could try that 
> route?
> 
> Christian
> 
> -- 
> Christian Theune · c...@flyingcircus.io · +49 345 219401 0
> Flying Circus Internet Operations GmbH · https://flyingcircus.io
> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick


Kind regards,
Christian Theune

-- 
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

