Re: [OpenAFS] OpenAFS client softlockup on concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 9:37 PM Ciprian Dorin Craciun wrote:
> Now the client works OK, however if I start the `afsd` client on the server itself (i.e. over `loopback` network), where previously (with `-jumbo`) I was able to max-out the disks (~300 MiB/s), now seems to be capped at around

Re: [OpenAFS] OpenAFS client softlockup on concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Ciprian Dorin Craciun
Before replying, I want to note that I think I've stumbled upon three (perhaps related) issues (some of which might just be configuration error):
* AFS file access getting stuck; (seems to be solved by increasing the number of `fileserver` threads from `-p 4` to `-p 128`;)
* trying to `SIGTERM`

Re: [OpenAFS] OpenAFS client softlockup on concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Kostas Liakakis
On 20 Nov 2019 19:17, Ciprian Dorin Craciun wrote: (Yesterday over wireless I didn't use Jumbo frames, but the day before, where the same thing happened, I was using them.) Does this mean that "the other day with jumbo frames" was over GigE? Does this happen

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 7:49 PM Mark Vitale wrote:
> > The following are the arguments of `fileserver`:
> > -syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 -cb 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864 -sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Mark Vitale
> On Nov 20, 2019, at 12:17 PM, Ciprian Dorin Craciun wrote:
>
>> Do you have FileLogs and/or fileserver audit logs for the time in question?
>
> Yes, I do have access to them.
>
> The following is the syslog output from OpenAFS server in a 5 minute time-window to the stacktrace

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 7:03 PM Mark Vitale wrote:
> Thank you for the backtraces. I agree that 'gm' is the problematic thread; it appears to be stuck in rxi_WriteProc waiting for the Rx packet transmit window to advance. That is, it's waiting for acknowledgments - probably from the

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Mark Vitale
Ciprian,
> On Nov 19, 2019, at 4:37 PM, Ciprian Dorin Craciun wrote:
>
> On Tue, Nov 19, 2019 at 10:38 PM Ciprian Dorin Craciun wrote:
>> At the following link you can find an extract of `dmesg` after the sysrq trigger.