Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-25 Thread Ciprian Dorin Craciun
On Mon, Nov 25, 2019 at 2:53 AM Benjamin Kaduk wrote:
> > * I suspect that perhaps the issue is due to the latest kernel version,
> > because I have run similar patterns a few weeks ago on an older kernel (but
> > still from the `5.x` family), but can't say for sure;
>
> I see the diagnostics and

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-24 Thread Benjamin Kaduk
On Tue, Nov 19, 2019 at 01:53:59PM +0200, Ciprian Dorin Craciun wrote:
>
> My setup is as follows:
>
> * OpenSUSE Tumbleweed, kernel 5.3.9-1-default, client package
> `openafs-client` and `openafs-kmp-default` at `1.8.5_k5.3.9_1-1.3` as
> provided by OpenSUSE;
>
> * `afsd` parameters (neither

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 7:49 PM Mark Vitale wrote:
> > The following are the arguments of `fileserver`:
> > -syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 -cb
> > 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864
> > -sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Mark Vitale
> On Nov 20, 2019, at 12:17 PM, Ciprian Dorin Craciun wrote:
>
>> Do you have FileLogs and/or fileserver audit logs for the time in question?
>
> Yes, I do have access to them.
>
> The following is the syslog output from OpenAFS server in a 5 minute
> time-window to the stacktrace

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Ciprian Dorin Craciun
On Wed, Nov 20, 2019 at 7:03 PM Mark Vitale wrote:
> Thank you for the backtraces. I agree that 'gm' is the problematic thread;
> it appears to be stuck in rxi_WriteProc waiting for the Rx packet transmit
> window to advance. That is, it's waiting for acknowledgments - probably from the

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-20 Thread Mark Vitale
Ciprian,
> On Nov 19, 2019, at 4:37 PM, Ciprian Dorin Craciun wrote:
>
> On Tue, Nov 19, 2019 at 10:38 PM Ciprian Dorin Craciun wrote:
>> At the following link you can find an extract of `dmesg` after the
>> sysrq trigger.
>>

Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Thread Mark Vitale
Ciprian,
> On Nov 19, 2019, at 6:53 AM, Ciprian Dorin Craciun wrote:
>
> A few days ago I have encountered a very strange OpenAFS client issue that
> basically exhibits in two ways:
>
> * either the processes accessing the file-system get "stuck" reading (or
> perhaps opening) the files;

[OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

2019-11-19 Thread Ciprian Dorin Craciun
A few days ago I have encountered a very strange OpenAFS client issue that basically exhibits in two ways:
* either the processes accessing the file-system get "stuck" reading (or perhaps opening) the files; (although if one waits "long" enough, sometimes those processes will finally complete