Re: [gentoo-user] Kernel panic on 5.4.2 - not sure of cause yet

2019-12-14 Thread Andrew Udvare
On 15/12/2019 01:54, Andrew Udvare wrote:
> On 12/12/2019 10:18, Daniel Frey wrote:
>>
>> I have just installed fresh new gentoo installs with 5.4.2 and both
>> machines use nvidia-drivers - I have not seen this at all.
>>
>> I have been doing a fair bit of compiling on one of the machines and
>> haven't had any hiccups whatsoever.
>>
>> Dan
>>
> 
> Thanks for the reply.
> 
> I think I have found the issue in another 3rd party blob-ish driver from
> Magewell, which is for my capture card. I removed the capture card
> (since this would stop the module from loading) and my system has not
> had a panic since. I have not tried to upgrade back to 5.4.2 yet but as
> I said before I was getting the error on 5.4.0 too.
> 
> Andrew
> 

For anyone interested, here are some pictures of the panics:

https://i.imgsafe.org/5e/5e20756985.jpeg (unknown)
https://i.imgsafe.org/5e/5e20754111.jpeg (find_css_set)

The driver in question is the one named ProCapture. I am fairly certain
this is the cause of the recent panics.
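Since the suspicion has moved from nvidia to the ProCapture module, it may help to see which loaded modules actually mark themselves as tainting. Each loaded module's sysfs directory has a `taint` file holding its flag letters (e.g. `P` for proprietary, `O` for out-of-tree). Below is a minimal Python sketch, assuming sysfs is mounted at /sys. The function name is made up for illustration, and the thread does not give the Magewell driver's exact module name, so look it up with `lsmod`.

```python
from __future__ import annotations

from pathlib import Path


def tainting_modules(sys_module: Path = Path("/sys/module")) -> dict[str, str]:
    """Map each module name to the taint letters in its sysfs 'taint' file.

    Modules whose 'taint' file is empty (nothing to report) are skipped.
    Directories without a 'taint' file are not matched by the glob at all.
    """
    result: dict[str, str] = {}
    for taint_file in sorted(sys_module.glob("*/taint")):
        flags = taint_file.read_text().strip()
        if flags:
            result[taint_file.parent.name] = flags
    return result


if __name__ == "__main__":
    # Print e.g. "nvidia: PO" for every module that taints the kernel.
    for name, flags in tainting_modules().items():
        print(f"{name}: {flags}")
```

If the capture card's module shows up here alongside nvidia, both are candidates that a panic report will list as tainting.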

I have had some issues with overclocking my system (both CPU and memory),
but I do not think they are related. Just for good measure I reseated all
my RAM modules. I am currently running the CPU at 3.7 GHz (up from 3.5 GHz)
with the memory overclocked to 2667 MHz, and the system does not seem to
have any issues.

Andrew





Re: [gentoo-user] Kernel panic on 5.4.2 - not sure of cause yet

2019-12-14 Thread Andrew Udvare
On 12/12/2019 10:18, Daniel Frey wrote:
> 
> I have just installed fresh new gentoo installs with 5.4.2 and both
> machines use nvidia-drivers - I have not seen this at all.
> 
> I have been doing a fair bit of compiling on one of the machines and
> haven't had any hiccups whatsoever.
> 
> Dan
> 

Thanks for the reply.

I think I have found the issue in another 3rd party blob-ish driver from
Magewell, which is for my capture card. I removed the capture card
(since this would stop the module from loading) and my system has not
had a panic since. I have not tried to upgrade back to 5.4.2 yet but as
I said before I was getting the error on 5.4.0 too.

Andrew





Re: [gentoo-user] Kernel panic on 5.4.2 - not sure of cause yet

2019-12-12 Thread Daniel Frey

On 2019-12-10 21:31, Andrew Udvare wrote:

> I have been getting relatively consistent kernel panics on some call to
> find_css_set and sometimes a stack trace that mentions cgroups.
>
> On 5.4.0 I don't get this same crash and I added blocking of
> auto-loading nvidia under the ramdisk just in case that's the issue, as
> I was sometimes getting a similar crash on 5.4.0.
>
> /etc/default/grub:
>
> GRUB_PRELOAD_MODULES=lvm
> GRUB_CMDLINE_LINUX="init=/usr/lib/systemd/systemd
> systemd.legacy_systemd_cgroup_controller=yes rd.driver.blacklist=nvidia
> rd.driver.blacklist=nvidia_modeset rd.driver.blacklist=nvidia_drm"
> GRUB_GFXPAYLOAD_LINUX="keep"
>
> The reason the legacy argument is there is because Docker won't work
> under the new cgroups, for now.
>
> I have a couple of modules, but the one that sticks out most is nvidia.
> This is the one I see in the stack trace. I have not seen a bug report
> on Gentoo or Nvidia's end.
>
> For now I've masked >5.4.0 gentoo-sources.
>
> Anyone else getting a similar issue?
>
> Thanks
> Andrew

I have just installed fresh new gentoo installs with 5.4.2 and both
machines use nvidia-drivers - I have not seen this at all.

I have been doing a fair bit of compiling on one of the machines and
haven't had any hiccups whatsoever.

Dan
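The rd.driver.blacklist entries in the quoted GRUB_CMDLINE_LINUX only reach the running system after grub.cfg is regenerated (e.g. `grub-mkconfig -o /boot/grub/grub.cfg`), and the rd.* options are read by a dracut-style initramfs, which is what their naming suggests is in use here (an assumption). One way to confirm they were actually passed is to read /proc/cmdline. The Python sketch below is a hypothetical helper, `parse_cmdline`; its use of shlex quoting only approximates the kernel's own parameter parser.

```python
from __future__ import annotations

import shlex
from collections import defaultdict


def parse_cmdline(cmdline: str) -> dict[str, list[str]]:
    """Split a kernel command line into {key: [values...]}.

    Repeated keys such as rd.driver.blacklist keep every value in order
    of appearance; bare flags without '=' map to an empty string.
    """
    params: dict[str, list[str]] = defaultdict(list)
    for token in shlex.split(cmdline):
        key, _, value = token.partition("=")
        params[key].append(value)
    return dict(params)


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    # Expect the three nvidia entries if the GRUB change took effect.
    print(params.get("rd.driver.blacklist", []))
```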



Re: [gentoo-user] Kernel panic on 5.4.2 - not sure of cause yet

2019-12-11 Thread J. Roeleveld
On Wednesday, 11 December 2019 06:31:04 CET Andrew Udvare wrote:
> I have been getting relatively consistent kernel panics on some call to
> find_css_set and sometimes a stack trace that mentions cgroups.

Can you provide the full kernel panic (a picture taken with phone or camera 
attached to an email is fine as well) and tell us at which point this happens?

> On 5.4.0 I don't get this same crash and I added blocking of
> auto-loading nvidia under the ramdisk just in case that's the issue, as
> I was sometimes getting a similar crash on 5.4.0.

I am "still" on 5.3.14, but not encountering any kernel panics myself.

> /etc/default/grub:
> 
> GRUB_PRELOAD_MODULES=lvm
> GRUB_CMDLINE_LINUX="init=/usr/lib/systemd/systemd

Not using systemd myself.

> systemd.legacy_systemd_cgroup_controller=yes rd.driver.blacklist=nvidia
> rd.driver.blacklist=nvidia_modeset rd.driver.blacklist=nvidia_drm"
> GRUB_GFXPAYLOAD_LINUX="keep"
> 
> The reason the legacy argument is there is because Docker won't work
> under the new cgroups, for now.
> 
> I have a couple of modules, but the one that sticks out most is nvidia.
> This is the one I see in the stack trace. I have not seen a bug report
> on Gentoo or Nvidia's end.

Nvidia (and some other modules) "taint" the kernel. Any module that "taints"
the kernel WILL be listed when a kernel panic occurs. I have never found that
information very helpful, as those modules have never turned out to be the
cause for me.
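The taint state mentioned here can be checked on a running system: the kernel keeps it as a bitmask in /proc/sys/kernel/tainted, and the letters on a panic's "Tainted:" line come from the same bits. A minimal Python sketch of decoding it follows. The flag table is transcribed from the kernel's tainted-kernels documentation as of the 5.x series (later kernels define more bits, which this sketch reports as unknown), and `decode_taint` is a hypothetical name.

```python
from __future__ import annotations

# Bit -> (letter, meaning), per Documentation/admin-guide/tainted-kernels.rst
# for 5.x kernels (bits 0-17).
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on an out-of-specification system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "processor reported a Machine Check Exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for platform firmware bug applied"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live patched"),
    16: ("X", "auxiliary taint (distro-defined)"),
    17: ("T", "kernel built with the struct randomization plugin"),
}


def decode_taint(value: int) -> list[tuple[str, str]]:
    """Return (letter, meaning) for every set bit, lowest bit first."""
    if value < 0:
        raise ValueError("taint mask must be non-negative")
    decoded = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            decoded.append(TAINT_FLAGS.get(bit, ("?", f"unknown bit {bit}")))
        bit += 1
    return decoded


if __name__ == "__main__":
    with open("/proc/sys/kernel/tainted") as f:
        for letter, meaning in decode_taint(int(f.read())):
            print(letter, meaning)
```

For example, a value of 4097 (bits 0 and 12) decodes to P and O, the combination a proprietary out-of-tree module such as nvidia would typically set.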

> For now I've masked >5.4.0 gentoo-sources.
> 
> Anyone else getting a similar issue?

No, but an upgrade to 5.4.x is on my todo-list.

--
Joost