Hello,
We got the following error messages on our 1.4.9 Lustre clients. Our
Lustre servers are also running 1.4.9.
The first two incidents happened at about the same time, while both
clients were each running two IOR jobs. The last incident happened while
the client was idle, with no jobs running.
These errors did not produce any other bug reports or log entries on the
clients.
At this point we are not sure whether this is a Lustre problem or
something else, but the RIP line
({:ksocklnd:ksocknal_process_transmit+969}) makes us think it might be a
Lustre problem.
Has anyone seen something like this?
Mar 19 18:38:52 pinto0002-admin kernel: Unable to handle kernel paging
request at 0000000000100108 RIP:
Mar 19 18:38:52 pinto0002-admin kernel: <7>Losing some ticks... checking
if CPU frequency changed.
Mar 19 18:38:52 pinto0002-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:52 pinto0002-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:52 pinto0002-admin kernel: RIP
<ffffffffa016be29>{:ksocklnd:ksocknal_process_transmit+969} RSP
<00000100de20be58>
Mar 19 18:38:52 pinto0002-admin kernel: CR2: 0000000000100108
Mar 19 18:38:52 pinto0002-admin kernel: CR2: 0000000000100108
Mar 19 18:38:26 pinto0009-admin kernel: Unable to handle kernel paging
request at 0000000000100108 RIP:
Mar 19 18:38:26 pinto0009-admin kernel: <7>Losing some ticks... checking
if CPU frequency changed.
Mar 19 18:38:26 pinto0009-admin kernel:
<ffffffffa0238e29>{:ksocklnd:ksocknal_process_transmit+969}
Mar 19 18:38:26 pinto0009-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:26 pinto0009-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:26 pinto0009-admin kernel: RIP
<ffffffffa0238e29>{:ksocklnd:ksocknal_process_transmit+969} RSP
<00000100c2647e58>
Mar 19 18:38:26 pinto0009-admin kernel: CR2: 0000000000100108
Mar 19 18:38:26 pinto0009-admin kernel: CR2: 0000000000100108
Mar 20 16:22:40 pinto0060-admin kernel: Unable to handle kernel paging
request at 0000000000100108 RIP:
Mar 20 16:22:40 pinto0060-admin kernel: <7>Losing some ticks... checking
if CPU frequency changed.
Mar 20 16:22:40 pinto0060-admin kernel: Oops: 0002 [1] SMP
Mar 20 16:22:40 pinto0060-admin kernel: Oops: 0002 [1] SMP
Mar 20 16:22:40 pinto0060-admin kernel: RIP
<ffffffffa0238e29>{:ksocklnd:ksocknal_process_transmit+969} RSP
<00000100c1c83e58>
Mar 20 16:22:40 pinto0060-admin kernel: CR2: 0000000000100108
Mar 20 16:22:40 pinto0060-admin kernel: CR2: 0000000000100108
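For anyone comparing the three incidents, here is a minimal sketch that pulls the key fields out of one oops excerpt. It is not a diagnosis. It assumes the standard x86 page-fault error code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). It also assumes the Linux list-poison value LIST_POISON1 = 0x00100100, which may not be what this kernel build uses. The function names are ours, for illustration only.

```python
import re

# Assumption: LIST_POISON1 as defined in mainline Linux list poisoning.
LIST_POISON1 = 0x00100100


def decode_oops_code(code_hex):
    """Decode the low three bits of an x86 page-fault error code."""
    code = int(code_hex, 16)
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault was in kernel mode
    }


def summarize(log_text):
    """Extract module, function, fault address and access type from an oops."""
    oops = re.search(r"Oops: ([0-9a-fA-F]{4})", log_text)
    cr2 = re.search(r"CR2: ([0-9a-fA-F]{16})", log_text)
    rip = re.search(r"\{:(\w+):(\w+)\+(\d+)\}", log_text)
    if not (oops and cr2 and rip):
        raise ValueError("excerpt is missing an Oops, CR2 or RIP symbol line")
    fault_address = int(cr2.group(1), 16)
    return {
        "module": rip.group(1),
        "function": rip.group(2),
        "function_offset": int(rip.group(3)),
        "fault_address": fault_address,
        "access": decode_oops_code(oops.group(1)),
        # A small positive offset may hint at a poisoned list pointer.
        "offset_from_list_poison1": fault_address - LIST_POISON1,
    }
```

Run on any of the excerpts above, this reports a kernel-mode write to an unmapped page at 0x100108 inside ksocknal_process_transmit. If the list-poison assumption holds, that address is 8 bytes past LIST_POISON1, which could point to an operation on an already-removed list entry, but that reading is unconfirmed.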
PS: The same hardware was running without any problems before these
errors. After a reboot the clients are running fine again, and no
hardware configuration changes have been made on them.
Thanks,
Sarp
--------------------
Sarp Oral, Ph.D.
National Center for Computational Sciences (NCCS)
Oak Ridge National Lab, Oak Ridge, Tennessee 37831
865-574-2173, [EMAIL PROTECTED]