Sarp, Shane,
 
Is the primary problem an access violation?  The "Losing some ticks" messages
seem secondary (and we've actually seen that printk with interrupts disabled
can be a cause of this).  We really need to determine the source code line to
work out what has screwed up here.  A stack trace helps, but kernel core dumps
are even better.  Is it possible to arrange that?  Can you file a Lustre bug
with all this info?
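
For anyone reading the oops output quoted below, here is a minimal Python
sketch of how the RIP, Oops and CR2 lines might be picked apart before the
symbol and offset are handed to a debugger. Several things here are
assumptions and do not come from the thread: that the offset after "+" is
decimal, that the page-fault error code uses the standard x86 bit layout, and
that the list poison constants match those in the 2.6 kernel's
include/linux/list.h. The function names are hypothetical.

```python
import re

# Assumed values (not from the thread): list poison constants as defined in
# the Linux 2.6 include/linux/list.h, and 8-byte pointers on x86_64.
LIST_POISON1 = 0x00100100
LIST_POISON2 = 0x00200200
POINTER_SIZE = 8

# Matches e.g. "<ffffffffa016be29>{:ksocklnd:ksocknal_process_transmit+969}".
RIP_RE = re.compile(r"<([0-9a-f]{16})>\{:(\w+):(\w+)\+(\d+)\}")


def parse_rip(line):
    """Extract address, module, symbol and offset from a RIP line.

    The offset is assumed to be decimal. Returns None if nothing matches.
    """
    m = RIP_RE.search(line)
    if m is None:
        return None
    addr, module, symbol, offset = m.groups()
    return {
        "address": int(addr, 16),
        "module": module,
        "symbol": symbol,
        "offset": int(offset),
    }


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0 means the page was not present
        "write": bool(code & 2),         # 1 means the access was a write
        "user_mode": bool(code & 4),     # 0 means the fault was in kernel mode
    }


def poison_hint(cr2):
    """Return e.g. 'LIST_POISON1+8' if cr2 falls inside a poisoned list_head."""
    for name, base in (("LIST_POISON1", LIST_POISON1),
                       ("LIST_POISON2", LIST_POISON2)):
        delta = cr2 - base
        if 0 <= delta < 2 * POINTER_SIZE:
            return "%s+%d" % (name, delta)
    return None


if __name__ == "__main__":
    rip = ("RIP <ffffffffa016be29>{:ksocklnd:ksocknal_process_transmit+969} "
           "RSP <00000100de20be58>")
    print(parse_rip(rip))
    print(decode_error_code(int("0002", 16)))
    print(poison_hint(int("0000000000100108", 16)))
```

Under those assumptions, the quoted CR2 value 0000000000100108 comes out as
LIST_POISON1+8 and the error code 0002 as a kernel-mode write to a non-present
page. That would be consistent with a second list_del() on an entry that had
already been removed, but it is only a hint about where to look; the source
line and a core dump are still needed to confirm it.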

Cheers, 
                   Eric 

Eric Barton
Barton Software
9 York Gardens
Clifton
Bristol, BS8 4LL
United Kingdom
Tel:    +44 (117) 330 1575
Mobile: +44 (7909) 680 356
Fax:    Call to arrange
Email:  [EMAIL PROTECTED]
 


  _____  

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Oral, H. Sarp
Sent: 22 March 2007 7:39 PM
To: [email protected]
Subject: [Lustre-discuss] 1.4.9. client errors of unknown source



Hello,

We got the following error messages on our Lustre 1.4.9 clients. Our Lustre
servers are also running 1.4.9.

The first two incidents happened at about the same time, while both clients
were each busy running two IOR jobs. The last incident happened while the
client was idle and not running any jobs.

These errors did not produce any other bug reports or log entries on the
clients.

We are not sure at this point whether this is a Lustre problem or something
else, but the RIP line ({:ksocklnd:ksocknal_process_transmit+969}) makes us
think it might be a Lustre problem.

Has anyone seen something like this?

Mar 19 18:38:52 pinto0002-admin kernel: Unable to handle kernel paging request at 0000000000100108 RIP:
Mar 19 18:38:52 pinto0002-admin kernel: <7>Losing some ticks... checking if CPU frequency changed.
Mar 19 18:38:52 pinto0002-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:52 pinto0002-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:52 pinto0002-admin kernel: RIP <ffffffffa016be29>{:ksocklnd:ksocknal_process_transmit+969} RSP <00000100de20be58>
Mar 19 18:38:52 pinto0002-admin kernel: CR2: 0000000000100108
Mar 19 18:38:52 pinto0002-admin kernel: CR2: 0000000000100108

Mar 19 18:38:26 pinto0009-admin kernel: Unable to handle kernel paging request at 0000000000100108 RIP:
Mar 19 18:38:26 pinto0009-admin kernel: <7>Losing some ticks... checking if CPU frequency changed.
Mar 19 18:38:26 pinto0009-admin kernel:
<ffffffffa0238e29>{:ksocklnd:ksocknal_process_transmit+969}
Mar 19 18:38:26 pinto0009-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:26 pinto0009-admin kernel: Oops: 0002 [1] SMP
Mar 19 18:38:26 pinto0009-admin kernel: RIP <ffffffffa0238e29>{:ksocklnd:ksocknal_process_transmit+969} RSP <00000100c2647e58>
Mar 19 18:38:26 pinto0009-admin kernel: CR2: 0000000000100108
Mar 19 18:38:26 pinto0009-admin kernel: CR2: 0000000000100108

Mar 20 16:22:40 pinto0060-admin kernel: Unable to handle kernel paging request at 0000000000100108 RIP:
Mar 20 16:22:40 pinto0060-admin kernel: <7>Losing some ticks... checking if CPU frequency changed.
Mar 20 16:22:40 pinto0060-admin kernel: Oops: 0002 [1] SMP
Mar 20 16:22:40 pinto0060-admin kernel: Oops: 0002 [1] SMP
Mar 20 16:22:40 pinto0060-admin kernel: RIP <ffffffffa0238e29>{:ksocklnd:ksocknal_process_transmit+969} RSP <00000100c1c83e58>
Mar 20 16:22:40 pinto0060-admin kernel: CR2: 0000000000100108
Mar 20 16:22:40 pinto0060-admin kernel: CR2: 0000000000100108

PS: The same hardware was running without any problems before these errors.
After a reboot the clients are running fine again, and no hardware
configuration changes have been made on them.

Thanks,

Sarp

--------------------
Sarp Oral, Ph.D.
National Center for Computational Sciences (NCCS)
Oak Ridge National Lab, Oak Ridge, Tennessee 37831
865-574-2173, [EMAIL PROTECTED]

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
