Michael Buesch wrote:

>On Tuesday 22 December 2009 15:21:04 Chris Radek wrote:
>  
>
>>On Tue, Dec 22, 2009 at 11:51:31AM +0100, Michael Buesch wrote:
>>    
>>
>>>Here's an interesting crash.
>>>It sometimes happens on EMC load. It's more or less reproducible by
>>>firing up and shutting down EMC repeatedly. Something like 20 or 30
>>>tries are sometimes required to reproduce it.
>>>
>>>So, what is this? Well, I'm fairly certain that we have a race condition
>>>here. This crash only seems to happen on SMP (I have two CPUs) when
>>>isolcpus is _not_ used.
>>>I'm using classic RCUs.
>>>
>>>
>>>[  340.224299] Pid: 3359, comm: milltask Not tainted 2.6.29.6-rtai #2 GeForce7050M-M
>>>[  340.254912] Pid: 3352, comm: hal_manualtoolc Tainted: G      D    2.6.29.6-rtai #2
>>>[  340.255157] Pid: 2630, comm: Xorg Tainted: G      D W  2.6.29.6-rtai #2
>>>[  340.255200] Pid: 3376, comm: rsyslogd Tainted: G      D W  2.6.29.6-rtai #2 GeForce7050M-M
>>>[  340.255714] Pid: 2630, comm: Xorg Tainted: G      D W  2.6.29.6-rtai #2 GeForce7050M-M
>>>      
>>>
>>I see five panics here - three in programs unrelated to EMC.  I think
>>you may just have a bogus kernel or rtai build.
>>    
>>
>
>Well, calling milltask and hal_manualtoolc unrelated to EMC is...
>interesting.
>And 4 of the 5 oopses obviously are follow-up oopses to the first one.
>  
>
Note that Chris said that three of the problems are unrelated to EMC.  
Milltask and manualtoolchange are the two that are related.

>It's also interesting to get an answer like "I think several people are
>running EMC on SMP systems.". That basically means I'm an idiot and I'm
>just too stupid to set it up properly.
>
No, it means that there are other data points that show that people are 
successful with this setup (or similar setups that would be called the 
same thing).

> Well, ya know what? It is not the case. 
>
We know that.  You have provided several fixes already, if my memory serves.

>As I already said, this oops is reproducible and it happens when EMC is
>loaded. So there obviously _is_ _some_ interaction with EMC.
>  
>
EMC could be at fault, or it could be a trigger for something else.  If 
the problem is in some RTAI module, then EMC is likely to be the only 
thing on your system that would trigger it, since it's likely to be the 
only realtime program you use.  This is one of the reasons why Chris 
asked for the RTAI version you're using.  The RTAI patches for certain 
kernels have been spectacularly bad on occasion, and you may be 
experimenting on one of those kernels (I don't know how 2.6.29 fares 
with RTAI).

>Race conditions are hard to track down and if five people run a piece of
>software without a problem, that doesn't mean there are no races. Keep
>that in mind, please.
>  
>
Sure.  On the flip side though, if there are 6 people running some 
software and one of them encounters a problem, the problem could be 
endemic to that system.  All of the systems have to be identical for any 
reasonable analysis.  Since the likelihood of all SMP-using EMC users 
having identical systems is nil, we need to identify the differences 
between the systems; that seems the most likely avenue by which 
to get good results.

>Also notice that I did _not_ say that this is a bug in EMC. It could also
>be a bug in rtai or somewhere else. I'd like to get this tracked down,
>whether this is an EMC bug or not, because it does make EMC unusable for
>me (unless I use isolcpus).
>So any real ideas are greatly appreciated.
>  
>
We're trying to help.  I think something got lost in translation.  
Emails are excellent at doing that.

>I am running rtai 3.7.1 and latest EMC stable release.
>
Just to be very clear here - does this mean that you compiled emc 2.3.3 
from source against your SMP kernel?

One thing I noticed is that the GeForce driver is explicitly mentioned 
or implicit on basically every one of those panic lines (I'm assuming 
there's a relation between Xorg and GeForce7050M-M).  I don't know if 
that's a coincidence or not.  (You should also know that I'm no expert 
at reading kernel panics.)
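
For anyone who wants to check that observation rather than eyeball it, here is a rough Python sketch that tabulates the five oops header lines quoted earlier in this thread (with the mail client's line wrapping undone). The regular expression and the names parse_oops_headers and taint_flags are made up for this illustration; they are not part of EMC, RTAI, or any kernel tooling, and the pattern is only meant to fit these five lines.

```python
import re

# Oops header lines copied from the quoted log, with line wrapping undone.
LOG = """\
[  340.224299] Pid: 3359, comm: milltask Not tainted 2.6.29.6-rtai #2 GeForce7050M-M
[  340.254912] Pid: 3352, comm: hal_manualtoolc Tainted: G      D    2.6.29.6-rtai #2
[  340.255157] Pid: 2630, comm: Xorg Tainted: G      D W  2.6.29.6-rtai #2
[  340.255200] Pid: 3376, comm: rsyslogd Tainted: G      D W  2.6.29.6-rtai #2 GeForce7050M-M
[  340.255714] Pid: 2630, comm: Xorg Tainted: G      D W  2.6.29.6-rtai #2 GeForce7050M-M
"""

# Hypothetical pattern written for these lines only: timestamp, pid, command
# name, taint state, kernel release, build number, optional trailing string.
LINE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+Pid:\s+(?P<pid>\d+),\s+comm:\s+(?P<comm>\S+)\s+"
    r"(?P<taint>Not tainted|Tainted:\s+[A-Z ]+?)\s+"
    r"(?P<kernel>\d\S*)\s+#(?P<build>\d+)(?:\s+(?P<trailer>\S+))?\s*$"
)


def parse_oops_headers(text):
    """Return one dict per line that looks like an oops 'Pid:' header."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def taint_flags(taint):
    """Split 'Tainted: G      D W' into ['G', 'D', 'W']; untainted gives []."""
    if taint == "Not tainted":
        return []
    return taint.split(":", 1)[1].split()


entries = parse_oops_headers(LOG)
for entry in entries:
    flags = "".join(taint_flags(entry["taint"])) or "-"
    trailer = entry["trailer"] or "(none)"
    print(f'{entry["time"]:>12}  {entry["comm"]:<16} taint={flags:<4} {trailer}')
```

Run against these five lines, the trailing GeForce7050M-M string shows up explicitly on three of them, and only the first (milltask) is untainted. The "D" taint flag is set once the kernel has already oopsed, so the later four entries were reported on a kernel that had already hit a problem.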

Thanks
- Steve


_______________________________________________
Emc-developers mailing list
Emc-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/emc-developers
