Thanks Steve!


> As long as the CPU IDs are distinct then it shouldn't matter that the 
> thread IDs are all zero... the load locked/store conditional code 
> requires both to match before it considers the request as coming from 
> the same place.

I see. I also tried assigning a unique ID to each thread. The problem is still 
there though.
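
To make sure I understand the matching rule, here is a toy model of it in C. This is only a sketch of the rule as described above, not M5's actual code: toy_ll, toy_sc and the reservation table are hypothetical names, and the table size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 8   /* arbitrary table size for this toy model */

/* One outstanding load-locked reservation (hypothetical structure). */
typedef struct {
    int cpu_id;
    int thread_id;
    unsigned long addr;
    int valid;
} reservation_t;

static reservation_t table[MAX_RES];

/* Two requests come from "the same place" only if BOTH IDs match. */
static int same_requester(const reservation_t *r, int cpu_id, int thread_id)
{
    return r->cpu_id == cpu_id && r->thread_id == thread_id;
}

/* Load-locked: record (or refresh) this requester's reservation. */
static void toy_ll(int cpu_id, int thread_id, unsigned long addr)
{
    size_t free_slot = MAX_RES;
    for (size_t i = 0; i < MAX_RES; i++) {
        if (table[i].valid && same_requester(&table[i], cpu_id, thread_id)) {
            table[i].addr = addr;
            return;
        }
        if (!table[i].valid && free_slot == MAX_RES)
            free_slot = i;
    }
    assert(free_slot < MAX_RES);   /* the toy table must not overflow */
    table[free_slot] = (reservation_t){cpu_id, thread_id, addr, 1};
}

/* Store-conditional: returns 1 on success, 0 on failure. A successful
 * store drops every reservation on that address, including those held
 * by other requesters. */
static int toy_sc(int cpu_id, int thread_id, unsigned long addr)
{
    int ok = 0;
    for (size_t i = 0; i < MAX_RES; i++)
        if (table[i].valid && table[i].addr == addr &&
            same_requester(&table[i], cpu_id, thread_id))
            ok = 1;
    if (ok)
        for (size_t i = 0; i < MAX_RES; i++)
            if (table[i].valid && table[i].addr == addr)
                table[i].valid = 0;
    return ok;
}
```

In this model, if cpu 1 and cpu 2 (both with thread ID 0) each do an LL to the same address, only the first SC succeeds, which is what I would expect from a working LL/SC.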

> 
> Can you tell from the trace that there's a point inside mutex_lock() 
> where all of your cpus do LLs to the same address followed by SCs, and 
> all of the SCs succeed?  This would be definitive evidence that there's 
> a problem with LL/SC.  Other than that it's just a guess that this is 
> where the problem lies.

Looking at the trace, it seems that libc_malloc is not using LL/SC. The 
critical part looks like this:

ldl r1, 0(r0)    .......... load r1; initially it is 0
bne r1, 0xxx     .......... if r1 is non-zero, jump to function libc__arena_get
lda r1, 1(r31)   .......... set r1 to 1
stl r1, 0(r0)    .......... store r1 back to the original address

The mutex_lock and unlock functions are inlined, so all of this appears to 
happen inside libc_malloc.
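
To illustrate why a plain load/test/store sequence like this is unsafe, here is a small C sketch that replays it on one shared lock word. It is a deterministic replay, not real threads, and count_lock_winners is a hypothetical helper I made up for the illustration.

```c
#include <assert.h>

#define MAX_CPUS 8   /* arbitrary upper bound for this illustration */

/* Replays the plain ldl / bne / stl sequence on one shared lock word.
 * If interleaved is nonzero, every CPU performs its load before any CPU
 * performs its store, which is the worst case when all threads start in
 * the same cycle. Otherwise each CPU runs the whole sequence in turn.
 * Returns how many CPUs believe they acquired the lock. */
static int count_lock_winners(int ncpus, int interleaved)
{
    int lock_word = 0;
    int seen[MAX_CPUS];
    int winners = 0;

    assert(ncpus >= 0 && ncpus <= MAX_CPUS);

    if (interleaved) {
        for (int c = 0; c < ncpus; c++)
            seen[c] = lock_word;        /* ldl r1, 0(r0) */
        for (int c = 0; c < ncpus; c++) {
            if (seen[c] != 0)           /* bne r1, ... */
                continue;
            lock_word = 1;              /* lda + stl r1, 0(r0) */
            winners++;
        }
    } else {
        for (int c = 0; c < ncpus; c++) {
            seen[c] = lock_word;
            if (seen[c] != 0)
                continue;
            lock_word = 1;
            winners++;
        }
    }
    return winners;
}
```

With four CPUs, the interleaved replay reports four winners, while the one-at-a-time replay reports exactly one.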

The race condition happens when the four threads run this piece of code 
simultaneously. If only I could change the ldl and stl to LL/SC! I'm guessing 
that I am not invoking the right mutex_lock functions, but I have no idea how 
to do that. Maybe I can write a mutex_lock function and link it into the 
program. Any suggestions?
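
For concreteness, a hand-written lock could look roughly like the sketch below. It assumes a compiler with C11 <stdatomic.h>; with an older GCC, the __sync_lock_test_and_set builtin would be the closest equivalent. The sketch_mutex_* names are hypothetical and are not glibc's. On Alpha I would expect the compiler to lower the atomic exchange to an ldl_l/stl_c retry loop, but I have not verified the generated code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical replacement lock; the names are illustrative only. */
typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} sketch_mutex_t;

static void sketch_mutex_init(sketch_mutex_t *m)
{
    atomic_init(&m->locked, 0);
}

/* Returns 1 if the lock was acquired, 0 if it was already held.
 * atomic_exchange reads the old value and writes 1 as one indivisible
 * step, which is what the plain ldl/stl sequence is missing. */
static int sketch_mutex_trylock(sketch_mutex_t *m)
{
    return atomic_exchange_explicit(&m->locked, 1,
                                    memory_order_acquire) == 0;
}

static void sketch_mutex_lock(sketch_mutex_t *m)
{
    while (!sketch_mutex_trylock(m)) {
        /* Spin on plain loads until the lock looks free, then retry. */
        while (atomic_load_explicit(&m->locked,
                                    memory_order_relaxed) != 0) {
        }
    }
}

static void sketch_mutex_unlock(sketch_mutex_t *m)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&m->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&m->locked, 0, memory_order_release);
}
```

Since the mutex_lock calls in malloc are inlined, simply linking a function like this into the program probably would not replace them; the library itself would likely have to be rebuilt with it.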


> 
> Also, what happens if you run with AtomicSimpleCPU, and with or without 
> a single level of caches?

I'm currently using private L1s and shared L2s. I'll also test with a single 
level of shared L1. But are you actually interested in how LL/SC in M5 
behaves under these different configurations? I'll let you know when I capture 
these instructions :) 

Thanks for the insights!

Jiayuan


> 
> Steve
> 
> Jiayuan Meng wrote:
>> Hi Ali,
>>  
>> Thanks for the quick response.
>>  
>> I am having a master thread spawning child threads on multiple cpus. 
>> Once a thread gets allocated to a cpu, it always resides there (so far).
>>  
>> I am using AtomicTimingCPU. In my test case with racing mallocs, 
>> I have five CPUs (with IDs from 0 to 4). A master thread initially runs 
>> on cpu0. When it reaches a pseudo instruction, it tells the simulator 
>> to spawn four child threads on the other CPUs. Each CPU only uses one 
>> thread context (all have the ID 0 by default). Will this be a 
>> problem? I'll try assigning different thread IDs.
>>  
>> To create threads, I learned from "stack_createFunc" and 
>> "init_thread_context" in kern/tru64/tru64.hh: basically, allocate a new 
>> stack and assign the pc and sp registers. A major difference might be 
>> that I am not using pthreads. Instead, I inserted a new pseudo 
>> instruction which "atomically" creates four threads on the other four 
>> CPUs, so they start to execute in the same cycle.
>>  
>> I actually extended SimpleCPUs to have multiple thread contexts, and the 
>> CPU can switch among them. They were tested with the splash2 FFT 
>> benchmark and things went fine. But to make the test clearer, I just 
>> set each CPU to have exactly one thread context. In the future, I may 
>> need to "migrate" a running thread context from one CPU to another.
>>  
>> I'm in trouble now... I wonder how splash2 gets around this in SE mode?
>>  
>> Thanks again!
>>  
>> Jiayuan
>>  
>> 
>>     ----- Original Message -----
>>     *From:* Ali Saidi <mailto:[EMAIL PROTECTED]>
>>     *To:* M5 users mailing list <mailto:m5-users@m5sim.org>
>>     *Sent:* June 16, 2007 2:41 AM
>>     *Subject:* Re: [m5-users] synchronization primitives in SE mode
>> 
>>     The Alpha ISA has a load locked and a store conditional instruction
>>     which we support. Again I don't know exactly what you're doing to
>>     create your threads, but you need to make sure that their cpu/thread
>>     ids are unique. Are you scheduling each thread on its own cpu or
>>     are they moving around? 
>> 
>>     Ali
>> 
>> 
>> 
>>     On Jun 15, 2007, at 1:30 PM, Jiayuan Meng wrote:
>> 
>>>     Hey all,
>>>      
>>>     Using the --trace-flags=Exec debug tool, I found that there is
>>>     a race condition in the malloc function in my multithreaded
>>>     program. However, malloc.c in glibc says it is a thread-safe
>>>     version. I also noticed that malloc/arena.c uses mutex_lock(),
>>>     which seems to be a spinlock. This may still be problematic if
>>>     several threads are accessing the lock simultaneously.
>>>      
>>>     So, what kind of synchronization support does M5 have in SE mode?
>>>     Does it have store-conditional or test-and-set instructions, or
>>>     will I have to add one myself?
>>>      
>>>     Thanks!
>>>      
>>>     Jiayuan
>> 
>>     ------------------------------------------------------------------------
>> 
>>     _______________________________________________
>>     m5-users mailing list
>>     m5-users@m5sim.org
>>     http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>> 
>> 
