Thanks Steve!
> As long as the CPU IDs are distinct then it shouldn't matter that the
> thread IDs are all zero... the load locked/store conditional code
> requires both to match before it considers the request as coming from
> the same place.
I see. I also tried assigning a unique ID to each thread, but the problem
is still there.
>
> Can you tell from the trace that there's a point inside mutex_lock()
> where all of your cpus do LLs to the same address followed by SCs, and
> all of the SCs succeed? This would be definitive evidence that there's
> a problem with LL/SC. Other than that it's just a guess that this is
> where the problem lies.
Looking into the trace, it seems that libc_malloc is not using
LL/SC. The critical part looks like this:
    ldl  r1, 0(r0)     # load r1; initially it is 0
    bne  r1, 0xxx      # if r1 is non-zero, jump to libc__arena_get
    lda  r1, 1(r31)    # set r1 to 1
    stl  r1, 0(r0)     # store r1 back to its original address
The mutex_lock and mutex_unlock functions are inlined, so all of this
appears to happen inside libc_malloc.
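In C terms, my reading of that sequence is roughly this (the variable
name is mine, standing in for the word at 0(r0)):

    volatile int lock = 0;

    if (lock == 0) {    /* ldl + bne */
        lock = 1;       /* lda + stl */
        /* ... go on to use the arena ... */
    }

Since the load and the store are separate, non-atomic instructions, all
four threads can load 0 before any of them stores 1, and each one then
thinks it owns the lock.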
The race condition happens when the four threads run this piece of code
simultaneously. If only I could change the ldl and stl to LL/SC! I'm
guessing that I am not invoking the right mutex_lock function, but I have
no idea how to do that. Maybe I could write my own mutex_lock function and
link it into the program. Any suggestions?
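Something like the following is what I have in mind for a hand-written
lock, based on the usual Alpha ldl_l/stl_c spinlock idiom (just a sketch,
untested in M5, and the function names are my own):

    /* Sketch of a lock/unlock pair built on Alpha LL/SC. */
    static inline void my_mutex_lock(volatile int *lock)
    {
        long tmp;
        __asm__ __volatile__(
        "1:  ldl_l   %0,%1\n"      /* load-locked the lock word          */
        "    bne     %0,2f\n"      /* non-zero -> lock held, go spin     */
        "    lda     %0,1\n"       /* tmp = 1                            */
        "    stl_c   %0,%1\n"      /* store-conditional; tmp=0 on fail   */
        "    beq     %0,2f\n"      /* SC failed -> wait and retry        */
        "    mb\n"                 /* barrier after acquiring the lock   */
        "    br      3f\n"
        "2:  ldl     %0,%1\n"      /* plain loads while waiting          */
        "    bne     %0,2b\n"
        "    br      1b\n"         /* looks free -> retry the LL/SC      */
        "3:\n"
        : "=&r" (tmp), "=m" (*lock)
        : "m" (*lock)
        : "memory");
    }

    static inline void my_mutex_unlock(volatile int *lock)
    {
        __asm__ __volatile__("mb" : : : "memory");  /* release barrier */
        *lock = 0;
    }

If the SE-mode glibc never emits ldl_l/stl_c for its inlined mutex_lock,
wrapping the racy region with something like this should at least tell me
whether LL/SC itself behaves correctly across the four CPUs.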
>
> Also, what happens if you run with AtomicSimpleCPU, and with or without
> a single level of caches?
I'm currently using private L1s and a shared L2. I'll also test with a
single level of shared L1. But are you actually interested in how LL/SC
in M5 behaves under these different configurations? I'll let you know
once I capture those instructions :)
Thanks for the insights!
Jiayuan
>
> Steve
>
> Jiayuan Meng wrote:
>> Hi Ali,
>>
>> Thanks for the quick response.
>>
>> I am having a master thread spawning child threads on multiple cpus.
>> Once a thread gets allocated to a cpu, it always resides there (so far).
>>
>> I am using AtomicTimingCPU. In my test case with racing mallocs,
>> I have five CPUs (with IDs from 0 to 4). A master thread initially runs
>> on cpu0. When it reaches a pseudo instruction, it tells the simulator
>> to spawn four child threads on the other CPUs. Each CPU uses only one
>> thread context (all have the ID 0 by default). Will this be a
>> problem? I'll try assigning different thread IDs.
>>
>> To create threads, I learned from "stack_createFunc" and
>> "init_thread_context" in kern/tru64/tru64.hh: basically, allocate a new
>> stack and assign the pc and sp registers. A major difference might be
>> that I am not using pthreads. Instead, I inserted a new pseudo
>> instruction which "atomically" creates four threads on the other four
>> CPUs; they start executing in the same cycle.
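>>
>> In rough C++ terms, what the pseudo instruction does for each child is
>> something like this (heavily simplified from memory, so the exact
>> ThreadContext calls and register names may be a bit off):
>>
>>     // tc is the idle thread context on the target CPU; 'entry' and
>>     // 'stack_top' are supplied by the spawning (master) thread.
>>     tc->clearArchRegs();
>>     tc->setIntReg(TheISA::StackPointerReg, stack_top);
>>     tc->setPC(entry);
>>     tc->setNextPC(entry + sizeof(TheISA::MachInst));
>>     tc->activate();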
>>
>> I actually extended the SimpleCPUs to have multiple thread contexts so
>> that a CPU can switch among them. I tested this with the splash2 FFT
>> benchmark and things went fine. But to make this test clearer, I just
>> set each CPU to have exactly one thread context. In the future, I may
>> need to "migrate" a running thread context from one CPU to another.
>>
>> I'm in trouble now... I wonder how splash2 gets around this in SE mode?
>>
>> Thanks again!
>>
>> Jiayuan
>>
>>
>> ----- Original Message -----
>> *From:* Ali Saidi <mailto:[EMAIL PROTECTED]>
>> *To:* M5 users mailing list <mailto:m5-users@m5sim.org>
>> *Sent:* June 16, 2007 2:41 AM
>> *Subject:* Re: [m5-users] synchronization primitives in SE mode
>>
>> The Alpha ISA has a load locked and a store conditional instruction
>> which we support. Again I don't know exactly what you're doing to
>> create your threads, but you need to make sure that their cpu/thread
>> ids are unique. Are you scheduling each thread on its own CPU or
>> are they moving around?
>>
>> Ali
>>
>>
>>
>> On Jun 15, 2007, at 1:30 PM, Jiayuan Meng wrote:
>>
>>> Hey all,
>>>
>>> Using the --trace-flags=Exec debug flag, I found that there is
>>> a race condition in the malloc function of my multithreaded
>>> program. However, when looking into malloc.c in glibc, it says
>>> it is a thread-safe version. I also noticed that malloc/arena.c
>>> uses mutex_lock(), which seems to be a spinlock. This may still
>>> be problematic if several threads access the lock simultaneously.
>>>
>>> So, what kind of synchronization support does M5 have in SE mode?
>>> Does it have store-conditional or test-and-set instructions, or
>>> will I have to add them myself?
>>>
>>> Thanks!
>>>
>>> Jiayuan
>>
_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users