If seems like for whatever reason the malloc code is using this header from libc:
./sysdeps/generic/malloc-machine.h
as opposed to:
./nptl/sysdeps/pthread/malloc-machine.h

The prior just does what you have listed below, the latter version actually calls code that does a real locking. I would check your build environment before you go off trying to write your own locking code.
Ali


On Jun 16, 2007, at 7:18 PM, Jiayuan Meng wrote:

Thanks Steve!



As long as the CPU IDs are distinct then it shouldn't matter that the
thread IDs are all zero... the load locked/store conditional code
requires both to match before it considers the request as coming from
the same place.

I see. I also tried assigning a unique ID to each thread. The problem is still there though.


Can you tell from the trace that there's a point inside mutex_lock()
where all of your cpus do LLs to the same address followed by SCs, and all of the SCs succeed? This would be definitive evidence that there's
a problem with LL/SC.  Other than that it's just a guess that this is
where the problem lies.

As I am looking into the trace, it seems that the libc_malloc is not using LL/SC. The critial part looks like this:

ldl r1, 0(r0) .......... load r1, initially it is 0
bne r1,0xxx .......... if r1 is non-zero, jump to function libc__arena_get
lda r1, 1(r31).........assign r1 to be 1
stl r1, 0(r0)................store r1 to its original address

the mutex_lock and unlock functions are inlined, so it appears these all happen in libc_malloc.

the race condition happens when the four threads simultaneously run this piece of code. If only I can change the ldl and stl to LL/SC! I'm guessing that I am not invoking the right mutex_lock functions, but I don't have idea about how to do that. Maybe I can write a mutex_lock function and link it to the program. Any suggestions?



Also, what happens if you run with AtomicSimpleCPU, and with or without
a single level of caches?

I'm currently using private L1s and shared L2s. I'll test about using a single level of shared L1. but are you actually interested about how LL/SC in M5 behave under these different configurations? I'll let you know when I capture these instructions :)

Thanks for the insights!

Jiayuan



Steve

Jiayuan Meng wrote:
Hi Ali,

Thanks for the quick responce.

I am having a master thread spawning child threads on multiple cpus.
Once a thread gets allocated to a cpu, it always resides there (so far).

I am using AtomicTimingCPU. In my test case with racing mallocs,
I have five CPUs(with id from 0 to 4). A master threads initially runs on cpu0. When it comes to a pseudo instruction, it tells the simulator to spawn four child threads on the other CPUs. Each CPU only uses one
thread context(all have the id 0 by default). Will this be a
problem? I'll try assigning different thread IDs.

To create threads, I learned from "stack_createFunc" and
"init_thread_context" in kern/tru64/tru64.hh, basically allocate a new stack, and assigns the pc and sp register. A major difference might be
that I am not using pthreads. instead, I inserted a new pseudo
instruction which "atomically" creates four threads on the other four
CPUs, they start to execute at the same cycle.

I actually extended SimpleCPUs to have multiple thread contexts and the
CPU can switch among them. They are tested with the splash2 FFT
benchmark and things went fine. But to make the test more clear, I just set each CPU to have exactly one thread context. In the future, I may
need to "migrate" a running thread context from one CPU to another.

I'm in trouble now... I wonder how splash2 gets around with this in SE mode?

Thanks again!

Jiayuan


    ----- Original Message -----
    *From:* Ali Saidi <mailto:[EMAIL PROTECTED]>
    *To:* M5 users mailing list <mailto:m5-users@m5sim.org>
    *Sent:* 2007年6月16日 2:41 AM
    *Subject:* Re: [m5-users] synchronization primitives in SE mode

The Alpha ISA has a load locked and a store conditional instruction which we support. Again I don't know exactly what you're doing to create your threads, but you need to make sure that their cpu/ thread ids are unique. Are you scheduling each thread on it's own cpu or
    are they moving around?

    Ali



    On Jun 15, 2007, at 1:30 PM, Jiayuan Meng wrote:

    Hey all,

By using the --trace-flags=Exec debug tool, I found that there is
    a race condition in the malloc function in my multithreaded
program. However, when looking into the malloc.c in the glibc, it
    said it is a thread-safe version. I also noticed that in
    malloc/arena.c, it uses mutex_lock(), which seems to be a
    spinlock. This may still be problematic if several threads are
    accessing the lock simultaneously.

So, what kind of synchronization support does M5 have in SE mode?
    Does it have store-conditional or test-and-set instructions or
    I'll have to add one myself?

    Thanks!

    Jiayuan

-------------------------------------------------------------------- ----

    _______________________________________________
    m5-users mailing list
    m5-users@m5sim.org
    http://m5sim.org/cgi-bin/mailman/listinfo/m5-users


-------------------------------------------------------------------- ----

_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

Reply via email to