Hi Ali and Steve,

I am starting to realize that enabling a thread-safe glibc might not be enough
to support hardware threads, since OS-level system calls are involved in
addition to LL/SC.

I decided to modify glibc by adding wrapper functions around malloc and free.
The wrappers use mutex locks so that only one thread at a time can enter the
body of malloc or free. The mutex locks are built on LL/SC. I recompiled glibc
with crosstool, and it now works at a larger scale. I am still running more
tests, though.
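
For illustration, the wrapper approach described above could look roughly like
the sketch below. This is not the actual patch: the names alloc_lock,
locked_malloc and locked_free are hypothetical, and C11 atomics stand in for
hand-written LL/SC (on Alpha a compiler would typically lower the
compare-and-swap to an ldl_l/stl_c retry loop). The lock spins in user space
and never makes a system call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical global lock guarding the allocator: 0 = free, 1 = held. */
static atomic_int alloc_lock = 0;

static void alloc_lock_acquire(void)
{
    int expected = 0;
    /* Spin until we change the lock from 0 to 1. No system call is made. */
    while (!atomic_compare_exchange_weak_explicit(&alloc_lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0; /* a failed CAS stores the observed value; reset it */
    }
}

static void alloc_lock_release(void)
{
    /* Only the holder may release, so the lock must currently be held. */
    assert(atomic_load_explicit(&alloc_lock, memory_order_relaxed) == 1);
    atomic_store_explicit(&alloc_lock, 0, memory_order_release);
}

/* Wrappers: only one thread at a time runs the allocator body. */
void *locked_malloc(size_t size)
{
    alloc_lock_acquire();
    void *p = malloc(size);
    alloc_lock_release();
    return p;
}

void locked_free(void *p)
{
    alloc_lock_acquire();
    free(p);
    alloc_lock_release();
}
```

A single global spin lock serializes every allocation, which is simple but may
become a bottleneck as the number of hardware threads grows.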

Thanks for your great support!

Jiayuan

----- Original Message ----- 
From: "Ali Saidi" <[EMAIL PROTECTED]>
To: "M5 users mailing list" <[email protected]>
Sent: June 24, 2007 11:10 AM
Subject: Re: [m5-users] support for hardware threads (with call_pal rduniq?)


Hi Jiayuan,

We don't have a patch for it, but feel free to implement it.
If you run `man futex` you can get an idea of what is going on. Some
of that code is in libc and some of it is in the kernel (sys_futex()
in kernel/futex.c).
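
As a rough illustration of that split, here is a minimal sketch of a
futex-based lock. The uncontended path is a single atomic operation in user
space (LL/SC on Alpha); only contended acquisitions reach the kernel through
the futex system call. It uses the common three-state design (0 = free,
1 = locked, 2 = locked with possible waiters). The names sketch_lock and
sketch_unlock are hypothetical, this is not the glibc source, and it assumes a
Linux host with C11 atomics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrappers around the raw system call. */
static long futex_wait(atomic_int *addr, int expected)
{
    /* The kernel puts us to sleep only if *addr still equals expected. */
    return syscall(SYS_futex, (int *)addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static long futex_wake(atomic_int *addr, int count)
{
    return syscall(SYS_futex, (int *)addr, FUTEX_WAKE, count, NULL, NULL, 0);
}

void sketch_lock(atomic_int *f)
{
    int c = 0;
    /* Fast path: 0 -> 1 in user space, no system call. */
    if (atomic_compare_exchange_strong(f, &c, 1))
        return;
    /* Slow path: mark the lock as contended, then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(f, 2);
    while (c != 0) {
        futex_wait(f, 2);
        c = atomic_exchange(f, 2);
    }
}

void sketch_unlock(atomic_int *f)
{
    /* Wake one waiter only if someone may be sleeping on the lock. */
    if (atomic_exchange(f, 0) == 2)
        futex_wake(f, 1);
}
```

With a design like this, a run in which no two threads ever contend for the
lock would never issue the system call at all.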
Ali


On Jun 23, 2007, at 9:53 PM, Jiayuan Meng wrote:

> Hi Ali and Steve,
>
> Thanks for the insights!
>
> I am trying to fake it by setting each thread's MISCREG_UNIQ
> register to that of the main thread. A small-scale test shows that
> this actually works for two hardware threads. When I increase the
> thread count to three, a fatal error is raised:
>
> " fatal: syscall futex (#394) unimplemented."
>
> The trace shows that the system call happens at
> @__lll_lock_wait+72.
>
> Is there a patch available that implements this system call? Or
> how difficult would it be to implement?
>
> Does this mean that the locking scheme involved is more
> complex than LL/SC? Is there any way around it?
>
> Thanks!
>
> Jiayuan
>
> ----- Original Message -----
> From: "Steve Reinhardt" <[EMAIL PROTECTED]>
> To: "M5 users mailing list" <[email protected]>
> Sent: June 23, 2007 7:19 AM
> Subject: Re: [m5-users] support for hardware threads (with call_pal rduniq?)
>
>
>> The uniq register typically is used to hold a pointer to the
>> per-thread state.  I'm guessing that as part of creating a new thread
>> you may need to allocate some additional space (or reserve space on
>> the thread's stack) for that per-thread structure and then set the
>> uniq register to that value.
>>
>> The Tru64 pthreads code already does this, so you can look in
>> src/kern/tru64 for an example (grep for MISCREG_UNIQ in tru64.hh).
>> Unfortunately you'll probably have to look at the Linux pthreads
>> library source (or maybe the kernel?) to figure out exactly what
>> Linux requires (how much space to allocate, whether the space needs
>> to be initialized, etc.).
>>
>> By all means, please keep us posted...
>>
>> Steve
>>
>> Ali Saidi wrote:
>>> Hi Jiayuan,
>>>
>>> RD Uniq is a PAL code call that reads the unique field of the Process
>>> Control Block (PCB). The PCB describes a process to the PAL code. It
>>> doesn't really exist when running in syscall emulation mode; however,
>>> we do implement the read uniq/write uniq call pals. I believe there
>>> are two possibilities for what is going wrong: a) the kernel puts
>>> some value in the unique area of the PCB that we don't, or b) when
>>> you copy the thread context for the new thread you don't copy the
>>> Runiq register, and that is causing the problem.
>>>
>>> You can read about it in the Alpha Architecture Reference Manual.
>>> The code is ~718 in decoder.isa, and if you look at the system code
>>> on m5sim.org you can see the real implementation of rduniq in
>>> osfpal.S.
>>>
>>> Ali
>>>
>>> On Jun 22, 2007, at 10:31 AM, Jiayuan Meng wrote:
>>>
>>>> Hey all,
>>>>
>>>> Continuing from the synchronization mail thread...
>>>>
>>>> I tried gcc-3.4.5-glibc-2.3.5.dat to configure crosstool. I added
>>>> the following options to enable thread-local storage (TLS):
>>>>
>>>> GLIBC_EXTRA_CONFIG="$GLIBC_EXTRA_CONFIG --with-tls --with-__thread --enable-kernel=2.4.18"
>>>> GLIBC_ADDON_OPTIONS="=nptl"
>>>>
>>>> It compiles and works for a single-threaded program. But when
>>>> applied to my manually created hardware threads, malloc crashes. I
>>>> think the problem is at the "call_pal rduniq" instruction. Here is a
>>>> comparison of what happens in the single-threaded and the
>>>> multi-threaded programs:
>>>> ======== single-threaded ===================
>>>> @__libc_malloc+64 : call_pal   rduniq          : IntAlu : D=0x00000001200c8690
>>>> @__libc_malloc+68 : ldq        r1,-26600(r29)  : MemRead : D=0x0000000000000038 A=0x1200b4290
>>>> @__libc_malloc+72 : addq       r0,r1,r0        : IntAlu : D=0x00000001200c86c8
>>>> @__libc_malloc+76 : ldq        r9,0(r0)        : MemRead : D=0x00000001200c58b8 A=0x1200c86c8
>>>> @__libc_malloc+80 : beq        r9,0x12001dc40  : IntAlu :
>>>> @__libc_malloc+84 : ldl_l      r1,0(r9)        : MemRead : D=0x0000000000000000 A=0x1200c58b8
>>>> @__libc_malloc+88 : cmpeq      r1,0,r2         : IntAlu : D=0x0000000000000001
>>>> @__libc_malloc+92 : beq        r2,0x12001dc38  : IntAlu :
>>>> @__libc_malloc+96 : bis        r31,1,r2        : IntAlu : D=0x0000000000000001
>>>> @__libc_malloc+100 : stl_c      r2,0(r9)        : MemWrite : D=0x0000000000000001 A=0x1200c58b8
>>>> .....
>>>> ========= hardware multi-threaded =========
>>>> @__libc_malloc+64 : call_pal   rduniq          : IntAlu : D=0x0000000000000000
>>>> @__libc_malloc+68 : ldq        r1,-26592(r29)  : MemRead : D=0x0000000000000038 A=0x1200b42a8
>>>> @__libc_malloc+72 : addq       r0,r1,r0        : IntAlu : D=0x0000000000000038
>>>> @__libc_malloc+76 : ldq        r9,0(r0)        : MemRead :  A=0x38
>>>> Aborted here: access invalid address 0x38
>>>> ------------------------------------------------
>>>>
>>>> So, the good news is that this version uses LL/SC, but "call_pal
>>>> rduniq" becomes the next killer.
>>>> I googled and found that call_pal rduniq has something to do with
>>>> the thread pointer, but I am still hazy on what it does. Maybe you
>>>> can shed some light on it? Why, in the second case, is the value it
>>>> loads into r0 zero? Is it because I am creating hardware threads by
>>>> just assigning pc and sp, without using pthread calls at the
>>>> software level? Is there any way to fix or hack around this?
>>>>
>>>> Thanks!
>>>>
>>>> Jiayuan
>>>> _______________________________________________
>>>> m5-users mailing list
>>>> [email protected]
>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
