>>>>> "Michael" == Michael K Edwards <[EMAIL PROTECTED]> writes:

Michael> On 5/18/05, Hubert Chan <[EMAIL PROTECTED]> wrote:
Hubert> Commenting out those lines, and compiling multi-threaded, gives
Hubert> performance similar to the single-threaded case.  So what does
Hubert> this mean?  I doubt that Ryan will want to disable
Hubert> THREAD_LOCAL_ALLOC Debian-wide.

BTW, I'll ask my upstream to try it too and see if his results agree
with mine.

Michael> It means someone ought to beat on the spin-then-queue locking
Michael> implementation enabled by THREAD_LOCAL_ALLOC until it isn't
Michael> retrograde for the common single-threaded case.  That's really
Michael> a job for oprofile, which I'm starting to get spun up on now;
Michael> but code inspection, informed by some knowledge about NPTL,
Michael> might be enough.

OK.  That's probably beyond me at the moment.

Michael> By the way, if you want to use oprofile, you might as well use
Michael> the 0.8.2 release. ...

I'll take a look at that if/when I get around to looking at oprofile.
It looks more complicated than what I want to look at right now.  (I
have other things that need to be looked at.)

Hubert> I also tried compiling with THREAD_LOCAL_ALLOC, but using
Hubert> GC_local_malloc instead of GC_malloc, but performance is similar
Hubert> to just using GC_malloc.

Michael> From http://www.hpl.hp.com/personal/Hans_Boehm/gc/scale.html :

scale.html> The easiest way to switch an application to thread-local
scale.html> allocation is to

scale.html> 1. Define the macro GC_REDIRECT_TO_LOCAL, and then include
scale.html> the gc.h header in each client source file.

Yup, did that.

scale.html> 2. Invoke GC_thr_init() before any allocation.

That seems to be a typo.  It should be GC_init().  If I just call
GC_thr_init, I get a segfault when I try to allocate memory.

scale.html> 3. Allocate using GC_MALLOC, GC_MALLOC_ATOMIC, and/or
scale.html> GC_GCJ_MALLOC.

My upstream redefines GC_MALLOC so that it throws an exception (C++) if
allocation fails.  So I just edited his re-definition to call
GC_local_malloc instead of GC_malloc (which is what GC_REDIRECT_TO_LOCAL
does anyways).

Michael> Oddly, -DPARALLEL_MARK may improve the situation for UP
Michael> thread-local allocation, because it results in the use of an
Michael> implementation of GC_malloc_many (used to refill thread-local
Michael> free lists) that may be better tuned for thread-local usage
Michael> patterns (as well as more concurrent).

Hmm.  I'll take a look at that.

-- 
Hubert Chan <[EMAIL PROTECTED]> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.
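
For illustration only, the kind of GC_MALLOC redefinition mentioned in the
reply to step 3 (throw a C++ exception when allocation fails, with the
underlying allocator easy to swap) could be sketched roughly as below.
This is not upstream's actual code: alloc_or_throw, SKETCH_ALLOC and
CHECKED_MALLOC are hypothetical names, and std::malloc stands in for
GC_local_malloc / GC_malloc so that the sketch builds without libgc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper (not part of libgc or of the upstream code):
// call any malloc-style allocator and throw std::bad_alloc if it
// returns a null pointer.
template <typename AllocFn>
void* alloc_or_throw(AllocFn alloc, std::size_t nbytes) {
    void* p = alloc(nbytes);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    return p;
}

// Assumption: std::malloc is only a stand-in here.  In the setup
// discussed in the thread, this is the one place that would name
// GC_local_malloc instead of GC_malloc.
#ifndef SKETCH_ALLOC
#define SKETCH_ALLOC std::malloc
#endif

// Hypothetical macro name; per the thread, upstream redefines
// GC_MALLOC itself rather than introducing a new macro.
#define CHECKED_MALLOC(n) alloc_or_throw(SKETCH_ALLOC, (n))
```

With a wrapper of this shape, switching allocators is a one-line change
to the macro, which matches the "just edited his re-definition"
description; whether that helps performance is a separate question, as
the numbers in this thread suggest.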

