On Mon, 19 Oct 2009 16:02:56 -0700, nathan binkert <[email protected]> wrote:
>> At some point we're going to have to discuss what systems we're willing
>> to support multi-threaded M5 on and what we're not. TLS is great in some
>> cases (e.g. curTick), however TLS isn't supported on Mac OS X. To get
>> similar functionality pthread_getspecific() is used, but it's definitely
>> uglier and probably slower on Linux than __thread. Is OS X multithreaded
>> support a priority? For development it might be sort of nice, but I
>> don't see having a farm of OS X machines available for simulations in
>> the near future so I don't think so.
>
> As far as TLS is concerned, I don't think we're really going to need
> it much at all. In most cases, objects will have local pointers to
> their thread local structures (like the event queue).

Is everywhere that we read curTick inside a SimObject? That is really the
big place where I could see it being easy to slap on a __thread and be
done with it.
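For reference, the two mechanisms compared above can be sketched side by side. This is only an illustration, not M5 code: the Tick typedef, tlsTick, tsdTick() and ticksArePerThread() are hypothetical names, and __thread is assumed to be available as a GCC/Clang extension.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdint>

typedef uint64_t Tick;  // stand-in for the simulator's tick type (assumed)

// Option 1: compiler-supported TLS. Each thread gets its own copy.
static __thread Tick tlsTick = 0;

// Option 2: POSIX thread-specific data, for platforms without __thread.
static pthread_key_t tickKey;
static pthread_once_t tickKeyOnce = PTHREAD_ONCE_INIT;

// Destructor run at thread exit for any non-null value stored in the key.
static void freeTick(void *p) { delete static_cast<Tick *>(p); }

static void createTickKey()
{
    int rc = pthread_key_create(&tickKey, freeTick);
    assert(rc == 0);
    (void)rc;
}

// Returns the calling thread's tick, allocating it on first use.
static Tick &tsdTick()
{
    pthread_once(&tickKeyOnce, createTickKey);
    Tick *t = static_cast<Tick *>(pthread_getspecific(tickKey));
    if (!t) {
        t = new Tick(0);
        pthread_setspecific(tickKey, t);
    }
    return *t;
}

// Each worker writes only its own copies, then reads them back.
static void *worker(void *arg)
{
    Tick target = *static_cast<Tick *>(arg);
    tlsTick = target;
    tsdTick() = target;
    bool ok = (tlsTick == target) && (tsdTick() == target);
    return ok ? arg : nullptr;
}

// Runs two workers and checks that the caller's own ticks are untouched.
bool ticksArePerThread()
{
    Tick a = 100, b = 200;
    pthread_t ta, tb;
    pthread_create(&ta, nullptr, worker, &a);
    pthread_create(&tb, nullptr, worker, &b);
    void *ra = nullptr, *rb = nullptr;
    pthread_join(ta, &ra);
    pthread_join(tb, &rb);
    return ra && rb && tlsTick == 0 && tsdTick() == 0;
}
```

The __thread version is a plain variable access, while the pthread_getspecific() version pays for a function call and a lookup on every read, which is the "uglier and probably slower" trade-off mentioned in the quoted text.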
>
>> Generally, I see the same thing for instructions like CMPXCHG16B. If it
>> makes it easier to do, or provides a large speedup then I'm pretty much
>> for it. I would rather the common case of hardware that people are going
>> to run simulations on be fast at the expense of hardware that we're not
>> going to run on (not multithreaded), just for the sake of having un-used
>> support.
>>
>> On machines without CMPXCHG16B we could use a mutex, but if the machine
>> has it and it makes it significantly easier to be correct and fast I'm
>> for it.
> From what I read, not all 64bit x86 machines support this instruction,
> so it's a matter of whether people have modern enough systems to use
> it. Both of my clusters do, so that's nice. That said, I'm still not
> convinced that a lock free event queue is the right way to go. The
> event queue is already a huge bottleneck in a lot of simulations, so
> I'd hate to see us slow it down for the (hopefully) uncommon case of a
> cross thread scheduled event. We'll see.

Depending on how the quanta are done, another option is for each thread to
have a set of private producer/consumer queues, one from each other thread
in the system. That is easy to do lock free, and the thread that is being
scheduled on would then take care of merging the events when it felt like
it.

Ali
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
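The private producer/consumer queue idea above could look roughly like the following. This is a minimal sketch under several assumptions, not M5 code: it uses C++11 std::atomic (which postdates this thread), and RemoteEvent, SpscQueue, LocalQueue, mergeInbox() and demoMerge() are all hypothetical names. Each queue has exactly one producer thread and one consumer thread, which is what lets it avoid both locks and CMPXCHG16B.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <thread>
#include <vector>

typedef uint64_t Tick;

// Hypothetical record for an event scheduled from another thread.
struct RemoteEvent {
    Tick when;
    int id;
};

// Comparator that keeps the earliest tick on top of the priority queue.
struct LaterFirst {
    bool operator()(const RemoteEvent &a, const RemoteEvent &b) const
    { return a.when > b.when; }
};

// Bounded single-producer/single-consumer ring buffer (holds N - 1 items).
template <typename T, size_t N>
class SpscQueue
{
    T slots[N];
    std::atomic<size_t> head{0};  // next slot to read; written by consumer
    std::atomic<size_t> tail{0};  // next slot to write; written by producer

  public:
    // Producer side only. Returns false if the queue is full.
    bool push(const T &v)
    {
        size_t t = tail.load(std::memory_order_relaxed);
        size_t next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire))
            return false;
        slots[t] = v;
        tail.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side only. Returns false if the queue is empty.
    bool pop(T &out)
    {
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;
        out = slots[h];
        head.store((h + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
};

typedef std::priority_queue<RemoteEvent, std::vector<RemoteEvent>,
                            LaterFirst> LocalQueue;

// Called by the owning thread whenever it chooses to merge remote events
// into its private, unsynchronized event queue.
size_t mergeInbox(SpscQueue<RemoteEvent, 64> &inbox, LocalQueue &local)
{
    size_t merged = 0;
    RemoteEvent ev;
    while (inbox.pop(ev)) {
        local.push(ev);
        ++merged;
    }
    return merged;
}

// One producer thread schedules three events out of order; the consumer
// merges them and returns their ticks in service order.
std::vector<Tick> demoMerge()
{
    SpscQueue<RemoteEvent, 64> inbox;
    std::thread producer([&inbox] {
        const Tick ticks[] = {30, 10, 20};
        for (int i = 0; i < 3; ++i)
            while (!inbox.push(RemoteEvent{ticks[i], i})) {}
    });
    producer.join();

    LocalQueue local;
    mergeInbox(inbox, local);
    std::vector<Tick> order;
    while (!local.empty()) {
        order.push_back(local.top().when);
        local.pop();
    }
    return order;
}
```

With one such inbox per (producer, consumer) pair, the local event queue itself stays single-threaded, so the common case of scheduling an event on your own thread pays nothing extra. The demo joins the producer before merging only to keep the result deterministic; mergeInbox() is safe to call while the producer is still pushing.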
