Forwarding this thread to the dev list since others in the community might be interested (both in the technical issues and in the fact that someone is working in this direction).

Steve

On Fri, Feb 1, 2013 at 9:01 AM, Steve Reinhardt <[email protected]> wrote:

> Glad you found this, Ali.
>
> As far as the decode cache: a great big lock is certainly adequate just to
> bring things up and get it working. In the longer term, we should take
> advantage of the fact that the decode cache is read-mostly (or at least it
> should be... if it's not we have bigger problems) to do something more
> intelligent. I'm guessing it would be possible to make the decode cache
> lock-free using cmpxchg; if not, some sort of medium-grain
> multiple-reader-single-writer locking scheme could also work. But those
> optimizations should be left for later; I just wanted to bring them up now
> for the record while I was thinking of them. In particular, I think making
> the decode cache per-thread is the wrong way to go.
>
> Steve
>
> On Fri, Feb 1, 2013 at 8:31 AM, Ali Saidi <[email protected]> wrote:
>
>> Hi Nilay,
>>
>> I finally found an email which I've been looking for since the last
>> email you sent about running multiple systems in gem5. An undergrad named
>> Miles got two systems running in gem5 (in 2007). None of the diffs are
>> useful at this point since everything has changed, but in the process he
>> did identify the areas that he had to lock around to make multiple systems
>> work. I'm not sure if you've gotten past this point yet, but these are the
>> areas he identified and "fixed." The fix was just a great-big-lock around
>> each of them, which for the decode cache really hurt performance.
>>
>> FastAlloc: gone, so no problem, and tcmalloc at least is thread-safe.
>>
>> RefCount: I'm not sure if this is still a problem or not. If the pointers
>> you're going to exchange are reference counted, it could be. Certainly
>> another issue (see below) is refcounting of instructions. This might be
>> the biggest reason to move toward c++11 pointers. Miles ended up using gcc
>> intrinsics (__atomic_compare_and_exchange()) on the incref/decref members,
>> although there are now C++::atomic_add and __atomic_fetch_and_add(), which
>> are probably more useful than having to write a while loop for the compare
>> and exchange.
>>
>> Stream output (e.g. DPRINTFS from multiple threads)
>>
>> Decode Cache: since it can be shared across threads (perhaps it shouldn't
>> be, or maybe it should be), and the STL structures aren't thread-safe by
>> default.
>>
>> Thanks,
>>
>> Ali

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
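
For the record, the RefCount point in Ali's message can be sketched with C++11 std::atomic, where a single fetch_add/fetch_sub replaces the hand-written compare-and-exchange loop. This is only a minimal sketch under stated assumptions: AtomicRefCounted and stressRefCount are hypothetical names made up for illustration, not gem5's actual RefCounted class or any code from Miles's diffs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a reference-counted base class (not gem5 code).
class AtomicRefCounted
{
  public:
    // fetch_add is one atomic read-modify-write, so no CAS retry loop.
    void incref() const { count.fetch_add(1, std::memory_order_relaxed); }

    // Returns true when the caller just dropped the last reference.
    bool
    decref() const
    {
        return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    int refs() const { return count.load(std::memory_order_acquire); }

  private:
    mutable std::atomic<int> count{0};
};

// Hypothetical helper: hammer one counter from several threads and
// return the count that remains once all threads have joined.
inline int
stressRefCount(int numThreads, int itersPerThread)
{
    AtomicRefCounted obj;
    obj.incref(); // reference held by this function for the whole run

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&obj, itersPerThread] {
            for (int i = 0; i < itersPerThread; ++i) {
                obj.incref();
                bool last = obj.decref();
                assert(!last); // the function's own reference is still live
                (void)last;
            }
        });
    }
    for (auto &w : workers)
        w.join();
    return obj.refs();
}
```

With a plain int counter the increments from different threads could be lost; with the atomic counter every incref is matched by its decref, so only the reference taken before the threads started should remain.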

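Similarly, the multiple-reader-single-writer option Steve mentions for the decode cache might look roughly like the sketch below, which uses C++17 std::shared_mutex so that hits on a read-mostly cache only take a shared lock. Everything here is an assumption for illustration: SharedDecodeCache, DecodedInst, and lookupOrDecode are hypothetical names, the "decode" step is a placeholder, and this is one coarse lock rather than the medium-grain or lock-free cmpxchg schemes discussed in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical placeholder for a decoded instruction (not a gem5 type).
struct DecodedInst
{
    std::uint32_t machInst;
};

// Hypothetical shared decode cache guarded by a reader-writer lock.
class SharedDecodeCache
{
  public:
    using InstPtr = std::shared_ptr<const DecodedInst>;

    InstPtr
    lookupOrDecode(std::uint32_t machInst)
    {
        {
            // Common case: many readers may hold the shared lock at once.
            std::shared_lock<std::shared_mutex> readGuard(lock);
            auto it = cache.find(machInst);
            if (it != cache.end())
                return it->second;
        }

        // Miss: do the (placeholder) decode without holding any lock.
        InstPtr fresh = std::make_shared<DecodedInst>(DecodedInst{machInst});

        // Rare case: take the exclusive lock to insert. If another thread
        // inserted the same key first, emplace keeps and returns its entry.
        std::unique_lock<std::shared_mutex> writeGuard(lock);
        auto result = cache.emplace(machInst, fresh);
        assert(result.first->second != nullptr);
        return result.first->second;
    }

    std::size_t
    size() const
    {
        std::shared_lock<std::shared_mutex> readGuard(lock);
        return cache.size();
    }

  private:
    mutable std::shared_mutex lock;
    std::unordered_map<std::uint32_t, InstPtr> cache;
};
```

Two threads that miss on the same instruction may both decode it, but only one result is kept, so all callers end up sharing a single cached entry. Whether this beats a plain mutex depends on how read-mostly the cache really is.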