An alternative is to realize that we're computer architects! We could easily have a two-level cache (basically combining all three of Nilay's ideas). You have a per-thread cache, but each decoder holds its own pointer to that per-thread cache, so you don't have to go through TLS on every access. If you miss in your local cache, you look in the global cache; on a hit there, you add the entry to your local cache. If you miss in both, you create the entry and update both caches (taking a lock on the global cache). The nice property is that there is no coherence problem, since we never delete or update entries in this cache.
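The two-level scheme above could be sketched roughly as follows. This is a hedged illustration only, not gem5 code: the names DecodedInst, DecodeCache, and the uint32_t key type are placeholders I made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

struct DecodedInst { uint32_t machInst; };  // stand-in for a real decoded instruction

// Global (second-level) cache shared by all threads, protected by a lock.
std::unordered_map<uint32_t, DecodedInst*> globalCache;
std::mutex globalLock;

struct DecodeCache {
    // Per-thread (first-level) cache. Each decoder keeps a plain pointer
    // to one of these, so the fast path never touches TLS or a lock.
    std::unordered_map<uint32_t, DecodedInst*> local;

    DecodedInst *lookup(uint32_t machInst) {
        // 1. Fast path: hit in the per-thread cache, no locking at all.
        auto it = local.find(machInst);
        if (it != local.end())
            return it->second;

        // 2. Local miss: check the global cache under the lock.
        std::lock_guard<std::mutex> g(globalLock);
        auto git = globalCache.find(machInst);
        if (git != globalCache.end()) {
            local[machInst] = git->second;  // promote into the local cache
            return git->second;
        }

        // 3. Miss in both: decode once and update both levels. Entries are
        // never deleted or updated, so there is no coherence problem.
        DecodedInst *di = new DecodedInst{machInst};
        globalCache[machInst] = di;
        local[machInst] = di;
        return di;
    }
};
```

Repeated lookups from the same thread hit step 1; a second thread's first lookup of the same instruction hits step 2 and fills its own local cache from the global one.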
Unfortunately, without two levels, I don't know how to access an unordered map without a lock. A concurrent write can invalidate your iterator by triggering a rehash (if the map grows). With the regular (node-based) map, insert does not invalidate iterators, so you could execute a find without a lock; on a miss, acquire a (read?) lock and do the find again; on a hit, return; on a miss, (upgrade to a write lock and) insert the new record. If we did the two-level thing, I'd suggest the hash map for the per-thread cache and the regular map for the global one.

All that said, there is a simpler option: the concurrent_unordered_map from TBB is quite good. All locks are internal and taken on a per-bucket basis, so inserts and reads work without the caller having to do anything special.

Nate

On Sat, Feb 9, 2013 at 11:06 AM, Steve Reinhardt <[email protected]> wrote:
> My main reaction is that we shouldn't rush into this. As long as we have a
> solution that works for now, there are probably many more important things
> to work on. Once we have all the other pieces in place to make a usable
> parallel simulator, then we can worry about performance optimizations such
> as better handling of the decode cache.
>
> My secondary reaction is that the only potential downside to a globally
> shared cache is the cost of acquiring a lock on every read access. In the
> long run, writes should be pretty rare, so the cost of updates should be
> largely irrelevant. If we can come up with a lock-free way of doing
> updates, then there is no downside to a globally shared cache. Thus, when
> we do get to the point of wanting to optimize the decode cache, I think the
> first order of business is to try and find a way to do lock-free updates.
> If we're successful (and I expect we will be), then there's no reason to
> consider any other organization.
>
> Steve
>
> On Sat, Feb 9, 2013 at 7:01 AM, Nilay <[email protected]> wrote:
>> We need to decide on how we want to handle the decode cache. I can think
>> of the following three ways --
>>
>> 1. Per-decoder cache: needs the most space, hence more cache misses and
>> lower performance.
>>
>> 2. Per-thread cache: less space than the above, so fewer cache misses
>> (hopefully). But TLS variables have access costs. It seems this would add
>> at least two more instructions per access (on x86-64), more depending on
>> how badly the compiler does at analyzing the use of the variable. An
>> added advantage is that single-threaded simulations would not be hurt at
>> all.
>>
>> 3. Global cache: least space, so it should have the fewest cache misses.
>> But it requires the protection of a lock. The cost will be several
>> ordinary instructions plus one atomic instruction (which should incur
>> some coherence overhead) even if the lock is not contended (contention is
>> unlikely). It would require extra code if we are to avoid hurting
>> single-threaded simulation performance. Some RCU-type implementation
>> might be possible as well.
>>
>> In my opinion, the size of the cache should decide which way to go. If
>> it is less than 100 KB or so, a per-simulated-CPU variable seems fine to
>> me; TLS if around 500 KB; and a global variable above that.
>>
>> --
>> Nilay
>>
>> _______________________________________________
>> gem5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/gem5-dev
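Nilay's option 2 (a per-thread cache reached through TLS) can be sketched briefly. This is a hedged illustration, not gem5 code: PerThreadCache, tlsDecodeCache, and the dummy "+1 decode" are made up for the example. It also shows the trick mentioned at the top of the thread: each decoder caches a plain pointer to the thread-local map, so the TLS address computation Nilay is worried about happens once per decoder rather than once per access.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using PerThreadCache = std::unordered_map<uint32_t, uint32_t>;

// One cache per thread; the compiler emits the TLS lookup only when the
// address of this variable is taken.
thread_local PerThreadCache tlsDecodeCache;

struct Decoder {
    // Resolved once when the decoder is created on its owning thread;
    // subsequent accesses are an ordinary pointer dereference, no TLS.
    PerThreadCache *cache = &tlsDecodeCache;

    uint32_t decode(uint32_t machInst) {
        // try_emplace only constructs the value on a miss.
        auto [it, inserted] = cache->try_emplace(machInst, machInst + 1);  // stand-in decode
        return it->second;
    }
};
```

Because each thread gets its own map, no locking is needed at all; the trade-off, as Nilay notes, is duplicated entries across threads and therefore a larger total footprint.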
