Hi Avadh,

what function does fast_access actually fulfill? It appears that it only handles requests to the icache buffer and reports a hit if a code fetch matches. For all other data or missing-instruction requests it allocates an entry in the cpuController and forwards the request to the cacheController in case of a store or of a load miss in cacheController::access_fast_path. Furthermore, the fast-path access effectively bypasses the cache lookup mechanism on a hit, as it does not check for available read ports. Is there any reason to keep this structure, or would it be more accurate to always pass the requests to the cacheController?
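To make sure I am reading the code correctly, here is roughly the control flow I see, as a simplified, self-contained sketch. The type and member names are mine, not the actual MARSS identifiers, and the skipped read-port check is exactly the point I am asking about:

    #include <cstddef>
    #include <vector>

    // Minimal stand-in types; illustrative only, not the MARSS classes.
    struct MemoryRequest {
        bool is_icache_fetch;
        bool is_store;
    };

    struct CpuControllerSketch {
        std::vector<MemoryRequest*> pending_requests;   // stands in for the pendingRequest list
        static const std::size_t QUEUE_SIZE = 128;

        // Placeholder stubs so the sketch compiles; the real simulator does
        // actual lookups and message passing here.
        bool icache_buffer_hit(MemoryRequest*) { return false; }
        void forward_to_cache_controller(MemoryRequest*) {}

        // Roughly the behavior I see today: an icache-buffer hit is reported
        // immediately, without checking whether a read port is actually free
        // this cycle; everything else (stores, load misses) allocates a
        // pendingRequest entry and is forwarded to the cacheController, which
        // later wakes the core up via a signal.
        bool access_fast_path(MemoryRequest* req) {
            if (req->is_icache_fetch && icache_buffer_hit(req))
                return true;                   // hit reported, port check bypassed

            if (pending_requests.size() == QUEUE_SIZE)
                return false;                  // queue full, caller must retry
            pending_requests.push_back(req);
            forward_to_cache_controller(req);
            return false;                      // completion signalled later
        }
    };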
Regards,
Stefan

2012/3/4 avadh patel <[email protected]>

> On Mon, Feb 27, 2012 at 7:38 AM, Stefan Neumann <[email protected]> wrote:
>
>> Hi guys,
>>
>> I am using MARSSx86 to do some investigations on cache design and noticed
>> some strange behavior in the simulator's implementation of store handling.
>>
>> Stores are fed into the memory hierarchy during the commit stage and the
>> corresponding ROB and LSQ entries are deallocated immediately afterwards.
>> In some situations this causes the pendingRequest list in the
>> cpuController to be flooded with store requests.
>> This does not seem right from the architecture point of view, right? All
>> memory requests need to be tracked by some kind of hardware structure
>> until the data is finally merged into the cache.
>> In MARSSx86 this would be the ROB + STQ entry. In case of a cache miss,
>> finalization of the request might take some time, and new requests from
>> the issueQ need to be re-issued if the LD/ST queues are full.
>>
> I agree that the queue in the cpu controller should be tracked either from
> the CPU side or from the cache side. The reason I put 'cache side' is
> because the CPU controllers are designed to be purely non-architectural
> structures that are used only to simplify the cache/cpu interface.
>
> In the current implementation we don't track any pending stores in the
> cpucontroller's queue because when a store is committed we update the
> 'data' in RAM directly and only simulate the effect of storing the data to
> the caches. So while the store is pending in the cpucontroller's queue, a
> load to the same address can read the latest data from RAM, and
> correctness is not broken.
>
>> I did some debugging by dumping the pendingRequest list in the
>> cpuController from time to time, as I was curious about the purpose and
>> functionality of the pendingRequest list.
>> For that reason I increased its size to 512 entries (and also increased
>> the pendingRequest lists of the cacheControllers), and what happens is
>> that in some cases the list fills up with store requests.
>>
> The cpucontroller queue gets filled up with store requests because
> 'access_fast_path' (used for fast access to the L1-D cache) currently
> works only for loads. We should change this function to support stores so
> that we don't hog the cpu controller queue.
>
>> I would assume that this list can hold at most STQ_SIZE + LDQ_SIZE
>> (+ some entries for icache requests?) entries.
>>
>> Because the ROB/STQ entries are deallocated and the stores_in_flight
>> counter is decremented after the store request, a consecutive store might
>> allocate that ROB/STQ entry and send a new request while the first store
>> is still in flight due to a miss.
>>
>> Request{Memory Request: core[0] thread[0] address[0x00011db42bc0]
>> robid[*109*] init-cycle[276140] ref-counter[4] op-type[memory_op_write]
>> isData[1] ownerUUID[288412] ownerRIP[0x4ca4b2] History[ {+core_0_cont}
>> {+L1_D_0} {+L2_0} {+MEM_0} ] Signal[ ooo_0_0-dcache-wakeup] } idx[145]
>> cycles[-428] depends[176] waitFor[-1] annuled[0]
>> Request{Memory Request: core[0] thread[0] address[0x00011db42cf0]
>> robid[*109*] init-cycle[276185] ref-counter[1] op-type[memory_op_write]
>> isData[1] ownerUUID[288540] ownerRIP[0x4ca4ab] History[ {+core_0_cont} ]
>> Signal[ ooo_0_0-dcache-wakeup] } idx[225] cycles[-383] depends[226]
>> waitFor[242] annuled[0]
>>
>> (I have added the robid here for debugging purposes. In the original
>> sources the robid is always zero in case of a store request.)
>>
>> I could observe situations where over 460 store requests were present in
>> the pendingRequest list of the cpuController (size = 512).
>> This happens if, for example, a memset function is called to zero a bunch
>> of cache lines inside a loop.
>>
>> What do you think about this?
>> I think it would be a valid scenario if the ROB entry were deallocated
>> after the store request, but in that case the STQ entry would need to
>> stay valid until the store request is finalized. I am not sure whether
>> that's possible, as the ROB and LSQ are closely bound together in MARSS,
>> if I interpreted the code correctly.
>> For now I have worked around the issue by limiting the allocation of new
>> pending requests during the call to MemoryHierarchy::is_cache_available().
>> I track the number of pending loads and stores inside the cpuController
>> class and only allow allocation if the store count does not exceed
>> STQ_SIZE and the load count does not exceed LDQ_SIZE. If the function
>> returns false, a load operation is re-issued and a store may not commit
>> at that point. I am not sure, though, whether that is a good solution.
>>
> To minimize the effect of the cpu controller queue on cache accesses, we
> should start with a modification to access_fast_path to allow stores. As
> this cpu controller queue is not a real architectural module, we don't
> have to implement any tracking for pending stores. From the CPU side, all
> stores that are committed are written to the cache, and subsequent
> requests to those cache lines will return the most up-to-date data.
>
> I agree that this is rather complicated because of the queuing in the
> cpucontroller. Now that the cpu is woken up on a cache miss by a signal in
> the request, we don't need to track pending requests in the cpu controller
> and we can completely remove this queue.
>
> - Avadh
>
>> Regards,
>> Stefan
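P.S. For reference, the throttling workaround I described above (limiting allocations during MemoryHierarchy::is_cache_available()) boils down to something like the following sketch. The names and the free-function form are placeholders, not the actual MARSS code:

    // Sketch of the throttling check: refuse new allocations in the cpu
    // controller once the number of in-flight loads/stores from this core
    // reaches the LSQ capacities.  If it returns false, a load is re-issued
    // later and a store is simply not committed in this cycle.
    bool is_cache_available_sketch(int pending_loads, int pending_stores,
                                   bool is_store,
                                   int ldq_size, int stq_size)
    {
        if (is_store)
            return pending_stores < stq_size;
        return pending_loads < ldq_size;
    }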
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
