Hi Avadh,

what function does fast_access actually fulfill? It appears that it only handles requests to the icache buffer and reports a hit if a code fetch matches. For all other data or missing-instruction requests it allocates an entry in the cpuController and forwards the request to the cacheController in case of a store or of a load miss in cacheController::access_fast_path. Furthermore, the fast-path access effectively bypasses the cache lookup mechanism on a hit, as it does not check for available read ports. Is there any reason to keep this structure, or would it be more accurate to always pass the requests to the cacheController?
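To make sure I am reading the code correctly, here is roughly the control flow I see, as a simplified, self-contained sketch. The type and member names are mine, not the actual MARSS identifiers, and the skipped read-port check is exactly the point I am asking about:

    #include <cstddef>
    #include <vector>

    // Minimal stand-in types; illustrative only, not the MARSS classes.
    struct MemoryRequest {
        bool is_icache_fetch;
        bool is_store;
    };

    struct CpuControllerSketch {
        std::vector<MemoryRequest*> pending_requests;   // stands in for the pendingRequest list
        static const std::size_t QUEUE_SIZE = 128;

        // Placeholder stubs so the sketch compiles; the real simulator does
        // actual lookups and message passing here.
        bool icache_buffer_hit(MemoryRequest*) { return false; }
        void forward_to_cache_controller(MemoryRequest*) {}

        // Roughly the behavior I see today: an icache-buffer hit is reported
        // immediately, without checking whether a read port is actually free
        // this cycle; everything else (stores, load misses) allocates a
        // pendingRequest entry and is forwarded to the cacheController, which
        // later wakes the core up via a signal.
        bool access_fast_path(MemoryRequest* req) {
            if (req->is_icache_fetch && icache_buffer_hit(req))
                return true;                   // hit reported, port check bypassed

            if (pending_requests.size() == QUEUE_SIZE)
                return false;                  // queue full, caller must retry
            pending_requests.push_back(req);
            forward_to_cache_controller(req);
            return false;                      // completion signalled later
        }
    };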
Regards,
Stefan

2012/3/4 avadh patel <[email protected]>

> On Mon, Feb 27, 2012 at 7:38 AM, Stefan Neumann <[email protected]> wrote:
>
>> Hi guys,
>>
>> I am using MARSSx86 to do some investigations on cache design and noticed
>> some strange behavior in the simulator's implementation of store handling.
>>
>> Stores are fed into the memory hierarchy during the commit stage and the
>> corresponding ROB and LSQ entries are deallocated immediately afterwards.
>> In some situations this causes the pendingRequest list in the
>> cpuController to be flooded with store requests.
>> This does not seem right from the architecture point of view, right? All
>> memory requests need to be tracked by some kind of hardware structure
>> until the data is finally merged into the cache.
>> In MARSSx86 this would be the ROB + STQ entry. In case of a cache miss,
>> finalization of the request might take some time, and new requests from
>> the issueQ need to be re-issued if the LD/ST queues are full.
>>
> I agree that the queue in the cpu controller should be tracked either from
> the CPU side or from the cache side. The reason I put 'cache side' is
> because the CPU controllers are designed to be purely non-architectural
> structures that are used only to simplify the cache/cpu interface.
>
> In the current implementation we don't track any pending stores in the
> cpucontroller's queue because when a store is committed we update the
> 'data' in RAM directly and only simulate the effect of storing the data to
> the caches. So while the store is pending in the cpucontroller's queue, a
> load to the same address can read the latest data from RAM, and
> correctness is not broken.
>
>> I did some debugging by dumping the pendingRequest list in the
>> cpuController from time to time, as I was curious about the purpose and
>> functionality of the pendingRequest list.
>> For that reason I increased its size to 512 entries (and also increased
>> the pendingRequest lists of the cacheControllers), and what happens is
>> that in some cases the list fills up with store requests.
>>
> The cpucontroller queue gets filled up with store requests because
> 'access_fast_path' (used for fast access to the L1-D cache) currently
> works only for loads. We should change this function to support stores so
> that we don't hog the cpu controller queue.
>
>> I would assume that this list can hold at most STQ_SIZE + LDQ_SIZE
>> (+ some entries for icache requests?) entries.
>>
>> Because the ROB/STQ entries are deallocated and the stores_in_flight
>> counter is decremented after the store request, a consecutive store might
>> allocate that ROB/STQ entry and send a new request while the first store
>> is still in flight due to a miss.
>>
>> Request{Memory Request: core[0] thread[0] address[0x00011db42bc0]
>> robid[*109*] init-cycle[276140] ref-counter[4] op-type[memory_op_write]
>> isData[1] ownerUUID[288412] ownerRIP[0x4ca4b2] History[ {+core_0_cont}
>> {+L1_D_0} {+L2_0} {+MEM_0} ] Signal[ ooo_0_0-dcache-wakeup] } idx[145]
>> cycles[-428] depends[176] waitFor[-1] annuled[0]
>> Request{Memory Request: core[0] thread[0] address[0x00011db42cf0]
>> robid[*109*] init-cycle[276185] ref-counter[1] op-type[memory_op_write]
>> isData[1] ownerUUID[288540] ownerRIP[0x4ca4ab] History[ {+core_0_cont} ]
>> Signal[ ooo_0_0-dcache-wakeup] } idx[225] cycles[-383] depends[226]
>> waitFor[242] annuled[0]
>>
>> (I have added the robid here for debugging purposes. In the original
>> sources the robid is always zero in case of a store request.)
>>
>> I could observe situations where over 460 store requests were present in
>> the pendingRequest list of the cpuController (size = 512).
>> This happens if, for example, a memset function is called to zero a bunch
>> of cache lines inside a loop.
>>
>> What do you think about this?
>> I think it would be a valid scenario if the ROB entry were deallocated
>> after the store request, but in that case the STQ entry would need to
>> stay valid until the store request is finalized. I am not sure whether
>> that's possible, as the ROB and LSQ are closely bound together in MARSS,
>> if I interpreted the code correctly.
>> For now I have worked around the issue by limiting the allocation of new
>> pending requests during the call to MemoryHierarchy::is_cache_available().
>> I track the number of pending loads and stores inside the cpuController
>> class and only allow allocation if the store count does not exceed
>> STQ_SIZE and the load count does not exceed LDQ_SIZE. If the function
>> returns false, a load operation is re-issued and a store may not commit
>> at that point. I am not sure, though, whether that is a good solution.
>>
> To minimize the effect of the cpu controller queue on cache accesses, we
> should start with a modification to access_fast_path to allow stores. As
> this cpu controller queue is not a real architectural module, we don't
> have to implement any tracking for pending stores. From the CPU side, all
> stores that are committed are written to the cache, and subsequent
> requests to those cache lines will return the most up-to-date data.
>
> I agree that this is rather complicated because of the queuing in the
> cpucontroller. Now that the cpu is woken up on a cache miss by a signal in
> the request, we don't need to track pending requests in the cpu controller
> and we can completely remove this queue.
>
> - Avadh
>
>> Regards,
>> Stefan
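P.S. For reference, the throttling workaround I described above (limiting allocations during MemoryHierarchy::is_cache_available()) boils down to something like the following sketch. The names and the free-function form are placeholders, not the actual MARSS code:

    // Sketch of the throttling check: refuse new allocations in the cpu
    // controller once the number of in-flight loads/stores from this core
    // reaches the LSQ capacities.  If it returns false, a load is re-issued
    // later and a store is simply not committed in this cycle.
    bool is_cache_available_sketch(int pending_loads, int pending_stores,
                                   bool is_store,
                                   int ldq_size, int stq_size)
    {
        if (is_store)
            return pending_stores < stq_size;
        return pending_loads < ldq_size;
    }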
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
