Hi Nilay,
Overall, I believe we are in more agreement than you may think. I'm glad
you included pseudocode in your latest email. That is a great idea. I
think part of our problem is that we are interpreting each other's
textual descriptions in different ways.
Below are my responses:
Absolutely, we still need the ability to allocate or deallocate
entries within actions. I'm not advocating completely eliminating
the set/unset cache and tbe entry functions. I just want to avoid
including those calls in the inports. I'm confused about why the mandatory
queue is different from other queues. They all trigger events in the same
way.
if (L1IcacheMemory.isTagPresent(in_msg.LineAddress)) {
  // The tag matches for the L1, so the L1 asks the L2 for it.
  trigger(mandatory_request_type_to_event(in_msg.Type),
          in_msg.LineAddress);
}
Brad, the mandatory queue is just an example where an inport may perform a
tag lookup before the cache and transaction buffer entries have been set.
Above is an excerpt from the file MOESI_CMP_directory-L1cache.sm. Before
trigger() is called, isTagPresent() is called. This means a tag lookup is
performed before the cache or transaction buffer entries have been set.
Suppose the tag was present in the L1Icache; then, in the trigger() call,
we will perform the lookup again.
Similarly, there is an inport in the Hammer protocol implementation where
getCacheEntry() is called before a call to trigger(). Now, why should we use
getCacheEntry() in the inport and the cache entry variable in the action?
The reason is that, as you pointed out, we ideally want to call
getCacheEntry() once. I believe your suggestion to use local variables in
the input ports gets us there. Below is what I'm envisioning for the
MOESI_hammer mandatory queue in_port logic (at least the IFETCH half of
the logic):
ENTRY getL1ICacheEntry(Address addr) {
  assert(is_valid(L1DcacheMemory.lookup(addr)) == FALSE);
  assert(is_valid(L2cacheMemory.lookup(addr)) == FALSE);
  return L1IcacheMemory.lookup(addr);
}

ENTRY getL1DCacheEntry(Address addr) {
  assert(is_valid(L1IcacheMemory.lookup(addr)) == FALSE);
  assert(is_valid(L2cacheMemory.lookup(addr)) == FALSE);
  return L1DcacheMemory.lookup(addr);
}

ENTRY getL2CacheEntry(Address addr) {
  assert(is_valid(L1IcacheMemory.lookup(addr)) == FALSE);
  assert(is_valid(L1DcacheMemory.lookup(addr)) == FALSE);
  return L2cacheMemory.lookup(addr);
}
in_port(mandatoryQueue_in, CacheMsg, mandatoryQueue, desc="...", rank=0) {
  if (mandatoryQueue_in.isReady()) {
    peek(mandatoryQueue_in, CacheMsg, block_on="LineAddress") {
      // Set the local entry variables
      ENTRY L1I_cache_entry = getL1ICacheEntry(in_msg.LineAddress);
      ENTRY L1D_cache_entry = getL1DCacheEntry(in_msg.LineAddress);
      TBE_Entry tbe_entry = getTBE(in_msg.LineAddress);

      // Check for data accesses to blocks in the I-cache and ifetches
      // to blocks in the D-cache
      if (in_msg.Type == CacheRequestType:IFETCH) {
        // *** INSTRUCTION ACCESS ***
        // Check to see if it is in the OTHER L1
        if (is_valid(L1D_cache_entry)) {
          // The block is in the wrong L1, try to write it to the L2
          if (L2cacheMemory.cacheAvail(in_msg.LineAddress)) {
            trigger(Event:L1_to_L2, in_msg.LineAddress, L1D_cache_entry,
                    tbe_entry);
          } else {
            replace_addr = L2cacheMemory.cacheProbe(in_msg.LineAddress);
            replace_cache_entry = getL2CacheEntry(replace_addr);
            replace_tbe_entry = getTBE(replace_addr);
            trigger(Event:L2_Replacement, replace_addr, replace_cache_entry,
                    replace_tbe_entry);
          }
        }

        if (is_valid(L1I_cache_entry)) {
          // The tag matches for the L1, so the L1 fetches the line. We
          // know it can't be in the L2 due to exclusion.
          trigger(mandatory_request_type_to_event(in_msg.Type),
                  in_msg.LineAddress, L1I_cache_entry, tbe_entry);
        } else {
          if (L1IcacheMemory.cacheAvail(in_msg.LineAddress)) {
            // L1 doesn't have the line, but we have space for it in the L1
            ENTRY L2_cache_entry = getL2CacheEntry(in_msg.LineAddress);
            if (is_valid(L2_cache_entry)) {
              // L2 has it (maybe not with the right permissions)
              trigger(Event:Trigger_L2_to_L1I, in_msg.LineAddress,
                      L2_cache_entry, tbe_entry);
            } else {
              // We have room, the L2 doesn't have it, so the L1 fetches
              // the line
              trigger(mandatory_request_type_to_event(in_msg.Type),
                      in_msg.LineAddress, L1I_cache_entry, tbe_entry);
              // You could also say here:
              // trigger(mandatory_request_type_to_event(in_msg.Type),
              //         in_msg.LineAddress, OOD, OOD);
            }
          } else {
            // No room in the L1, so we need to make room
            if (L2cacheMemory.cacheAvail(
                    L1IcacheMemory.cacheProbe(in_msg.LineAddress))) {
              // The L2 has room, so we move the line from the L1 to the L2
              replace_addr = L1IcacheMemory.cacheProbe(in_msg.LineAddress);
              replace_cache_entry = getL1ICacheEntry(replace_addr);
              replace_tbe_entry = getTBE(replace_addr);
              trigger(Event:L1_to_L2, replace_addr, replace_cache_entry,
                      replace_tbe_entry);
            } else {
              // The L2 does not have room, so we replace a line from the L2
              replace_addr = L2cacheMemory.cacheProbe(
                  L1IcacheMemory.cacheProbe(in_msg.LineAddress));
              replace_cache_entry = getL2CacheEntry(replace_addr);
              replace_tbe_entry = getTBE(replace_addr);
              trigger(Event:L2_Replacement, replace_addr, replace_cache_entry,
                      replace_tbe_entry);
            }
          }
        }
      } else {
        // *** DATA ACCESS ***
        ...
      }
    }
  }
}
In MESI_CMP_directory, getState() is called from an inport. This means we
cannot have an implementation of getState() that makes use of the cache
entry variable, since it would not have been set. Now, different
implementations for setState() and getState() simply do not make sense to
me, so in my opinion setState() will also not use the cache entry. These
two functions (I just went through the profile output for MOESI_hammer)
account for about half of the calls to isTagPresent(), the function we are
trying to get rid of.
I think we need to modify MESI_CMP_directory's getState() function to use
the cache_entry variable. It doesn't make sense to me that it wouldn't,
since the state is part of the cache entry. How else would getState() and
setState() access the cache state?
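To make this concrete, here is a rough sketch of what a cache-entry-aware
getState() could look like once the entry variables are passed in. This is
only an illustration of the idea, not the actual patch; the field names
TBEState and CacheState and the exact signature are assumptions on my part:

```
State getState(TBE_Entry tbe_entry, ENTRY cache_entry, Address addr) {
  // Prefer the transient state tracked in the TBE, if one exists
  if (is_valid(tbe_entry)) {
    return tbe_entry.TBEState;
  }
  // Otherwise read the stable state directly off the cache entry,
  // with no extra isTagPresent()/lookup call
  if (is_valid(cache_entry)) {
    return cache_entry.CacheState;
  }
  return State:I;
}
```

With this shape, getState() and setState() share one notion of where the
state lives, and neither needs to re-probe the cache arrays.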
Maybe I should point out that I'm assuming that getCacheEntry can
return a NULL pointer and thus that can be passed into the trigger
call when no cache or tbe entry exists.
You are correct that getCacheEntry() would have to return a NULL, another
thing which I believe we earlier preferred to avoid. As an aside, we cannot
use the keyword NULL since it is already in use, so we will have to rename
NULL to something else.
On second thought, I think it may not be necessary for getCacheEntry() to
use the keyword.
Yes, I initially hoped to avoid the NULL (or the OOD) keyword, but
over the past few weeks I have realized that isn't possible. By the
way, all things being equal, I would prefer to use your OOD keyword
over NULL. That way we differentiate it from the C++ NULL, since the
getCacheEntry() functions don't return pointers. At least, I'm still
hoping to avoid that.
Another concern is with the implementation of getCacheEntry(). If this
function has to return a pointer to a cache entry, we would have to
provide support for local variables that SLICC would internally assume to
be pointer variables.
Within SLICC, understanding that certain variables are actually
pointers is a bit of a nuisance, but there already exist examples
where we make that distinction. For instance, look at the
"if param.pointer" conditionals in StateMachine.py. We just have to
treat cache and tbe entries in the same fashion.
I know it is possible to declare pointers in SLICC, CacheMemory being the
foremost example. But we only allow the declaration of pointers. We tacitly
assume that whenever they are used, they will only be used after being
dereferenced. Now, in the case of getCacheEntry(), I do not see this
happening. Below is the current implementation of getCacheEntry() from
MOESI_CMP_directory-L1cache.sm.
Entry getCacheEntry(Address addr), return_by_ref="yes" {
  if (L1DcacheMemory.isTagPresent(addr)) {
    return static_cast(Entry, L1DcacheMemory[addr]);
  } else {
    return static_cast(Entry, L1IcacheMemory[addr]);
  }
}
I would probably convert it to something like this:

AbstractCacheEntry getCacheEntry(Address addr) {
  AbstractCacheEntry * cache_entry = L1DcacheMemory.lookup(addr);
  if (is_valid(cache_entry)) {
    return cache_entry;
  }
  return L1IcacheMemory.lookup(addr);
}
Now, if we assume that cache_entry will always be used in its dereferenced
form, then we will face a problem in returning the cache entry. We can
introduce a 'return_by_pointer' annotation for the function to handle this
situation. But you have pointed out in your previous emails that it is
possible for getCacheEntry() to do more things, which may cause problems
for the compiler.
I think we always assume cache_entry is dereferenced whenever one of its
members or member functions is called. However, cache_entry itself is a
pointer, and getCacheEntry() always returns a pointer.
In my opinion, we should maintain one method for looking up cache
entries.
My own experience tells me that it is not difficult to incorporate
calls to set/unset_cache_entry() in existing protocol implementations.

For implementing new protocols, I think the greater obstacle will be
implementing the protocol correctly, not using the entry variables
correctly. If we document this change clearly, there is no reason to
believe a SLICC programmer will be unduly burdened by it.

Assuming that this change does introduce some complexity in
programming with SLICC, does that complexity outweigh the performance
improvements?
My position is that we can leverage SLICC as an intermediate language and
achieve the performance benefits of your change without significantly
impacting programmability. I agree that we need the
set/unset_cache_entry calls in the allocate and deallocate actions. I
see no problem with that. I just want to treat these new implicit
cache and tbe entry variables like the existing implicit variable
'address'. Therefore, I want to pass them into the trigger operation like
the address variable. I also want just one method for looking up cache
entries. I believe the only difference is that I would like to set
the cache and tbe entries in the trigger function, as well as allow
them to be set in the actions.
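To sketch what I mean about keeping set_cache_entry in the actions: an
allocate action would still own the call that binds the implicit entry
variable, while the inport only reads entries. The action name, the "\i"
shorthand, and the allocate() signature below are illustrative, not taken
from the actual patch:

```
action(ii_allocateL1ICacheBlock, "\i", desc="Allocate an L1 I-cache block") {
  // Only allocate if the implicit cache_entry is not already bound
  if (is_valid(cache_entry) == FALSE) {
    // allocate() returns the new entry; set_cache_entry binds it to the
    // implicit cache_entry variable for the rest of this transition
    set_cache_entry(L1IcacheMemory.allocate(address, new Entry));
  }
}
```

The inport would never call set_cache_entry itself; it would only pass the
looked-up entries into trigger(), the same way the implicit address
variable is passed today.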
I still don't think that we can avoid those calls in inports. By avoiding
those calls, the jobs of the compiler and the programmer will only become
more difficult. Currently I am of the opinion that if we avoid those
calls, we should avoid this change altogether.
We definitely should not give up on this patch. You've put a lot of
hard work into it and, as your performance data suggests, it should
give us a noticeable speedup. I sympathize with your position and I
really appreciate the effort you've put in. If you'd like, I would be
happy to take over the patch.
Brad