Hi guys,
  I am currently tracking down a bug that shows up in large (16-core)
configurations using the MOESI / private-L2 + directory setup.  Has that
configuration been tested under high contention?

I have walked through large traces of this, and the problem boils down
to the directory sending an update message to an L2 that it thinks is
the owner of the cache line.  The receiving L2, however, has the cache
line in Shared, and in that state it expects update messages to come in
with a proper new cache-line state.  This triggers a NULL pointer
dereference when it reads the state through the m_arg argument in:

MOESILogic::handle_interconn_hit -> case MOESI_SHARED: -> 
*state = *(W8*)(queueEntry->m_arg);
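
To make the failing path concrete, here is a minimal standalone sketch of
the dereference with a guard added.  It is not the actual MARSS code:
QueueEntry is a stripped-down stand-in that only keeps m_arg, the W8
typedef and the enum values are my assumptions, and
apply_update_in_shared is a hypothetical helper name.  A guard like this
would only hide the symptom, of course; the real question is why the
update arrives without a state.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t W8;  // assumption: W8 is an 8-bit unsigned type

// Assumed state encoding, for illustration only.
enum MoesiState : W8 {
    MOESI_MODIFIED, MOESI_OWNER, MOESI_EXCLUSIVE, MOESI_SHARED, MOESI_INVALID
};

// Hypothetical stand-in for the real queue entry; only m_arg mirrors the
// field used in handle_interconn_hit.
struct QueueEntry {
    void* m_arg;
};

// Applies the new state carried by an update message to a line in Shared.
// Returns false (leaving *state untouched) when the message carries no
// state, instead of dereferencing a NULL m_arg.
bool apply_update_in_shared(W8* state, const QueueEntry* queueEntry) {
    assert(state != nullptr && queueEntry != nullptr);
    if (queueEntry->m_arg == nullptr) {
        return false;  // update without a state: caller must handle this
    }
    *state = *static_cast<W8*>(queueEntry->m_arg);
    return true;
}
```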

There are a number of things I do not understand, and I would be
extremely grateful if someone could help me understand what is going on
here.

When is it legal to send an update message?
What is the purpose of the directory sending an update message?  
What is the invariant of the directory's owner / present fields?  Are
they supposed to be precise?  (They are not, i.e., owner and present
fields seem to be out of sync with the real state in the caches.)
What is the difference between all the source / origin / dest / responder
etc. fields?
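
To be explicit about what I mean by "precise", here is a small sketch of
the invariant I would have expected to hold whenever no request for the
line is in flight.  All names (DirEntry, LineState, NO_OWNER,
directory_is_precise) are hypothetical and not taken from the MARSS
sources, and treating M, O and E all as "owner" states is my assumption
about the intended protocol.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr int NUM_CACHES = 16;  // matches the 16-core setup
constexpr int NO_OWNER = -1;    // hypothetical marker for "no owner"

enum class LineState : uint8_t { Modified, Owned, Exclusive, Shared, Invalid };

// Hypothetical view of one directory entry.
struct DirEntry {
    int owner;                        // index of the owning L2, or NO_OWNER
    std::bitset<NUM_CACHES> present;  // one bit per L2 holding the line
};

// Checks the directory entry against the real per-cache states:
//  - present bit i is set exactly when cache i holds the line, and
//  - cache i is in M/O/E exactly when the directory names it as owner.
bool directory_is_precise(const DirEntry& dir,
                          const std::array<LineState, NUM_CACHES>& caches) {
    for (int i = 0; i < NUM_CACHES; ++i) {
        const bool holds_line = caches[i] != LineState::Invalid;
        if (dir.present.test(i) != holds_line) {
            return false;
        }
        const bool in_owner_state = caches[i] == LineState::Modified ||
                                    caches[i] == LineState::Owned ||
                                    caches[i] == LineState::Exclusive;
        if ((dir.owner == i) != in_owner_state) {
            return false;
        }
    }
    return true;
}
```

The case I am hitting corresponds to the directory naming an owner while
that L2 actually holds the line in Shared, which this check would reject.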

While digging through this, I have found a number of unexplained things,
which may be bugs or rooted in my misunderstanding:

* the directory sometimes wrongly interprets the sender of an evict
  message because it seems to merge it with a DirContBufferEntry
  of the still ongoing request that caused the evict
  ...
  this breaks the owner tracking

* the directory does not seem to wait for an evict etc. to be fully
  propagated to the cache hierarchy

* the content of the update message depends on the state of the line in
  the receiver (M/O vs. E/S); is that intentional?

* if one L2 sends its data directly to another L2, that receiving L2 is
  confused and complains about an unknown source

* there is no explicit distinction between requests and responses; it is
  all rather implicit (upper / lower interconnect, hasData), so is there
  a rule of thumb to keep the two apart?

In general, is there a high-level description of how the directory
system should work and which invariants hold when?  Has the model been
tested and is it safe to use?

Thanks,
  Stephan

