Hello

I have been debugging the assertion failure that occurs when the L3$ (residing in the HNF)
clusivity is set to Mostly Exclusive.

All the failures occur when the modelled config has a private L2$ for every CPU
and [L2$, L3$] clusivity == [Mostly Inclusive, Mostly Exclusive].

With some debug flags enabled, a snippet of the trace at the failure point is:

12180159000: RubyGenerated: system.cpu2.l2: executing Pop_TriggerQueue
12180159000: RubyGenerated: system.cpu2.l2: executing Send_Data
12180159000: RubyGenerated: system.cpu2.l2: executing ProcessNextState_ClearPending
12180159000: RubyGenerated: system.cpu2.l2: next_state: BUSY_BLKD
ProtocolTrace:     12180159000  18      Cache             TX_Data BUSY_BLKD>BUSY_BLKD [0xab040, line 0xab040]
ProtocolTrace:     12180159000   7        Seq               Begin       >       [0x24db90, line 0x24db80] LD
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: [Cache_Controller 25], Time: 24360320, state: BUSY_BLKD, event: CompAck, addr: 0xab040
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: executing Receive_ReqResp
12180160000: RubyGenerated: system.ruby.hnf1.cntrl: executing UpdateDirState_FromReqResp
build/ARM/mem/ruby/protocol/Cache_Controller.cc:5477: panic: Runtime Error at CHI-cache-actions.sm:1947: assert failure.
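
For reference, the trace above was produced with gem5's built-in debug flags (both RubyGenerated and ProtocolTrace appear in the output). A command of roughly this shape reproduces it; the config script and its options here stand in for my actual setup:

build/ARM/gem5.opt --debug-flags=RubyGenerated,ProtocolTrace \
    --debug-file=trace.out <config_script.py> <config_options>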

Looking at the code around line 1947 in CHI-cache-actions.sm:

action(UpdateDirState_FromReqResp, desc="") { <== HNF1 cache controller executing this cache action
  peek(rspInPort, CHIResponseMsg) {
    if ((in_msg.type == CHIResponseType:CompAck) && tbe.updateDirOnCompAck) {
      assert(tbe.requestor == in_msg.responder);

      tbe.dir_sharers.add(in_msg.responder);

      if (tbe.requestorToBeOwner) {
        assert(tbe.dataMaybeDirtyUpstream);
        assert(tbe.dir_ownerExists == false);
        assert(tbe.requestorToBeExclusiveOwner == false);
        tbe.dir_owner := in_msg.responder;
        tbe.dir_ownerExists := true;
        tbe.dir_ownerIsExcl := false;

      } else if (tbe.requestorToBeExclusiveOwner) {
        assert(tbe.dataMaybeDirtyUpstream);
        assert(tbe.dir_ownerExists == false);
        assert(tbe.dir_sharers.count() == 1); <== Line 1947
        tbe.dir_owner := in_msg.responder;
        tbe.dir_ownerExists := true;
        tbe.dir_ownerIsExcl := true;
      }
    }
  }
  printTBEState(tbe);
}

So the problem _seems_ to be related to updating the directory state within
HNF1.
The L2$ wants to obtain the requested cache line in an exclusive state. Thus
dir_sharers.count() should be zero before the requestor is added, and exactly
one afterwards, which is what line 1947 asserts (as the cache line now resides
only in a single L2$). The assert firing implies that a stale sharer is still
recorded in the directory.
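
To make the failing invariant concrete, here is a minimal Python model of the bookkeeping that line 1947 encodes (my own illustrative sketch, not the SLICC code; the field names mirror the TBE fields above):

class DirState:
    def __init__(self):
        self.sharers = set()        # mirrors tbe.dir_sharers
        self.owner = None           # mirrors tbe.dir_owner
        self.owner_is_excl = False  # mirrors tbe.dir_ownerIsExcl

    def on_comp_ack(self, responder, to_be_exclusive_owner):
        # tbe.dir_sharers.add(in_msg.responder)
        self.sharers.add(responder)
        if to_be_exclusive_owner:
            # Line 1947: the requestor must now be the *only* sharer.
            # Any stale entry left behind in the sharer set trips this.
            assert len(self.sharers) == 1
            self.owner = responder
            self.owner_is_excl = True

d = DirState()
d.sharers.add("cpu0.l2")        # stale sharer that was never removed
d.on_comp_ack("cpu2.l2", True)  # -> AssertionError, as in the failing run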

QS: Is this possibly a CHI bug?

P.S.: I have also attached a gzipped version of the log file.

Tks

JO


From: Javed Osmany
Sent: 22 April 2022 12:05
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: Javed Osmany <javed.osm...@huawei.com>
Subject: RE: CHI - assertion error when modelling "mostly inclusive" for
private L2$

Hello

An update on my previous email...

I have been simulating the multicore system with the Parsec/Splash2 benchmarks
for different permutations of L2$ and L3$ clusivity. The results are in the
following table. Note that by L3$ I mean the L3$ within the HNF.

L2$ clusivity                      | L3$ clusivity                      | Comments
-----------------------------------+------------------------------------+-------------------------------------------
Strict Inclusive (sincl) (default) | Mostly inclusive (mincl) (default) | All tests complete okay
mincl                              | mincl                              | All tests complete okay
mincl                              | Mostly exclusive (mexcl)           | 10 tests abort with the assertion failure
sincl                              | mexcl                              | 10 tests abort with the assertion failure


From the above, the deduction is that setting the L3$ clusivity to mostly
exclusive triggers the problem.
The settings I used for mostly_inclusive (the default for the HNF cache
controller in CHI_config.py) and for mostly_exclusive (based on the write-up at
https://www.gem5.org/documentation/general_docs/ruby/CHI/ and my understanding
that the L3$ then becomes a victim cache for the L2$) are:


Parameter               | mostly inclusive | mostly exclusive | Comments
------------------------+------------------+------------------+--------------------------------------------------
alloc_on_seq_acc        | False            | False            |
alloc_on_seq_line_write | False            | False            |
alloc_on_readshared     | True             | False            |
alloc_on_readunique     | False            | False            |
alloc_on_readonce       | True             | False            |
alloc_on_writeback      | True             | True             | For the L3$, writebacks and evictions are the mechanism for allocating a cache line
dealloc_on_unique       | True             | True             | If the upstream $line becomes unique, deallocate it from the L3$
dealloc_on_shared       | False            | True             | If the upstream $line becomes shared, deallocate it from the L3$
dealloc_backinv_unique  | False            | False            | If the L3$ line is deallocated due to replacement, don't back-invalidate the upstream line
dealloc_backinv_shared  | False            | False            | If the L3$ line is deallocated due to replacement, don't back-invalidate the upstream line
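
For concreteness, the mostly-exclusive column above translates into the following parameter block, written in the same style as the CHI_config.py snippets below (a sketch of my encoding, taken directly from the table):

        # MOESI / Mostly exclusive: the L3$ acts as a victim cache,
        # filled only by writebacks/evictions from the upstream caches
        self.alloc_on_seq_acc = False
        self.alloc_on_seq_line_write = False
        self.alloc_on_readshared = False
        self.alloc_on_readunique = False
        self.alloc_on_readonce = False
        self.alloc_on_writeback = True
        # Drop the L3$ copy once an upstream cache holds the line
        self.dealloc_on_unique = True
        self.dealloc_on_shared = True
        # Replacement in the L3$ does not back-invalidate upstream
        self.dealloc_backinv_unique = False
        self.dealloc_backinv_shared = False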
Any insight as to why the above encoding for mostly exclusive might be wrong,
and thus cause the assertion to fire, would be greatly appreciated.

Thanks in advance
JO



From: Javed Osmany
Sent: 21 April 2022 16:03
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: Javed Osmany <javed.osm...@huawei.com>
Subject: CHI - assertion error when modelling "mostly inclusive" for private L2$

Hello

I am simulating a multicore Ruby system using CHI, with the Parsec/Splash2
benchmarks and gem5-21.2.1.0.
It consists of three clusters:

1) Little cluster of 4 CPUs, each CPU has a private L1$ and L2$

2) Middle cluster of 3 CPUs, each CPU has a private L1$ and L2$

3) Big cluster of 1 CPU with a private L1$ and L2$.

By default, the L2$ and L3$ (residing in the HNF) have their clusivity set to 
strict_inclusive and mostly_inclusive respectively (CHI_config.py):

class CHI_L2Controller(CHI_Cache_Controller):
    '''
    Default parameters for a L2 Cache controller
    '''

    def __init__(self, ruby_system, cache, l2_clusivity, prefetcher):
        super(CHI_L2Controller, self).__init__(ruby_system)
        self.sequencer = NULL
        self.cache = cache
        self.use_prefetcher = False
        self.allow_SD = True
        self.is_HN = False
        self.enable_DMT = False
        self.enable_DCT = False
        self.send_evictions = False
        # Strict inclusive MOESI
        self.alloc_on_seq_acc = False
        self.alloc_on_seq_line_write = False
        self.alloc_on_readshared = True
        self.alloc_on_readunique = True
        self.alloc_on_readonce = True
        self.alloc_on_writeback = True
        self.dealloc_on_unique = False
        self.dealloc_on_shared = False
        self.dealloc_backinv_unique = True
        self.dealloc_backinv_shared = True

class CHI_HNFController(CHI_Cache_Controller):
    '''
    Default parameters for a coherent home node (HNF) cache controller
    '''

    #def __init__(self, ruby_system, cache, prefetcher, addr_ranges):
    def __init__(self, ruby_system, cache, prefetcher, addr_ranges,
                 hnf_enable_dmt, hnf_enable_dct,
                 num_tbe, num_repl_tbe, num_snp_tbe, unified_repl_tbe,
                 l3_clusivity):
        super(CHI_HNFController, self).__init__(ruby_system)
        self.sequencer = NULL
        self.cache = cache
        self.use_prefetcher = False
        self.addr_ranges = addr_ranges
        self.allow_SD = True
        self.is_HN = True
        #self.enable_DMT = True
        #self.enable_DCT = True
        self.enable_DMT = hnf_enable_dmt
        self.enable_DCT = hnf_enable_dct
        self.send_evictions = False
        # MOESI / Mostly inclusive for shared / Exclusive for unique
        self.alloc_on_seq_acc = False
        self.alloc_on_seq_line_write = False
        self.alloc_on_readshared = True
        self.alloc_on_readunique = False
        self.alloc_on_readonce = True
        self.alloc_on_writeback = True
        self.dealloc_on_unique = True
        self.dealloc_on_shared = False
        self.dealloc_backinv_unique = False
        self.dealloc_backinv_shared = False

The simulations complete okay with the default clusivity for the L2$ and L3$.
However, if I change the L2$ clusivity to "mostly_inclusive", some of the
benchmarks fail with an assertion error.

I copied the default mostly_inclusive settings of the L3$ to make the L2$
clusivity mostly_inclusive:

class CHI_L2Controller(CHI_Cache_Controller):
    '''
    Default parameters for a L2 Cache controller
    '''

    def __init__(self, ruby_system, cache, l2_clusivity, prefetcher):
        super(CHI_L2Controller, self).__init__(ruby_system)
        self.sequencer = NULL
        self.cache = cache
        self.use_prefetcher = False
        self.allow_SD = True
        self.is_HN = False
        self.enable_DMT = False
        self.enable_DCT = False
        self.send_evictions = False
        # Strict inclusive MOESI
        if (l2_clusivity == "sincl"):
            self.alloc_on_seq_acc = False
            self.alloc_on_seq_line_write = False
            self.alloc_on_readshared = True
            self.alloc_on_readunique = True
            self.alloc_on_readonce = True
            self.alloc_on_writeback = True
            self.dealloc_on_unique = False
            self.dealloc_on_shared = False
            self.dealloc_backinv_unique = True
            self.dealloc_backinv_shared = True
        elif (l2_clusivity == "mincl"):
            # Mostly inclusive MOESI
            self.alloc_on_seq_acc = False
            self.alloc_on_seq_line_write = False
            self.alloc_on_readshared = True
            self.alloc_on_readunique = False
            self.alloc_on_readonce = True
            self.alloc_on_writeback = True
            self.dealloc_on_unique = True
            self.dealloc_on_shared = False
            self.dealloc_backinv_unique = False
            self.dealloc_backinv_shared = False
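
One detail worth flagging in my change above: the if/elif chain sets nothing when l2_clusivity is neither "sincl" nor "mincl", silently leaving the alloc/dealloc parameters at whatever the base class provides. A defensive fallback (my own addition, not present in CHI_config.py) would make a misconfiguration obvious:

        else:
            # Fail fast on an unrecognised clusivity string rather than
            # running with unintended alloc/dealloc settings
            raise ValueError("Unsupported l2_clusivity: %s" % l2_clusivity)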

The assertion error is:

log_parsec_volrend_134_8rnf_1snf_4hnf_3_clust_all_priv_l2.txt: build/ARM/mem/ruby/protocol/Cache_Controller.cc:5477: panic: Runtime Error at CHI-cache-actions.sm:1947: assert failure.

QS 1: Even though the L2$ is private, I am assuming that its clusivity can be
set to mostly_inclusive. Is that assumption correct?
QS 2: If the answer to QS 1 is yes, then it would seem that the
"mostly_inclusive" settings for the L2$ (copied from the mostly_inclusive
settings of the L3$ residing in the HNF) could be the root cause of the
problem. Any thoughts on this?

Thanks in advance
JO

Attachment: log_parsec_lu_cb_134_8rnf_1snf_4hnf_3_clust_all_priv_l2_mincl_mexcl_debug1.txt.gz
