Hi,

Thanks for your answer, Nilay. I ran the application twice: first with 100 iterations of the for loop, and then with 300. These are the numbers of Ruby cycles each run took:
Iteration=100 and Cores 2,3 -> Ruby-cycles = 116030060
Iteration=100 and Cores 2,4 -> Ruby-cycles = 119546137
Iteration=300 and Cores 2,3 -> Ruby-cycles = 346324353
Iteration=300 and Cores 2,4 -> Ruby-cycles = 357471770

It is clear that the cycle count must differ between the run where the two cores share an L2 and the run where they use separate L2s. For the first run the difference is 3516077 cycles (= 119546137 - 116030060), and for the second it is 11147417 cycles (= 357471770 - 346324353). Could you tell me why the difference in cycles increases when the number of iterations increases?

By the way, I have another question about MOESI-CMP-directory; please answer it if possible. Consider the architecture described in my last post. Suppose only the DL1 of Core2 holds data X. At this point Core3 requests X (Cores 2 and 3 belong to the same chip and share an L2). There are two possible ways to respond to this request:

1- The directory sets the state of X to SHARE, and the data is then read from L2.
2- The data is read from L2 without any access to the directory, and the state of X doesn't change.

I think 2 is correct. Could you help me?

Thanks

On Thu, Jul 14, 2011 at 11:38 AM, Nilay <[email protected]> wrote:

> On Thu, July 14, 2011 1:38 am, Hamid Reza Khaleghzadeh wrote:
> > Hello all,
> >
> > I have simulated a 8 cores CMP where consists of 4 chips and each chip
> > has 2 cores and one shared L2. MOESI-CMP-directory is coherency protocol.
> >
> > Core0  Core1     Core2  Core3     Core4  Core5     Core6  Core7
> > |------------|   |------------|   |------------|   |------------|
> >        |                |                |                |
> >    L2 __ Dir0       L2 __ Dir1       L2 __ Dir2       L2 __ Dir3
> >        |----------------|----------------|----------------|
> >                                 |
> >                               Memory
> >
> > I have run below application two times. First time, Thread1 and Thread2
> > are mapped on two cores 2, 3 (there is a shared L2 between them). In
> > another run, I have bound these two threads to cores 2, 4 ( L2 is not
> > shared between them). I have a problem with this application. When
> > number of iteration of for loop is increased, difference between
> > execution time of run1 and run2 is increased, too. But, for this
> > application, it's clear that coherency cost isn't increased when
> > iteration of for loop is increased. Could you tell me why this happen?
> >
> > for (i=0;i<5;i++)
> > {
> >     THREAD1; // thread1 *read* a large array. Size of the array is
> >              // smaller than L2 cache.
> >     THREAD2; // thread2 *read* the array that read by thread1
> > }
> >
>
> What is meant by execution time? If your talking about wall clock time,
> that probably is not an indicator of anything. If the difference in cycles
> taken is not as expected, then try to look at the break up of where the
> cycles are being spent.
>
> --
> Nilay
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>

--
Hamid Reza Khaleghzadeh
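
For reference, the cycle differences quoted in this message can be normalised per iteration and relative to the shared-L2 run. The following is only a minimal Python sketch of that arithmetic on the four reported numbers; the dictionary layout and the `compare` helper are illustrative names of my own, not gem5 statistics or APIs.

```python
# Ruby cycle counts reported in the message, keyed by (iterations, cores).
cycles = {
    (100, "2,3"): 116030060,  # threads on cores 2 and 3 (shared L2)
    (100, "2,4"): 119546137,  # threads on cores 2 and 4 (separate L2s)
    (300, "2,3"): 346324353,
    (300, "2,4"): 357471770,
}


def compare(iterations):
    """Return (absolute, per-iteration, relative) cycle difference
    between the separate-L2 and shared-L2 mappings."""
    shared = cycles[(iterations, "2,3")]
    split = cycles[(iterations, "2,4")]
    diff = split - shared
    return diff, diff / iterations, diff / shared


for n in (100, 300):
    diff, per_iter, rel = compare(n)
    print(f"iterations={n}: diff={diff}, "
          f"per-iteration={per_iter:.1f}, relative={rel:.2%}")
```

On these numbers the absolute gap grows by roughly 3.17x for 3x the iterations, while the gap per iteration (about 35k versus 37k cycles) and the relative gap (about 3.0% versus 3.2%) stay close to constant. This is only arithmetic on the reported totals; it says nothing by itself about where in the protocol the extra cycles are spent.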
