Hi

Thanks for your answer, Nilay. I have run the application twice: the first
time with 100 iterations of the for loop, and the second time with 300
iterations. These are the numbers of Ruby cycles taken to run the
application:

Iterations=100, cores 2,3 -> Ruby cycles = 116030060
Iterations=100, cores 2,4 -> Ruby cycles = 119546137

Iterations=300, cores 2,3 -> Ruby cycles = 346324353
Iterations=300, cores 2,4 -> Ruby cycles = 357471770

The number of cycles is expected to differ between running the app on cores
that share an L2 and on cores with separate L2s. For the first run the
difference is 3516077 cycles (= 119546137 - 116030060), and for the second run
it is 11147417 cycles (= 357471770 - 346324353). Could you tell me why the
difference in cycles grows when the number of iterations increases?

By the way, I have another question about MOESI-CMP-directory; I'd appreciate
an answer if possible. Consider the architecture described in my last post.
Suppose only the DL1 of Core2 holds block X. Core3 (cores 2 and 3 are on the
same chip and share an L2) then requests X. There are two possible ways to
respond to this request:
1- The directory sets the state of X to SHARED, and then the data is read from L2.
2- The data is read from L2 without any access to the directory, and the state
of X does not change.

I think approach 2 is correct. Could you help me?
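To make the two approaches concrete, here is a toy Python model of the scenario. It is hypothetical and is not gem5's MOESI_CMP_directory SLICC code. The state names, the `chip = core // 2` numbering, and the split into a global directory entry plus a per-chip record of which L1s hold X are all assumptions chosen for illustration. The model only shows that the two approaches differ in whether the directory entry for X changes; it does not say which one the real protocol implements.

```python
import copy

# Hypothetical toy model, NOT gem5's MOESI_CMP_directory SLICC code.

def chip_of(core):
    # Assumption: 2 cores per chip, numbered in order (Core2, Core3 -> chip 1).
    return core // 2

def initial_state():
    """Only Core2's DL1 holds block X."""
    return {
        # Global directory view of X. "M" is an assumed label for
        # "one chip holds X exclusively"; the real protocol's states differ.
        "directory": {"state": "M", "chips": {chip_of(2)}},
        # Per-chip record of which local L1s hold X.
        "l1_holders": {chip_of(2): {2}},
    }

def option1(state, requester):
    """Approach 1: the directory sets X to SHARED, then the data is read from L2."""
    s = copy.deepcopy(state)
    s["directory"]["state"] = "S"
    s["l1_holders"][chip_of(requester)].add(requester)
    return s

def option2(state, requester):
    """Approach 2: the chip's L2 serves the read; the directory is not accessed."""
    s = copy.deepcopy(state)
    s["l1_holders"][chip_of(requester)].add(requester)
    return s

before = initial_state()
after1, after2 = option1(before, 3), option2(before, 3)
print("directory unchanged under approach 1:", after1["directory"] == before["directory"])
print("directory unchanged under approach 2:", after2["directory"] == before["directory"])
```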

Thanks



On Thu, Jul 14, 2011 at 11:38 AM, Nilay wrote:

> On Thu, July 14, 2011 1:38 am, Hamid Reza Khaleghzadeh wrote:
> > Hello all,
> >
> > I have simulated an 8-core CMP consisting of 4 chips; each chip has 2
> > cores and one shared L2. The coherence protocol is MOESI-CMP-directory.
> >
> > Core0   Core1    Core2   Core3    Core4   Core5    Core6   Core7
> >   |-------|        |-------|        |-------|        |-------|
> >       |                |                |                |
> >      L2 __ Dir0       L2 __ Dir1       L2 __ Dir2       L2 __ Dir3
> >       |                |                |                |
> >       |----------------|----------------|----------------|
> >                                |
> >                              Memory
> >
> > I have run the application below twice. In the first run, Thread1 and
> > Thread2 are mapped onto cores 2 and 3 (which share an L2). In the second
> > run, I bound the two threads to cores 2 and 4 (which do not share an L2).
> > I have a problem with this application: when the number of iterations of
> > the for loop increases, the difference between the execution times of run1
> > and run2 increases too. But for this application, it's clear that the
> > coherence cost does not increase when the number of iterations increases.
> > Could you tell me why this happens?
> >
> > for (i=0;i<5;i++)
> > {
> >       THREAD1;     // thread1 reads a large array (the array is smaller than the L2 cache)
> >       THREAD2;     // thread2 reads the same array that thread1 read
> > }
> >
>
> What do you mean by execution time? If you're talking about wall-clock time,
> that is probably not an indicator of anything. If the difference in cycles
> taken is not what you expected, then try to look at the breakdown of where
> the cycles are being spent.
>
> --
> Nilay
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>



-- 
Hamid Reza Khaleghzadeh