Thanks Steve! I tried using run.py in the configs/splash2 directory, but I
keep getting a segmentation fault. I am running the FFT benchmark (I downloaded
the binaries from the link on the M5 wiki's splash benchmarks page). I
made the following changes to my Python script (run.py):
................................
class FFT(LiveProcess):
    cwd = options.rootdir + '/kernels/fft'
    executable = options.rootdir + '/kernels/fft/FFT'
    # changed: -p is now numcpus * smt instead of numcpus
    cmd = ['FFT', '-p', str(options.numcpus*options.smt), '-m18']
..............................
..............................
cpus = [DerivO3CPU(cpu_id=i,
                   clock=options.frequency,
                   numThreads=options.smt,             # changed
                   smtNumFetchingThreads=options.smt,  # changed
                   smtFetchPolicy='RoundRobin')        # changed
        for i in xrange(options.numcpus)]
So, if I understood what Steve said in his last mail, I think I am running
it correctly. I tried debugging, but it didn't help. I am using a recent
version of M5, which I downloaded from the repository about a week ago.
Also, when I set smtNumFetchingThreads=1 and numThreads=1, it works!
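To make the thread accounting in my changes explicit, here is a minimal
standalone sketch. It does not import m5. The helper names hw_contexts and
fft_cmd are hypothetical and exist only for illustration. Only the '-p' and
'-m18' arguments come from my run.py changes above.

```python
# Standalone sketch, not part of M5: hw_contexts and fft_cmd are
# hypothetical helper names used only to illustrate the arithmetic.

def hw_contexts(numcpus, smt):
    """Total hardware thread contexts: cores times SMT threads per core."""
    return numcpus * smt


def fft_cmd(numcpus, smt, m=18):
    """Build the FFT argument list the same way my modified run.py does."""
    return ['FFT', '-p', str(hw_contexts(numcpus, smt)), '-m%d' % m]


if __name__ == '__main__':
    # The failing run below uses -n 1 --smt=2.
    print(fft_cmd(1, 2))
    # The run that works uses one core with one thread.
    print(fft_cmd(1, 1))
```

So in the failing case FFT is asked for 2 processors on a single core with 2
SMT threads, and in the working case it is asked for 1.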
This is what I got after running gdb:
: /af20/ar8eb/Desktop/CS8501/m5-ad784e759a74 ; gdb build/ALPHA_SE/m5.debug
GNU gdb 6.8-debian
................................................................
This GDB was configured as "x86_64-linux-gnu"...
(gdb) run configs/splash2/run_test.py -n 1 --smt=2
--rootdir=../m5-stable-94c016415053/v1-splash-alpha/splash2/codes/ -d -b FFT
....................................................................
command line:
/net/af20/ar8eb/Desktop/CS8501/m5-ad784e759a74/build/ALPHA_SE/m5.debug
configs/splash2/run_test.py -n 1 --smt=2
--rootdir=../m5-stable-94c016415053/v1-splash-alpha/splash2/codes/ -d -b FFT
Global frequency set at 1000000000000 ticks per second
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
0: system.remote_gdb.listener: listening for remote gdb #1 on port 7001
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6d50dc76f0 (LWP 18930)]
0x00000000004522a0 in std::vector<int, std::allocator<int> >::push_back
(this=0x40, _...@0x7fff58dea6b4) at
/usr/include/c++/4.2/bits/stl_vector.h:599
599 if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
Thanks,
Abhishek
On Fri, Apr 16, 2010 at 6:20 PM, Steve Reinhardt <[email protected]> wrote:
> If the benchmark is explicitly multithreaded and compiled against the
> m5threads library, then you can run it on an SMT O3 CPU and it should
> work.
>
> Steve
>
> On Thu, Apr 15, 2010 at 9:43 PM, abhishek rawat <[email protected]> wrote:
> > Thanks for the reply. I have a follow-up question. Let's say I am only
> > using one O3 core. If I want to run 2-threaded SMT, that means I will have
> > to provide as input, for instance, 2 FFT benchmarks. And for 4-threaded SMT,
> > similarly, I will have to provide as input 4 FFT benchmarks. In this case, how
> > can I do the comparison, as the benchmarks have changed in size? So my
> > question was: can I just provide, let's say, 1 FFT benchmark as input and then
> > try to run it with a 2-threaded SMT and then with a 4-threaded SMT
> > configuration? And I think you are saying it is not possible.
> >
> > Thanks,
> > --Abhishek
> >
> > On Fri, Apr 16, 2010 at 9:56 AM, Korey Sewell <[email protected]> wrote:
> >>
> >>> I have a question about the O3 SMT model in ALPHA_SE mode. I
> >>> understand that for running multi-programmed benchmarks one can pass a
> >>> vector of benchmarks as the parameters and set cpu.numThreads to the
> >>> size of the vector. But is it possible for the O3 CPU to extract the TLP in a
> >>> (single) given multithreaded benchmark? What I am trying to say is: let's
> >>> say I just want to run a single multi-threaded benchmark. Just by
> >>> specifying the number of SMT threads (say 4 SMT threads), can the O3 CPU
> >>> extract the threads from the benchmark and run them in an SMT fashion? I would
> >>> appreciate any directions in this regard.
> >>
> >> As far as I know, O3CPU will only run the threads that you give it, so
> >> your actual benchmark needs to specify how many threads are going to be run
> >> in parallel in order for you to see any benefit in SE_SMT. The problem of
> >> automatically extracting TLP is in fact its own subset of the research
> >> community, so if you wish, I'm sure you can find plenty of literature on
> >> that topic.
> >>
> >>>
> >>> Something tells me that it is not possible. And if that is the case, then
> >>> how do I compare different SMT configurations for a given benchmark?
> >>
> >> That's a bit of an abstract question, and on the M5 side, I'm not sure there
> >> is a definite *right* answer here, just user-dependent ones. For some people,
> >> comparing a 2-threaded SMT core vs. a 4-threaded SMT core is fine. For
> >> others, comparing a 2T-SMT core vs. 2 CMP cores is their objective.
> >>
> >> Considering a configuration to be the benchmark running and the hardware
> >> that it is to be run on, it's up to the user to specify different M5
> >> configurations that are interesting to them, run M5 for different
> >> configurations, figure out what configurations are comparable to each other,
> >> and then decide what to do next with their results.
> >>
> >>
> >>
> >> --
> >> - Korey
> >>
> >> _______________________________________________
> >> m5-users mailing list
> >> [email protected]
> >> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >
> >
> >
> > --
> > -Abhi
> >
--
-Abhi