Hi Brad,

Thanks for spending the time on this. I have been digging a bit to try to find the problem too. Restoring the checkpoint seems to assign the wrong symbols to certain addresses. Here is the trace output from both runs. First the correct version:

2259304210000: system.cpu1: Decode: Decoded lda instruction: 0x211f3fff
2259304210000: global: Reading int reg 31 (31) as 0.
2259304210000: global: Setting int reg 8 (8) to 0x3fff.
2259304210000: system.cpu1 + T0 : @alpha_switch_to+8 : lda r8,16383(r31) : IntAlu : D=0x0000000000003fff

Now the version using ruby_fs.py:

2259492768000: system.cpu1: Decode: Decoded lda instruction: 0x211f3fff
2259492768000: global: Reading int reg 31 (31) as 0.
2259492768000: global: Setting int reg 8 (8) to 0x3fff.
2259492768000: system.cpu1 + T0 : @calibrate_delay+88 : lda r8,16383(r31) : IntAlu : D=0x0000000000003fff

You can see that in the first version the address is attributed to alpha_switch_to+8, but in the second the simulator thinks it is calibrate_delay+88. Later on, the ruby version skips a load of instructions, as these lines of output show:

2259493454000: global: PC based event serviced at 0xfffffc0000311148: calibrate_delay
2259493454000: global: Reading int reg 26 (26) as 0xfffffc00006b83bc.
2259493454000: calibrate_delay: skipping calibrate_delay: pc = (0xfffffc0000311148=>0xfffffc000031114c), newpc = (0xfffffc00006b83bc=>0xfffffc00006b83c0)

I'm not sure what's going on, but judging by the src/arch/alpha/linux directory, calibrate_delay is a function that should be skipped. So my guess is that the simulator thinks it is in this function and tries to skip it, when in fact it isn't in this function at all, and so execution goes haywire. Would that make sense? Where should I look to fix the problem of the symbols being wrong?
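One way to narrow this down is to compare the symbol attached to each executed instruction in the two traces. The following is only a rough sketch: it assumes the trace line format shown above, and it assumes both runs execute the same instructions in the same order, which may not hold once timing differs between the two memory systems. The names exec_records and first_symbol_mismatch are made up for illustration and are not part of M5.

```python
import re

# Matches executed-instruction trace lines such as:
#   2259304210000: system.cpu1 + T0 : @alpha_switch_to+8 : lda r8,16383(r31) : IntAlu : D=0x...
# Other trace lines (decode, register reads and writes) do not match.
EXEC_LINE = re.compile(
    r"^(?P<tick>\d+): (?P<cpu>\S+) \+ T\d+ : @(?P<sym>\S+) : (?P<inst>.+?) : "
)


def exec_records(lines):
    """Yield (cpu, symbol, instruction) for each executed-instruction line."""
    for line in lines:
        match = EXEC_LINE.match(line)
        if match:
            yield match.group("cpu"), match.group("sym"), match.group("inst")


def first_symbol_mismatch(good_lines, bad_lines):
    """Return the first pair of records where the same CPU runs the same
    instruction text but the trace reports a different symbol, or None."""
    for good, bad in zip(exec_records(good_lines), exec_records(bad_lines)):
        same_inst = good[0] == bad[0] and good[2] == bad[2]
        if same_inst and good[1] != bad[1]:
            return good, bad
    return None
```

Fed the two traces above, this would report the alpha_switch_to+8 versus calibrate_delay+88 pair. Ticks are ignored on purpose, since the two runs reach the same instruction at different times.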

Cheers
Tim

Beckmann, Brad wrote:
Hi Tim,

I spent a little time trying to reproduce your error, but so far I have not.  
I'm using a slightly different Linux kernel than the default, but I'm not ready 
to declare that is the reason for the error.  Unfortunately I'm going to be 
out of town for the next week and a half, but I'll try to look further at your 
problem when I return.  One minor question that I've been meaning to ask you is 
roughly how long did it take for you to encounter this error?

Brad


-----Original Message-----
From: [email protected] [mailto:m5-users-
[email protected]] On Behalf Of Timothy M Jones
Sent: Friday, May 06, 2011 2:00 AM
To: M5 users mailing list
Subject: Re: [m5-users] Creating ruby checkpoints

Hi Brad,

Thanks for the reply and the explanation about Ruby.

I've attached the runscript.rcS file that I was using.  I'm using the kernel
from the M5 website and disk image from UTexas
(http://www.cs.utexas.edu/~parsec_m5/linux-parsec-2-1-m5-with-test-inputs.img.bz2)


I looked at the config file and checkpoint files.  The CPUs do have the same
names and I don't get any unserialization warnings at all when running from
the checkpoint.  I did notice that the CPU types were different (since I was
creating checkpoints with AtomicSimpleCPU) but adding the '-t' switch to the
creation command didn't make the error go away.

I also tried using the ruby_fs.py script to create checkpoints, by adding
support for '--script' to it using the attached patch.  This created a
checkpoint without problems.  Loading from it caused a segmentation fault in
the simulated program, though.  These were the commands I used for that:

./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
    ./configs/example/ruby_fs.py -n 2 --script=../scripts/runscript.rcS \
    --max-checkpoints=1

./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
    ./configs/example/ruby_fs.py -n 2 -r 0

Thanks again
Tim

Beckmann, Brad wrote:
Hi Tim,

Before I try to help you with your specific problem, I want to point out that
Ruby's current support for checkpointing is a little confusing and that it is
one area that we are actively improving.  In particular, Ruby currently uses
physmem as a functional memory image and thus messages within Ruby only
impact the timing of memory accesses.  Thus, loading a checkpoint with Ruby
is nothing more than loading Ruby's backing image of physmem with the
checkpointed memory image.  Also, there is no current support for cache
warmup.  We are in the process of changing that, but that code is not yet
ready.

Having said that, I suspect that your problem is something different.  In
general, your sequence of commands should work, and I can't reproduce
your specific error since I don't have your particular rcS script.  I'd be
curious to know if you see any unserialization warnings complaining that
certain simobjects aren't in the loaded checkpoint.  In particular, do the
cpus have the exact same name between the config.ini file with ruby and the
m5.cpt file in your checkpoint?
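
The name check can be done mechanically. This is only a sketch: it assumes both config.ini and m5.cpt can be read by Python's configparser as INI-style text, and that CPU objects appear as sections whose names contain ".cpu". The helper names cpu_sections and compare_cpu_sections are made up for illustration.

```python
import configparser


def cpu_sections(ini_text):
    """Return the set of section names that look like CPU objects."""
    # strict=False tolerates repeated sections or keys; interpolation=None
    # stops '%' characters in values from being treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    return {name for name in parser.sections() if ".cpu" in name}


def compare_cpu_sections(config_text, checkpoint_text):
    """Return (only_in_config, only_in_checkpoint) as sorted lists."""
    config = cpu_sections(config_text)
    checkpoint = cpu_sections(checkpoint_text)
    return sorted(config - checkpoint), sorted(checkpoint - config)
```

To use it, read the text of the config.ini from the ruby run and of the m5.cpt from the checkpoint, then pass both to compare_cpu_sections. Two empty lists mean the CPU section names match.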

Sorry I can't be more help, but if you send me your rcS script, I'd be happy
to investigate further.

Brad


-----Original Message-----
From: [email protected] [mailto:m5-users-
[email protected]]
On Behalf Of Timothy M Jones
Sent: Wednesday, May 04, 2011 5:49 AM
To: M5 users mailing list
Subject: [m5-users] Creating ruby checkpoints

Hello,

I'm trying to create checkpoints for use with ruby using ALPHA_FS.
It takes ages to boot linux with ruby enabled, and since I want
several checkpoints for different numbers of cores, I was hoping I'd
be able to create checkpoints without ruby, then run from the
checkpoints with.
This doesn't appear to work.  If I create a checkpoint with this command:

./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
    ./configs/example/fs.py -n 2 --max-checkpoints=1 \
    --script=../scripts/runscript.rcS

Then I can run it fine with this command:

./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
    ./configs/example/fs.py -n 2 -r 0

But switching to ruby causes errors:

./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
    ./configs/example/ruby_fs.py -n 2 -r 0
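
The three invocations above differ only in the config script and the checkpoint options, which a small helper makes explicit. This is just a sketch using the paths and flags from the commands above; build_cmd is a made-up helper, not part of M5, and it only builds the argument lists without running anything.

```python
# Binary and options shared by all three invocations in this message.
M5_BINARY = "./build/ALPHA_FS/m5.fast"
COMMON_OPTIONS = ["-d", "../outputs", "--remote-gdb-port", "0"]


def build_cmd(config_script, *script_args):
    """Return the argument list for one two-CPU simulator run."""
    return [M5_BINARY, *COMMON_OPTIONS, config_script, "-n", "2", *script_args]


# Create a checkpoint without ruby, then restore it with and without ruby.
create_cmd = build_cmd(
    "./configs/example/fs.py",
    "--max-checkpoints=1",
    "--script=../scripts/runscript.rcS",
)
restore_cmd = build_cmd("./configs/example/fs.py", "-r", "0")
restore_ruby_cmd = build_cmd("./configs/example/ruby_fs.py", "-r", "0")
```

Each list can be passed to subprocess.run from the top of the M5 tree. Only the config script differs between the restore that works and the one that fails.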

In the system.terminal file I get this error output:

script(759): unhandled unaligned exception pc = [<fffffc00006b83c0>]
ra = [<fffffc00006b83bc>]  ps = 0007
r0 = 000000001f6c8000  r1 = fffffc00003111a0  r2 = fffffc0000018000
r3 = 000000000000002b  r4 = 0000000000000720  r5 = fffffc000085ecb8
r6 = 0000000000000059  r7 = 0000000000000040  r8 = 0000000000003fff
r9 = fffffc001f5c5580  r10= fffffc001f3eec00  r11= fffffc0000d09b80
r12= fffffc001f6b0740  r13= 0000000000000001  r14= 0000000000000008
r15= fffffc001f657e48  r16= 000000001f654000  r17= fffffc001f3eec00
r18= fffffc001f6b0740  r19= 0000000000000001  r20= 0000000000000000
r21= fffffc0000860640  r22= 0000000000000000  r23= 000000200618a0cf
r24= 4000000000000000  r25= 00000000000003ff  r27= fffffc0000311190
r28= fffffc001f5c5580

This seems to happen no matter which protocol I compile into the
binary, although this was with MESI_CMP_directory.  Does anyone have
any suggestions as to how I can go about creating some checkpoints to
use like this or what I'm doing wrong?

Thanks
Tim

--
Timothy M. Jones
http://www.cl.cam.ac.uk/~tmj32
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
