Re: [gem5-users] [m5-users] Creating ruby checkpoints

Timothy M Jones Tue, 17 May 2011 04:56:42 -0700

Thanks very much for the pointers, Steve.

I'm feeling very foolish now. I've worked out the problem and it was my mistake. I was using a different kernel to the standard one and I'd updated the relevant line in FSConfig.py in makeLinuxAlphaSystem() but not in makeLinuxAlphaRubySystem(). Setting them to the same thing solves the problem :-)
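One way to catch this kind of mismatch before restoring is to compare the kernel each run actually used. Below is a minimal sketch. It assumes that each run writes a config.ini whose `[system]` section contains a `kernel=` entry (the section and key names are assumptions, so check your own config.ini). It also assumes the checkpointing run and the Ruby run used separate `-d` output directories. The script name in the usage comment is hypothetical.

```python
import configparser
import sys


def kernel_of(config_ini_text, section="system"):
    """Return the kernel path recorded in a config.ini, or None if absent.

    Assumes a [system] section with a 'kernel=' entry.
    """
    cp = configparser.ConfigParser(interpolation=None, strict=False,
                                   delimiters=("=",), allow_no_value=True)
    cp.optionxform = str  # keep parameter names exactly as written
    cp.read_string(config_ini_text)
    # fallback covers both a missing section and a missing key
    return cp.get(section, "kernel", fallback=None)


def same_kernel(ini_a, ini_b):
    """Return (match, kernel_a, kernel_b) for two config.ini texts."""
    ka, kb = kernel_of(ini_a), kernel_of(ini_b)
    return (ka is not None and ka == kb), ka, kb


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   compare_kernels.py <fs run>/config.ini <ruby run>/config.ini
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        match, ka, kb = same_kernel(fa.read(), fb.read())
    print("same kernel" if match else f"MISMATCH: {ka!r} vs {kb!r}")
```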

I'm not sure why this was causing problems; the symbol table is cleared before the checkpoint is loaded, so that shouldn't have been an issue. There must have been something else that was conflicting. However, since it is working now, I'm not going to dig any further!


Tim

Steve Reinhardt wrote:

Hi Tim,

Brad's on vacation, so I'll try and answer...

If you're completely sure that it's the same PC value in both traces and the error is just in the symbol table, then it makes sense that you'd be running into trouble trying to skip functions that aren't really there. AFAIK, calibrate_delay() is only called during boot, so it is pretty suspicious that you'd be executing it after your checkpoint.
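Steve's point is that the trace label (`@symbol+offset`) comes from the symbol table, not from the instruction itself, so two tables can label the same PC differently. Below is a minimal sketch of that kind of lookup, which resolves a PC to the nearest symbol at or below it. The addresses are hypothetical and were chosen only so the output mirrors the two traces. This is an illustration, not gem5's own code.

```python
import bisect


def make_lookup(symtab):
    """Build a PC -> 'symbol+offset' function from {address: name}.

    Each PC is attributed to the closest symbol at or below it, with the
    offset printed in decimal as in the traces.
    """
    addrs = sorted(symtab)

    def lookup(pc):
        i = bisect.bisect_right(addrs, pc) - 1
        if i < 0:
            return hex(pc)  # below every known symbol
        base = addrs[i]
        return f"{symtab[base]}+{pc - base}"

    return lookup


# Hypothetical tables from two different kernel builds.
table_a = {0x1000: "alpha_switch_to", 0x1200: "calibrate_delay"}
table_b = {0x0FB0: "calibrate_delay", 0x1100: "alpha_switch_to"}

pc = 0x1008
print(make_lookup(table_a)(pc))  # alpha_switch_to+8
print(make_lookup(table_b)(pc))  # calibrate_delay+88
```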

I have no idea what could be going wrong, though. As just came up on another thread, the symbols are stored in the checkpoint file and not re-read from the kernel image; one thing to check is whether the calibrate_delay PC in the m5.cpt file matches what's stored in the kernel image.
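Below is a minimal sketch of the first half of Steve's check. It pulls a symbol's address out of m5.cpt so it can be compared with the address that a symbol dumper (for example `nm` from an Alpha-capable binutils) reports for the kernel image. It assumes the symbol table is serialized as key pairs of the form `<prefix>.addr_<i>` and `<prefix>.symbol_<i>`, with decimal or 0x-prefixed addresses. That layout is an assumption, so check it against your own m5.cpt before relying on the script. The script name in the usage comment is hypothetical.

```python
import configparser
import re
import sys

# Assumed key layout: "<prefix>.symbol_<i>" paired with "<prefix>.addr_<i>".
SYMBOL_KEY = re.compile(r"(?P<prefix>.+)\.symbol_(?P<idx>\d+)")


def find_symbol(cpt_text, name):
    """Return a list of (section, prefix, address) entries for `name`."""
    cp = configparser.ConfigParser(interpolation=None, strict=False,
                                   delimiters=("=",), allow_no_value=True)
    cp.optionxform = str  # keep key names exactly as written
    cp.read_string(cpt_text)
    hits = []
    for section in cp.sections():
        opts = cp[section]
        for key, value in opts.items():
            m = SYMBOL_KEY.fullmatch(key)
            if not m or value is None or value.strip() != name:
                continue
            addr = opts.get(f"{m['prefix']}.addr_{m['idx']}")
            if addr:
                # int(..., 0) accepts both decimal and 0x-prefixed values.
                hits.append((section, m["prefix"], int(addr, 0)))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   find_symbol.py <checkpoint dir>/m5.cpt calibrate_delay
    with open(sys.argv[1]) as f:
        for section, prefix, addr in find_symbol(f.read(), sys.argv[2]):
            print(f"[{section}] {prefix}: {addr:#x}")
```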


Steve

On Fri, May 13, 2011 at 5:49 AM, Timothy M Jones <[email protected]> wrote:


    Hi Brad,

    Thanks for spending the time on this.  I have been digging a bit to
    try to find out the problem too.  I found that restoring the
    checkpoint seems to be assigning the wrong symbols to certain
    addresses.  If I trace the output from both runs, I get this.  First
    the correct version:

    2259304210000: system.cpu1: Decode: Decoded lda instruction: 0x211f3fff
    2259304210000: global: Reading int reg 31 (31) as 0.
    2259304210000: global: Setting int reg 8 (8) to 0x3fff.
    2259304210000: system.cpu1 + T0 : @alpha_switch_to+8    : lda r8,16383(r31)   : IntAlu :  D=0x0000000000003fff

    Now the version using ruby_fs.py:

    2259492768000: system.cpu1: Decode: Decoded lda instruction: 0x211f3fff
    2259492768000: global: Reading int reg 31 (31) as 0.
    2259492768000: global: Setting int reg 8 (8) to 0x3fff.
    2259492768000: system.cpu1 + T0 : @calibrate_delay+88    : lda r8,16383(r31)   : IntAlu :  D=0x0000000000003fff

    You can see that in the first version the address is from
    alpha_switch_to+8, but in the second it thinks it is
    calibrate_delay+88.  Later on the ruby version misses out a load of
    instructions with this line of output:

    2259493454000: global: PC based event serviced at 0xfffffc0000311148: calibrate_delay
    2259493454000: global: Reading int reg 26 (26) as 0xfffffc00006b83bc.
    2259493454000: calibrate_delay: skipping calibrate_delay: pc = (0xfffffc0000311148=>0xfffffc000031114c), newpc = (0xfffffc00006b83bc=>0xfffffc00006b83c0)

    I'm not sure what's going on, but looking in the
    src/arch/alpha/linux directory, calibrate_delay is a function that
    should be skipped.  So, my guess is that the simulator thinks it is
    in this function and tries to skip it but in actual fact it isn't in
    this function at all and so execution goes haywire.  Would that make
    sense?  Where do I look to fix the problem of the symbols being wrong?
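Tim's guess fits the log above. A function-skipping event does not run the function body. Instead, it sends execution straight back to the caller through the return address register (r26, `ra`, on Alpha). The simplified model below is an illustration, not gem5's implementation. It reproduces the `newpc` pair in the log from the r26 value read just before it. The same two addresses reappear in the unaligned-exception report in Tim's original message (pc = [<fffffc00006b83c0>], ra = [<fffffc00006b83bc>]). That is consistent with execution resuming at a return address that was never meant to be used.

```python
def skip_function(regs, ra_reg=26, inst_bytes=4):
    """Return the (pc, npc) pair after skipping the current function.

    Simplified model: jump to the return address held in `ra_reg`
    (r26 on Alpha) instead of executing the function body. Alpha
    instructions are 4 bytes, so the next PC is the return address + 4.
    """
    ret = regs[ra_reg]
    return ret, ret + inst_bytes


# r26 value taken from the trace: "Reading int reg 26 (26) as 0xfffffc00006b83bc"
newpc, newnpc = skip_function({26: 0xfffffc00006b83bc})
print(f"newpc = ({newpc:#x}=>{newnpc:#x})")
# newpc = (0xfffffc00006b83bc=>0xfffffc00006b83c0)
```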

    Cheers

    Tim

    Beckmann, Brad wrote:

        Hi Tim,

        I spent a little time trying to reproduce your error, but so far
        I have not.  I'm using a slightly different Linux kernel than
        the default, but I'm not ready to declare that is the reason for
        the error.  Unfortunately I'm going to be out-of-town for the
        next week and a half, but I'll try to look further at your problem
        when I return.  One minor question that I've been meaning to ask
        you: roughly how long did it take for you to encounter this error?

        Brad


            -----Original Message-----
            From: [email protected] On Behalf Of Timothy M Jones
            Sent: Friday, May 06, 2011 2:00 AM
            To: M5 users mailing list
            Subject: Re: [m5-users] Creating ruby checkpoints

            Hi Brad,

            Thanks for the reply and the explanation about Ruby.

            I've attached the runscript.rcS file that I was using.  I'm
            using the kernel from the M5 website and disk image from
            UTexas
            (http://www.cs.utexas.edu/~parsec_m5/linux-parsec-2-1-m5-with-test-inputs.img.bz2).


            I looked at the config file and checkpoint files.  The CPUs do
            have the same names and I don't get any unserialization
            warnings at all when running from the checkpoint.  I did
            notice that the CPU types were different (since I was creating
            checkpoints with AtomicSimpleCPU), but adding the '-t' switch
            to the creation command didn't make the error go away.

            I also tried using the ruby_fs.py script to create checkpoints
            too, by adding support for '--script' within it using the
            attached patch.  This created a checkpoint without problems.
            Loading from it caused a segmentation fault in the simulated
            program, though.  These were the commands I used for that:

            ./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
                ./configs/example/ruby_fs.py -n 2 \
                --script=../scripts/runscript.rcS --max-checkpoints=1

            ./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
                ./configs/example/ruby_fs.py -n 2 -r 0

            Thanks again
            Tim

            Beckmann, Brad wrote:

                Hi Tim,

                Before I try to help you with your specific problem, I
                want to point out that Ruby's current support for
                checkpointing is a little confusing, and that is one area
                that we are actively improving.  In particular, Ruby
                currently uses physmem as a functional memory image, so
                messages within Ruby only affect the timing of memory
                accesses.  Thus, loading a checkpoint with Ruby is nothing
                more than loading Ruby's backing image of physmem with the
                checkpointed memory image.  Also, there is currently no
                support for cache warmup.  We are in the process of
                changing that, but that code is not yet ready.
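To make Brad's description concrete, here is a toy model of a memory system in which data always comes from a functional backing image and the cache model only decides how long an access takes. The class is hypothetical, the latencies are arbitrary, and this is not Ruby's code. Restoring a checkpoint just replaces the backing image. Because there is no warmup, the cache starts cold.

```python
class ToyTimingMemory:
    """Toy model: functional data plus a timing-only cache."""

    def __init__(self, hit_latency=1, miss_latency=100):
        self.backing = {}   # functional image, standing in for physmem
        self.seen = set()   # addresses the timing model treats as cached
        self.hit_latency = hit_latency
        self.miss_latency = miss_latency

    def restore(self, image):
        # Restoring only loads the functional image; there is no cache
        # warmup, so all timing state starts cold.
        self.backing = dict(image)
        self.seen.clear()

    def load(self, addr):
        # The value comes from the backing image regardless of cache state;
        # only the latency depends on whether the address was seen before.
        latency = self.hit_latency if addr in self.seen else self.miss_latency
        self.seen.add(addr)
        return self.backing.get(addr, 0), latency


mem = ToyTimingMemory()
mem.restore({0x10: 42})
print(mem.load(0x10))  # (42, 100): correct data, cold-miss latency
print(mem.load(0x10))  # (42, 1): same data, now a hit
```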

                Having said that, I suspect that your problem is something
                different.  In general, your sequence of commands should
                work, and I can't reproduce your specific error since I
                don't have your particular rcS script.  I'd be curious to
                know if you see any unserialization warnings complaining
                that certain SimObjects aren't in the loaded checkpoint.  In
                particular, do the CPUs have exactly the same names in the
                config.ini file with Ruby and in the m5.cpt file in your
                checkpoint?
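Brad's question can be checked mechanically. Below is a minimal sketch. It assumes the CPU objects appear as section headers named like `[system.cpu0]` or `[system.cpu]` in both config.ini and m5.cpt. The naming pattern is an assumption, and child objects such as caches are deliberately ignored. The script name in the usage comment is hypothetical.

```python
import configparser
import re
import sys

# Assumed naming: "system.cpu", "system.cpu0", "system.cpu1", ...
CPU_SECTION = re.compile(r"system\.cpu\d*")


def cpu_sections(ini_text):
    """Return the set of top-level CPU section names in an INI-style file."""
    cp = configparser.ConfigParser(interpolation=None, strict=False,
                                   delimiters=("=",), allow_no_value=True)
    cp.read_string(ini_text)
    return {s for s in cp.sections() if CPU_SECTION.fullmatch(s)}


def cpu_name_mismatch(config_ini_text, cpt_text):
    """Return (only_in_config, only_in_checkpoint); both empty means a match."""
    cfg, cpt = cpu_sections(config_ini_text), cpu_sections(cpt_text)
    return cfg - cpt, cpt - cfg


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   cpu_names.py <ruby run>/config.ini <checkpoint dir>/m5.cpt
    with open(sys.argv[1]) as fc, open(sys.argv[2]) as fk:
        only_cfg, only_cpt = cpu_name_mismatch(fc.read(), fk.read())
    print("only in config.ini:", sorted(only_cfg))
    print("only in m5.cpt:", sorted(only_cpt))
```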

                Sorry I can't be more help, but if you send me your rcS
                script, I'd be happy to investigate further.

                Brad


                    -----Original Message-----
                    From: [email protected] On Behalf Of Timothy M Jones
                    Sent: Wednesday, May 04, 2011 5:49 AM
                    To: M5 users mailing list
                    Subject: [m5-users] Creating ruby checkpoints

                    Hello,

                    I'm trying to create checkpoints for use with Ruby using
                    ALPHA_FS.  It takes ages to boot Linux with Ruby
                    enabled, and since I want several checkpoints for
                    different numbers of cores, I was hoping I'd be able to
                    create checkpoints without Ruby, then run from the
                    checkpoints with Ruby.
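The plan Tim describes (boot once per core count without Ruby, take a checkpoint, then restore under Ruby) is easy to script. Below is a minimal sketch that only builds and prints the checkpoint-creation commands. It reuses the binary, config script, and flags that appear in this thread. The list of core counts and the per-count output directory names are hypothetical.

```python
import shlex

M5 = "./build/ALPHA_FS/m5.fast"
FS_CONFIG = "./configs/example/fs.py"
RCS_SCRIPT = "../scripts/runscript.rcS"


def checkpoint_cmd(ncpus, outdir):
    """Build the argv for one checkpoint-creation run without Ruby."""
    return [M5, "-d", outdir, "--remote-gdb-port", "0",
            FS_CONFIG, "-n", str(ncpus),
            "--max-checkpoints=1", f"--script={RCS_SCRIPT}"]


if __name__ == "__main__":
    for n in (1, 2, 4):  # hypothetical core counts
        # Hypothetical naming: one output directory per core count.
        cmd = checkpoint_cmd(n, f"../outputs/ckpt-{n}cpu")
        # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```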

                    This doesn't appear to work.  If I create a checkpoint
                    with this command:

                    ./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
                        ./configs/example/fs.py -n 2 --max-checkpoints=1 \
                        --script=../scripts/runscript.rcS

                    Then I can run it fine with this command:

                    ./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
                        ./configs/example/fs.py -n 2 -r 0

                    But switching to ruby causes errors:

                    ./build/ALPHA_FS/m5.fast -d ../outputs --remote-gdb-port 0 \
                        ./configs/example/ruby_fs.py -n 2 -r 0

                    In the system.terminal file I get this error output:

                    script(759): unhandled unaligned exception
                    pc = [<fffffc00006b83c0>]  ra = [<fffffc00006b83bc>]  ps = 0007
                    r0 = 000000001f6c8000  r1 = fffffc00003111a0  r2 = fffffc0000018000
                    r3 = 000000000000002b  r4 = 0000000000000720  r5 = fffffc000085ecb8
                    r6 = 0000000000000059  r7 = 0000000000000040  r8 = 0000000000003fff
                    r9 = fffffc001f5c5580  r10= fffffc001f3eec00  r11= fffffc0000d09b80
                    r12= fffffc001f6b0740  r13= 0000000000000001  r14= 0000000000000008
                    r15= fffffc001f657e48  r16= 000000001f654000  r17= fffffc001f3eec00
                    r18= fffffc001f6b0740  r19= 0000000000000001  r20= 0000000000000000
                    r21= fffffc0000860640  r22= 0000000000000000  r23= 000000200618a0cf
                    r24= 4000000000000000  r25= 00000000000003ff  r27= fffffc0000311190
                    r28= fffffc001f5c5580

                    This seems to happen no matter which protocol I compile
                    into the binary, although this was with
                    MESI_CMP_directory.  Does anyone have any suggestions as
                    to how I can go about creating some checkpoints to use
                    like this, or what I'm doing wrong?

                    Thanks
                    Tim

                    --
                    Timothy M. Jones
                    http://www.cl.cam.ac.uk/~tmj32
                    _______________________________________________
                    m5-users mailing list
                    [email protected]
                    http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
