I don't believe that Rick's diff applied cleanly (some of the add/remove line numbers were incorrect). Anyway, here is the diff I applied to the M5 tree.

Ali

Attachment: fix_file_serialization.diff.gz
Description: GNU Zip compressed data


On Dec 10, 2007, at 1:32 PM, Rick Strong wrote:

It has. I have included it in this response for your convenience.

-R

Vilas Sridharan wrote:
Hi Rick, et al,

Has a diff of this code been sent out (to properly track file I/O on checkpoint restore)? I think I am running into the same problem on some benchmarks. I can write the code if necessary, but if it's already been done... :).

Thanks,

   -Vilas

On Nov 15, 2007 6:54 PM, Ali Saidi <[EMAIL PROTECTED]> wrote:

   Yes, please send us a diff when you're done and have tested the code.

   Thanks,
   Ali

   On Nov 15, 2007, at 6:51 PM, Rick Strong wrote:

   > Your collective trust of m5 guru-ness was right. I am now saving
   > the file offset, host flags, mode, and name, and I am able to
   > bring them back from a checkpoint. Most of my changes were
   > localized to the alloc_fd and free_fd functions in process.cc by
   > including a few more parameters. I just have to handle pipes
   > correctly, and I will post a diff if anyone is interested. I went
   > with the parallel-array technique just to avoid too many changes,
   > and that may reduce desirability. Let me know.
   >
   > -R
   >
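For illustration, the per-fd state Rick describes could be kept in a record along these lines (the struct and field names here are made up for the sketch, not M5's actual code; Rick's parallel-array version keeps the same fields in separate arrays indexed by target fd):

    #include <cstdint>
    #include <string>

    // Hypothetical expanded fd_map entry, one per target fd, replacing
    // the bare host-fd integer.
    struct FdMapEntry {
        int simFd = -1;        // host (m5) file descriptor; -1 if unused
        std::string filename;  // path passed to the open syscall
        int flags = 0;         // host open() flags (O_RDONLY, O_RDWR, ...)
        int mode = 0;          // permission bits from open()
        uint64_t offset = 0;   // file offset captured at checkpoint time
    };
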
   > Steve Reinhardt wrote:
   >> Hmm, yeah, I think making this work robustly in all cases is
   >> non-trivial, but you can probably at least fix your bug pretty
   >> easily.
   >>
   >> Basically, the fd_map array in Process is the key: all the
   >> non-negative entries in this array are open file descriptors in
   >> the target process, and the value of each entry is the file
   >> descriptor in m5 that it corresponds to.
   >>
   >> Rather than saving and restoring this array literally, as we do
   >> now (which actually makes no sense), you should serialize, for
   >> every non-negative fd, the filename, mode (ro, rw), and offset,
   >> then, in the unserialize method, reopen the file, seek to that
   >> offset, and store the new fd in the array entry.
   >>
   >> To have the filename and mode around, you'll need to save them in
   >> a parallel array (or, better, expand fd_map into an array of
   >> structs) on "open" calls. You'll have to special-case
   >> stdin/stdout etc. if they don't get reassigned.
   >>
   >> To be really thorough you'll also have to handle dup'd fds and
   >> the like specially, but that's probably optional in terms of
   >> getting past your bug.
   >>
   >> Hope that helps...
   >>
   >> Steve
   >>
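A minimal sketch of the serialize/unserialize scheme Steve describes above, reusing the hypothetical FdMapEntry from the earlier sketch; the FILE*-based checkpoint I/O is just a stand-in for illustration, not M5's actual serialization API:

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    // Checkpoint: record name/flags/mode/offset for each live target fd.
    // The raw host fd is meaningless after a restore, so it is not saved.
    void serializeFd(FILE *ckpt, int targetFd, const FdMapEntry &e) {
        if (e.simFd < 0)
            return;  // unused slot
        off_t pos = lseek(e.simFd, 0, SEEK_CUR);  // current offset, a la tell()
        fprintf(ckpt, "fd%d=%s,%d,%d,%lld\n", targetFd,
                e.filename.c_str(), e.flags, e.mode, (long long)pos);
    }

    // Restore: reopen the file, seek back to the saved offset, and store
    // the *new* host fd in the map entry.
    void unserializeFd(FdMapEntry &e) {
        e.simFd = open(e.filename.c_str(), e.flags, e.mode);
        if (e.simFd >= 0)
            lseek(e.simFd, (off_t)e.offset, SEEK_SET);
    }

As Steve notes, stdin/stdout/stderr and dup'd descriptors would need special-casing on top of this.
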
   >> On Nov 14, 2007 9:38 PM, Rick Strong <[EMAIL PROTECTED]> wrote:
   >>
   >>> All right, I am in the process of understanding how it all
   >>> works. Where is a good place to start? I am right now looking
   >>> through sim/process.* and sim/syscall_emul* to work backwards to
   >>> where all the information is stored. If someone has insight into
   >>> this system and could offer a brief description of how it works,
   >>> it would be very helpful.
   >>>
   >>> -Richard
   >>>
   >>>
   >>> Nathan Binkert wrote:
   >>>
   >>>> When you fix this, pretty please submit a diff :)
   >>>>
   >>>>
   >>>>> I'm pretty sure I figured it out, and I'm pretty sure it is
   >>>>> related to file I/O. When we restore from a checkpoint, we
   >>>>> don't reopen and seek to the appropriate place in any files we
   >>>>> were reading from or writing to. I bet what is happening is
   >>>>> that the benchmark attempts to read some input data (or maybe
   >>>>> write some data) and the file descriptor is invalid when M5
   >>>>> passes the syscall through to the host OS. The OS returns an
   >>>>> error code, which alters the path of the benchmark, and it
   >>>>> exits early. It shouldn't be too hard to fix, but I don't have
   >>>>> time to do it at the moment. You would need to keep track of
   >>>>> the paths and modes of all the open files and add them to the
   >>>>> checkpoint along with the current position (via tell()). Upon
   >>>>> restoring from a checkpoint, you would reopen the files and
   >>>>> seek() to the appropriate place in each file.
   >>>>>
   >>>>> Ali
   >>>>>
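The invalid-descriptor failure Ali describes above is exactly what the traces further down show: read() on a stale host fd fails with -1 (EBADF), and -1 printed as an unsigned 64-bit value is the 18446744073709551615 that appears in the trace. A self-contained host-side demonstration:

    #include <cerrno>
    #include <cstdint>
    #include <cstdio>
    #include <unistd.h>

    int main() {
        char buf[8192];
        // fd 1234 is almost certainly not open in this process, just
        // like a host fd restored verbatim from a checkpoint.
        ssize_t ret = read(1234, buf, sizeof(buf));
        // Expected: ret == -1 with errno == EBADF; cast to an unsigned
        // 64-bit value, -1 prints as 18446744073709551615.
        printf("read returned %zd (errno=%d); as u64: %llu\n",
               ret, errno, (unsigned long long)(uint64_t)ret);
        return 0;
    }
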
   >>>>> On Nov 14, 2007, at 10:02 PM, Rick Strong wrote:
   >>>>>
   >>>>>
   >>>>>> When I take a checkpoint in AtomicSimpleCPU (m5_2.0b4) at
   >>>>>> curTick=100015476500 (approx. 200,000,000 insts into the
   >>>>>> binary) in mcf, and resume execution in any CPU model, I get
   >>>>>> an exit syscall (trace included below) at cycle 100522711000
   >>>>>> (approx. 1014345 insts into execution). What is strange is
   >>>>>> that if I run AtomicSimpleCPU through this point from the
   >>>>>> start, I have no problems. Any ideas on either the problem or
   >>>>>> how to debug it?
   >>>>>>
   >>>>>> It turns out that the same problem happens for checkpoints in
   >>>>>> twolf about 200,000,000 insts into the binary. A resume has
   >>>>>> some file I/O and then an untimely exit. Both problems seem
   >>>>>> related to file I/O followed by an exit call. Is it possible
   >>>>>> that some system call is not implemented and is defaulting to
   >>>>>> exit? I have included the syscall trace for twolf below for
   >>>>>> any interested parties.
   >>>>>>
   >>>>>> I have resumed both checkpoints, immediately created new
   >>>>>> checkpoints, and they diff clean (except for the order of the
   >>>>>> ptable entries).
   >>>>>>
   >>>>>> I am right now working on getting an EXEC trace for mcf, one
   >>>>>> from the checkpoint and one executing from the beginning, to
   >>>>>> find any differences.
   >>>>>>
   >>>>>>
   >>>>>> TWOLF syscall trace:
   >>>>>>
   >>>>>> 100285445500: system.cpu: pc 4832275812 syscall read called
   >>>>>> w/arguments 4,5368834056,8192,1
   >>>>>> 100285445500: system.cpu: syscall read returns 18446744073709551615
   >>>>>> 100286500500: system.cpu: pc 4832275812 syscall read called
   >>>>>> w/arguments 4,5368834056,8192,5
   >>>>>> 100286500500: system.cpu: syscall read returns 18446744073709551615
   >>>>>> 100287514000: system.cpu: pc 4832260836 syscall close called
   >>>>>> w/arguments 0,4831383888,1,1048576
   >>>>>> 100287514000: system.cpu: syscall close returns 0
   >>>>>> 100287679500: system.cpu: pc 4832260628 syscall write called
   >>>>>> w/arguments 1,5368796680,172,1048576
   >>>>>>
   >>>>>> TimberWolfSC version: v4.3a date: Mon Jan 25 18:50:36 EST 1988
   >>>>>> Standard Cell Placement and Global Routing Program
   >>>>>> Authors: Carl Sechen, Bill Swartz
   >>>>>>          Yale University
   >>>>>> 100287679500: system.cpu: syscall write returns 172
   >>>>>> 100287726500: system.cpu: pc 4832260836 syscall close called
   >>>>>> w/arguments 1,4831383888,172,0
   >>>>>>
   >>>>>> MCF syscall trace:
   >>>>>>
   >>>>>>>> 100519102000: system.cpu: syscall read called w/arguments
   >>>>>>>> 3,5368799240,8192,7
   >>>>>>>> 100519102000: system.cpu: syscall read returns 18446744073709551615
   >>>>>>>> 100521401500: system.cpu: syscall obreak called w/arguments
   >>>>>>>> 5374902272,0,0,1048576
   >>>>>>>> 100521401500: global: Break Point changed to: 0X1405E8000
   >>>>>>>> 100521401500: system.cpu: syscall obreak returns 5374902272
   >>>>>>>> 100521680500: system.cpu: syscall close called w/arguments
   >>>>>>>> 0,4831387472,1,1048576
   >>>>>>>> 100521680500: system.cpu: syscall close returns 0
   >>>>>>>> 100521846000: system.cpu: syscall write called w/arguments
   >>>>>>>> 1,5368778616,119,1048576
   >>>>>>>> 100521846000: system.cpu: syscall write returns 119
   >>>>>>>> 100521893000: system.cpu: syscall close called w/arguments
   >>>>>>>> 1,4831387472,119,0
   >>>>>>>> 100521893000: system.cpu: syscall close returns 0
   >>>>>>>> 100522014000: system.cpu: syscall close called w/arguments
   >>>>>>>> 2,4831387472,0,1048576
   >>>>>>>> 100522014000: system.cpu: syscall close returns 18446744073709551615
   >>>>>>>> 100522187500: system.cpu: syscall close called w/arguments
   >>>>>>>> 3,4831387472,1,1048576
   >>>>>>>> 100522187500: system.cpu: syscall close returns 0
   >>>>>>>> 100522357000: system.cpu: syscall obreak called w/arguments
   >>>>>>>> 5368815616,0,0,1048576
   >>>>>>>> 100522357000: global: Break Point changed to: 0X14001A000
   >>>>>>>> 100522357000: system.cpu: syscall obreak returns 5368815616
   >>>>>>>> 100522623500: system.cpu: syscall sigprocmask called w/arguments
   >>>>>>>> 1,18446744073709547831,0,0
   >>>>>>>> warn: ignoring syscall sigprocmask(1, 18446744073709547831, ...)
   >>>>>>>> 100522623500: system.cpu: syscall sigprocmask returns 0
   >>>>>>>> 100522711000: system.cpu: syscall exit called w/arguments
   >>>>>>>> 18446744073709551615,5368739848,2,0
   >>>>>>>>




<syscall_emulatation_fixes.diff.zip>

_______________________________________________
m5-users mailing list
m5-users@m5sim.org
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
