Looking at the system call trace, I'm virtually certain this is the problem, and that the call that triggered it was a read. You'll notice a read of 8192 bytes on descriptor 3, probably the size of the standard library's I/O buffer. That returns a negative value, and then you can see the process shut down. It clears out its heap (I think), closes all the open file descriptors (0, 1, 2 and 3), and then calls exit with a negative exit code. The read at 100519102000 seems to be what brings everything down.
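
To spell out the "negative value": the trace appears to print return values and arguments as unsigned 64-bit integers, so 18446744073709551615 (2^64 - 1) reads as -1 once it is treated as signed. A minimal sketch of that reinterpretation; as_signed64 is a helper name made up here for illustration, not anything in M5:

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= (1 << 64) - 1  # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value


# Values copied from the mcf trace quoted below.
read_ret = as_signed64(18446744073709551615)   # "syscall read returns ..."
exit_arg = as_signed64(18446744073709551615)   # first argument to exit
write_ret = as_signed64(119)                   # small values are unchanged

print(read_ret, exit_arg, write_ret)
```

So both the failed read and the exit status come out as -1, which is what you would expect if the read failed and the program bailed out on the error.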
Gabe

Nathan Binkert wrote:
> When you fix this, pretty please submit a diff :)
>
>> I'm pretty sure I figured it out and I'm pretty sure it is related to
>> file I/O. When we restore from a checkpoint we don't reopen and seek
>> to the appropriate place in any files we were reading from/writing
>> to. I bet what is happening is that the benchmark attempts to read
>> some input data (or maybe write some data) and the file descriptor is
>> invalid when M5 passes the syscall through to the host OS. The OS
>> returns an error code which alters the path of the benchmark and it
>> exits early. It shouldn't be too hard to fix but I don't have time to
>> do it at the moment. You would need to keep track of all the open
>> files paths and modes and add the paths/modes to the checkpoint along
>> with the current position (via tell()). Upon restoring from a
>> checkpoint you would reopen the files and seek() to the appropriate
>> place in the file.
>>
>> Ali
>>
>> On Nov 14, 2007, at 10:02 PM, Rick Strong wrote:
>>
>>> When I take a checkpoint in AtomicSimpleCPU (m5_2.0b4) at
>>> curTick=100015476500 (approx. 200,000,000 insts into the binary) in
>>> mcf, and resume execution in any CPU model, I get an exit syscall
>>> (syscall trace included below) at cycle 100522711000 (approx 1014345
>>> insts into execution). What is strange is that if I run
>>> AtomicSimpleCPU through this point (from start), I have no problems.
>>> Any ideas on either the problem or how to debug?
>>>
>>> It turns out that the same problem happens for checkpoints in twolf
>>> about 200,000,000 insts into the binary. A resume has some file i/o
>>> and an untimely exit. Both problems seem related to file i/o and
>>> then an exit call. Is it possible that some system call is not
>>> implemented and defaulting to exit. I included the syscall trace for
>>> twolf for any interested parties:
>>>
>>> I have resumed both checkpoints, immediately created new
>>> checkpoints, and they diff clean (except for order of the ptable
>>> entries).
>>>
>>> I am right now working on getting an EXEC trace for mcf, one from
>>> checkpoint and one executing from the beginning to find any
>>> differences.
>>>
>>>
>>> TWOLF syscall trace
>>> "
>>> 100285445500: system.cpu: pc 4832275812 syscall read called
>>> w/arguments 4,5368834056,8192,1
>>> 100285445500: system.cpu: syscall read returns 18446744073709551615
>>> 100286500500: system.cpu: pc 4832275812 syscall read called
>>> w/arguments 4,5368834056,8192,5
>>> 100286500500: system.cpu: syscall read returns 18446744073709551615
>>> 100287514000: system.cpu: pc 4832260836 syscall close called
>>> w/arguments 0,4831383888,1,1048576
>>> 100287514000: system.cpu: syscall close returns 0
>>> 100287679500: system.cpu: pc 4832260628 syscall write called
>>> w/arguments 1,5368796680,172,1048576
>>>
>>> TimberWolfSC version:v4.3a date:Mon Jan 25 18:50:36 EST 1988
>>> Standard Cell Placement and Global Routing Program
>>> Authors: Carl Sechen, Bill Swartz
>>> Yale University
>>> 100287679500: system.cpu: syscall write returns 172
>>> 100287726500: system.cpu: pc 4832260836 syscall close called
>>> w/arguments 1,4831383888,172,0
>>>
>>> " MCF SYSCALL TRACE "
>>>>>
>>>>> 100519102000: system.cpu: syscall read called w/arguments
>>>>> 3,5368799240,8192,7
>>>>> 100519102000: system.cpu: syscall read returns 18446744073709551615
>>>>> 100521401500: system.cpu: syscall obreak called w/arguments
>>>>> 5374902272,0,0,1048576
>>>>> 100521401500: global: Break Point changed to: 0X1405E8000
>>>>> 100521401500: system.cpu: syscall obreak returns 5374902272
>>>>> 100521680500: system.cpu: syscall close called w/arguments
>>>>> 0,4831387472,1,1048576
>>>>> 100521680500: system.cpu: syscall close returns 0
>>>>> 100521846000: system.cpu: syscall write called w/arguments
>>>>> 1,5368778616,119,1048576
>>>>> 100521846000: system.cpu: syscall write returns 119
>>>>> 100521893000: system.cpu: syscall close called w/arguments
>>>>> 1,4831387472,119,0
>>>>> 100521893000: system.cpu: syscall close returns 0
>>>>> 100522014000: system.cpu: syscall close called w/arguments
>>>>> 2,4831387472,0,1048576
>>>>> 100522014000: system.cpu: syscall close returns 18446744073709551615
>>>>> 100522187500: system.cpu: syscall close called w/arguments
>>>>> 3,4831387472,1,1048576
>>>>> 100522187500: system.cpu: syscall close returns 0
>>>>> 100522357000: system.cpu: syscall obreak called w/arguments
>>>>> 5368815616,0,0,1048576
>>>>> 100522357000: global: Break Point changed to: 0X14001A000
>>>>> 100522357000: system.cpu: syscall obreak returns 5368815616
>>>>> 100522623500: system.cpu: syscall sigprocmask called w/arguments
>>>>> 1,18446744073709547831,0,0
>>>>> warn: ignoring syscall sigprocmask(1, 18446744073709547831, ...)
>>>>> 100522623500: system.cpu: syscall sigprocmask returns 0
>>>>> 100522711000: system.cpu: syscall exit called w/arguments
>>>>> 18446744073709551615,5368739848,2,0
>>>>>
>>>>> _______________________________________________
>>>>> m5-users mailing list
>>>>> m5-users@m5sim.org
>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
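
For whoever picks this up, the approach Ali describes (record each open file's path, mode and tell() position in the checkpoint, then reopen and seek() on restore) could look roughly like the sketch below. This is not M5 code: TrackedFiles, its methods, and the JSON layout are all hypothetical names chosen for illustration, and it assumes binary file modes so that tell() is a plain byte offset.

```python
import json
import os
import tempfile


class TrackedFiles:
    """Hypothetical table of host files opened for the simulated program."""

    def __init__(self):
        self._files = {}  # simulated fd -> (path, mode, host file object)

    def open(self, sim_fd, path, mode):
        handle = open(path, mode)
        self._files[sim_fd] = (path, mode, handle)
        return handle

    def get(self, sim_fd):
        return self._files[sim_fd][2]

    def close_all(self):
        for _, _, handle in self._files.values():
            handle.close()
        self._files.clear()

    def serialize(self):
        """Record path, mode and current offset (via tell()) per file."""
        return json.dumps({
            str(fd): {"path": path, "mode": mode, "pos": handle.tell()}
            for fd, (path, mode, handle) in self._files.items()
        })

    @classmethod
    def unserialize(cls, blob):
        """Reopen every recorded file and seek() back to the saved offset."""
        table = cls()
        for fd, entry in json.loads(blob).items():
            # Assumption: reopening with "w" would truncate data written
            # before the checkpoint, so restore such files with "r+b".
            mode = "r+b" if "w" in entry["mode"] else entry["mode"]
            handle = table.open(int(fd), entry["path"], mode)
            handle.seek(entry["pos"])
        return table


# Usage: read part of a file, "checkpoint", then resume from a fresh table.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.dat")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    before = TrackedFiles()
    before.open(3, path, "rb").read(4)   # consume the first four bytes
    checkpoint = before.serialize()
    before.close_all()

    after = TrackedFiles.unserialize(checkpoint)
    rest = after.get(3).read()           # continues at the saved offset
    after.close_all()

print(rest)
```

The restored descriptor picks up where the original left off instead of returning an error, which is the behaviour the benchmarks need after a resume. A real fix would also have to decide what to do with stdin/stdout/stderr and with files that changed on disk between checkpoint and restore; the sketch ignores both.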