Hi Joel, We haven't run into this specific issue internally; I think our bugs must be robust enough that they're reproducible even in the face of minor changes from system to system. (And of course once you start debugging on a particular system, the behavior is deterministic.)
This is really a symptom of a larger problem with SE mode, which is that too much of the underlying host filesystem leaks into the simulated environment. We've run into this in a bigger way with dynamic linking (and yes, Mike Upton, those were our patches that were described at the workshop which we haven't posted yet), where you can end up linking in different versions of shared libraries when you run on different platforms and ld.so just grabs whatever happens to be in the local /lib directory. Another thing David did this past summer is set up a path redirection scheme that lets you build a fake filesystem for your SE-mode binary. This lets you do things like set up a directory tree with a particular set of shared libs in it, make that look like "/" to the simulated binary, then reuse that tree on multiple systems. It also allows you to do some handy things like auto-generate a file that shows up as /proc/cpuinfo to the simulated system with information determined by your gem5 config. This doesn't directly solve the readlink() issue, but I think you could use this feature to put your executable at a reproducible host-independent path. Of course, this is yet another set of patches we have yet to post, as far as I can tell. Unfortunately we have some big deadlines this month, but I'm hoping we can get back to clearing up some of our patch backlog next month. In the meantime, as Tony says, I'm not sure how else to resolve the tension between reproducibility and correctness here. Steve On Wed, Sep 9, 2015 at 11:14 AM Joel Hestness <[email protected]> wrote: > > If any codes use readlink() in this way, they need the correct path, and > > not gem5’s path. This isn’t non-determinism, the runs will always be the > > same given the binaries are in the same path. I’m not sure how this can > be > > solved other than to say we won’t allow codes to use readlink() this way. > > > > It's not a viable solution to just disallow this kind of use, though. GLIBC > uses readlink, which gets compiled into some static binaries, so a gem5 > user may not have control over the use of readlink. A simple example where > this fails is if I run simulations on a cluster with a different directory > structure than machines I can use to debug. The difference in directory > structure is enough so that the cluster runs cannot be replicated on other > machines. > > > > > Using readlink() on /proc/self/exe would return the path to the gem5 > > binary previously, as opposed to the binary running inside of gem5. We > > noticed this because some of our codes used readlink() this way to figure > > out their paths. > > > > Sure. Can you explain how you are running simulations such that you don't > need to worry about simulations being replicable in different settings? Are > you always able to debug simulations in the exact same setting where you > discovered a bug? > > Joel > > > > > > > *From:* Joel Hestness [mailto:[email protected]] > > *Sent:* Wednesday, September 09, 2015 10:32 AM > > *To:* gem5 Developer List; Hashe, David; Che, Shuai; Gutierrez, Anthony > > *Subject:* Re: [gem5-dev] changeset in gem5: syscall: Add readlink to x86 > > with special cas... > > > > > > > > Hey David, Shuai, and Tony, > > > > > > > > It appears that this patch can introduce simulator non-determinism due > > to the use of Process::progName(). Basically, if benchmark paths are > > changed (e.g. running on a different system), then the string copy > > operations in the "/proc/self/exe" codepath can create strings of > different > > lengths. As a result, readlink can return different values depending on > the > > progName() and cause the simulated benchmark to behave differently. This > is > > problematic when trying to replicate sims for regressions or debugging. > > > > > > > > It looks like this patch aims to address changes in GLIBC, which now > > uses readlink to find the binary before handling environment variables > (in > > _dl_non_dynamic_init <http://osxr.org/glibc/source/elf/dl-support.c#0312 > >). > > That said, I'm not clear how this patch is supposed to fix problems or > how > > we can avoid the nondeterminism it introduces. Can you guys elaborate on > > how this change is intended to work (e.g. what is the use case you were > > aiming for)? > > > > > > > > Thank you, > > > > Joel > > > > > > > > > > > > > > > > On Sat, Aug 1, 2015 at 11:07 AM, David Hashe <[email protected]> > wrote: > > > > changeset 9abf6a7c14ab in /z/repo/gem5 > > details: http://repo.gem5.org/gem5?cmd=changeset;node=9abf6a7c14ab > > description: > > syscall: Add readlink to x86 with special case /proc/self/exe > > > > This patch implements the correct behavior. > > > > diffstat: > > > > src/arch/x86/linux/process.cc | 6 +++--- > > src/sim/syscall_emul.cc | 31 ++++++++++++++++++++++++++----- > > 2 files changed, 29 insertions(+), 8 deletions(-) > > > > diffs (80 lines): > > > > diff -r 255ebb0b32b4 -r 9abf6a7c14ab src/arch/x86/linux/process.cc > > --- a/src/arch/x86/linux/process.cc Mon Jul 20 09:15:18 2015 -0500 > > +++ b/src/arch/x86/linux/process.cc Mon Jul 20 09:15:18 2015 -0500 > > @@ -485,7 +485,7 @@ > > /* 264 */ SyscallDesc("renameat", unimplementedFunc), > > /* 265 */ SyscallDesc("linkat", unimplementedFunc), > > /* 266 */ SyscallDesc("symlinkat", unimplementedFunc), > > - /* 267 */ SyscallDesc("readlinkat", unimplementedFunc), > > + /* 267 */ SyscallDesc("readlinkat", readlinkFunc), > > /* 268 */ SyscallDesc("fchmodat", unimplementedFunc), > > /* 269 */ SyscallDesc("faccessat", unimplementedFunc), > > /* 270 */ SyscallDesc("pselect6", unimplementedFunc), > > @@ -626,7 +626,7 @@ > > /* 82 */ SyscallDesc("select", unimplementedFunc), > > /* 83 */ SyscallDesc("symlink", unimplementedFunc), > > /* 84 */ SyscallDesc("oldlstat", unimplementedFunc), > > - /* 85 */ SyscallDesc("readlink", unimplementedFunc), > > + /* 85 */ SyscallDesc("readlink", readlinkFunc), > > /* 86 */ SyscallDesc("uselib", unimplementedFunc), > > /* 87 */ SyscallDesc("swapon", unimplementedFunc), > > /* 88 */ SyscallDesc("reboot", unimplementedFunc), > > @@ -846,7 +846,7 @@ > > /* 302 */ SyscallDesc("renameat", unimplementedFunc), > > /* 303 */ SyscallDesc("linkat", unimplementedFunc), > > /* 304 */ SyscallDesc("symlinkat", unimplementedFunc), > > - /* 305 */ SyscallDesc("readlinkat", unimplementedFunc), > > + /* 305 */ SyscallDesc("readlinkat", readlinkFunc), > > /* 306 */ SyscallDesc("fchmodat", unimplementedFunc), > > /* 307 */ SyscallDesc("faccessat", unimplementedFunc), > > /* 308 */ SyscallDesc("pselect6", unimplementedFunc), > > diff -r 255ebb0b32b4 -r 9abf6a7c14ab src/sim/syscall_emul.cc > > --- a/src/sim/syscall_emul.cc Mon Jul 20 09:15:18 2015 -0500 > > +++ b/src/sim/syscall_emul.cc Mon Jul 20 09:15:18 2015 -0500 > > @@ -365,12 +365,10 @@ > > } > > strncpy((char *)buf.bufferPtr(), cwd.c_str(), size); > > result = cwd.length(); > > - } > > - else { > > + } else { > > if (getcwd((char *)buf.bufferPtr(), size) != NULL) { > > result = strlen((char *)buf.bufferPtr()); > > - } > > - else { > > + } else { > > result = -1; > > } > > } > > @@ -405,7 +403,30 @@ > > > > BufferArg buf(bufPtr, bufsiz); > > > > - int result = readlink(path.c_str(), (char *)buf.bufferPtr(), > bufsiz); > > + int result = -1; > > + if (path != "/proc/self/exe") { > > + result = readlink(path.c_str(), (char *)buf.bufferPtr(), > bufsiz); > > + } else { > > + // readlink() will return the path of the binary given > > + // with the -c option, however it is possible that this > > + // will still result in incorrect behavior if one binary > > + // runs another, e.g., -c time -o "my_binary" where > > + // my_binary calls readlink(). this is a very unlikely case, > > + // so we issue a warning. > > + warn_once("readlink may yield unexpected results if multiple " > > + "binaries are used\n"); > > + if (strlen(p->progName()) > bufsiz) { > > + // readlink will truncate the contents of the > > + // path to ensure it is no more than bufsiz > > + strncpy((char*)buf.bufferPtr(), p->progName(), bufsiz); > > + result = bufsiz; > > + } else { > > + // return the program's working path rather > > + // than the one for the gem5 binary itself. > > + strcpy((char*)buf.bufferPtr(), p->progName()); > > + result = strlen((char*)buf.bufferPtr()); > > + } > > + } > > > > buf.copyOut(tc->getMemProxy()); > > > > _______________________________________________ > > gem5-dev mailing list > > [email protected] > > http://m5sim.org/mailman/listinfo/gem5-dev > > > > > > > > > > > > -- > > > > Joel Hestness > > PhD Candidate, Computer Architecture > > Dept. of Computer Science, University of Wisconsin - Madison > > http://pages.cs.wisc.edu/~hestness/ > > > > > > -- > Joel Hestness > PhD Candidate, Computer Architecture > Dept. of Computer Science, University of Wisconsin - Madison > http://pages.cs.wisc.edu/~hestness/ > _______________________________________________ > gem5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/gem5-dev > _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
