On Wed, Apr 2, 2014 at 6:04 PM, Seufzer, William J. (LARC-D307) <[email protected]> wrote:
> Setting the tempdir in python (#1 above) did not fix the problem, although
> somehow a few runs actually completed. Your advice on the /tmp directory
> was correct, but not complete enough for our cluster configuration.

Interesting. This makes fixing / automating the setting of the temp
directory quite difficult. Would changing $TMPDIR from within
Python even work?

>
> Here is what I found that worked.
>
> If I set the TMPDIR environment variable in the PBS script to match the
> directory in #1, great success!
>
> If I comment out #3, sometimes I will get an error that says something about
> not finding the end of file in the .msh file (I can try to repeat and get the
> actual message if you desire). Leaving #3 in gives comfort that all cores
> will start #4 at the same time.

The .msh file may be written on only one of the processes, but read on
all of them. That is probably a bug. I'll put in a bug report for
that.

> If I comment out #1 and look at the results from #2, I consistently saw 4 of
> 12 cores (over 3 nodes) print the $TMPDIR directory, the other cores printed
> their local /tmp directory. Our cluster nodes do not share /tmp space so the
> gettempdir() results were not necessarily the same even if the string was
> identical.
>
> Success was found, but only after a bit of trial and error. I was surprised
> that some of the processes would ignore the $TMPDIR environment variable and
> use the local /tmp space.

That is weird.

> Now with TMPDIR and tempfile.tempdir both set and pointing to the same
> directory I do get a warning from MPI that using a network file system may
> not be the best solution. But in my case it works!!

Good. I hope you make good progress with your work.

-- 
Daniel Wheeler
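For reference, here is a minimal sketch of setting both the environment variable and the module-level default from within Python. It is not FiPy code: the helper name use_shared_tempdir is hypothetical, and it assumes Python 3 and a directory visible from every node. One relevant detail is that tempfile.gettempdir() caches its first answer in tempfile.tempdir, so changing $TMPDIR after that first call has no effect in that process unless tempfile.tempdir is also reset. The lookup also skips candidate directories in which it cannot create a file, which might be one reason a process falls back to its local /tmp.

```python
import os
import tempfile


def use_shared_tempdir(path):
    """Point both $TMPDIR and tempfile.tempdir at `path`.

    Hypothetical helper for illustration; `path` is assumed to be a
    directory on a file system shared by all nodes.
    """
    path = os.path.abspath(path)
    os.makedirs(path, exist_ok=True)  # Python 3 only (exist_ok)

    # Seen by child processes started from this interpreter afterwards.
    os.environ["TMPDIR"] = path

    # Overrides whatever gettempdir() may already have cached.
    tempfile.tempdir = path

    # gettempdir() now reports the directory set above.
    return tempfile.gettempdir()
```

Each MPI rank would need to make the same call, for example use_shared_tempdir("/path/to/shared/scratch") with a placeholder path, before anything creates a temporary file.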
