Dan,
I think I found the problem.
In my process of trial and error I got back to the following code:
import fipy as fp
import tempfile
tempfile.tempdir = './' #1
print tempfile.gettempdir() #2
f = open('tmsh3d.geo')
geo = f.read()
f.close()
fp.parallelComm.Barrier() #3
mesh = fp.Gmsh3D(geo) #4
mx, my, mz = mesh.getCellCenters()
print '3D mesh set up'
_____________________
Setting the tempdir in Python (#1 above) did not fix the problem by itself,
although somehow a few runs actually completed. Your advice about the /tmp
directory was correct, but not complete enough for our cluster configuration.
Here is what I found that worked.
If I set the TMPDIR environment variable in the PBS script to match the
directory in #1, great success!
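For reference, the relevant part of my PBS script looks roughly like the
sketch below. The node/core counts and the use of $PBS_O_WORKDIR as the
shared directory are placeholders for our setup, and "script.py" is a
hypothetical name, so adapt as needed:

```shell
#PBS -l nodes=3:ppn=4

# Point TMPDIR at a directory on the shared filesystem so every rank
# resolves the same temporary directory (placeholder: the submit directory).
export TMPDIR=$PBS_O_WORKDIR

cd $PBS_O_WORKDIR
# "script.py" is the FiPy script that sets tempfile.tempdir to match (#1).
mpirun -np 12 python script.py
```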
If I comment out #3, I sometimes get an error saying something about not
finding the end of file in the .msh file (I can repeat the run and capture
the actual message if you like). Leaving #3 in gives comfort that all cores
will start #4 at the same time.
If I commented out #1 and looked at the results from #2, I consistently saw
4 of 12 cores (over 3 nodes) print the $TMPDIR directory, while the other
cores printed their local /tmp directory. Our cluster nodes do not share
/tmp space, so the gettempdir() results were not necessarily the same
directory even when the strings were identical.
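For anyone else who hits this: as I understand it, tempfile.gettempdir()
returns tempfile.tempdir if it has been set, otherwise it consults the
TMPDIR (then TEMP, TMP) environment variables before falling back to
platform defaults like /tmp, and caches the result. A quick single-process
sketch of that resolution order (using the current directory as a
stand-in for the shared directory):

```python
import os
import tempfile

# An explicit assignment to tempfile.tempdir wins over everything (#1).
tempfile.tempdir = os.getcwd()
print(tempfile.gettempdir())   # the current directory

# With tempfile.tempdir unset, the TMPDIR environment variable is
# consulted instead, and the result is cached back into tempfile.tempdir.
tempfile.tempdir = None
os.environ['TMPDIR'] = os.getcwd()
print(tempfile.gettempdir())   # the current directory again
```

So if TMPDIR is missing from some ranks' environments (as on our nodes
before fixing the PBS script), those ranks silently fall back to their
local /tmp.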
Success was found, but only after a bit of trial and error. I was surprised
that some of the processes would ignore the $TMPDIR environment variable and
use the local /tmp space.
Now, with TMPDIR and tempfile.tempdir both set and pointing to the same
directory, I do get a warning from MPI that using a network file system may
not be the best solution. But in my case it works!!
Thanks,
Bill
_______________________________________________
fipy mailing list
[email protected]
http://www.ctcms.nist.gov/fipy
[ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]