Dan,

I think I found the problem. 

In my process of trial and error I got back to the following code:

import fipy as fp
import tempfile

tempfile.tempdir = './'  #1 force tempfile to use the working directory
print tempfile.gettempdir()  #2 report the temp directory this rank will use

f = open('tmsh3d.geo')
geo = f.read()  # read the Gmsh geometry script into a string
f.close()

fp.parallelComm.Barrier()  #3 synchronize all ranks before meshing
mesh = fp.Gmsh3D(geo)  #4 each rank builds its partition of the 3D mesh

mx, my, mz = mesh.getCellCenters()

print '3D mesh set up'

_____________________

Setting the tempdir in Python (#1 above) did not fix the problem by itself,
although somehow a few runs did complete. Your advice about the /tmp directory
was correct, but it was not complete enough for our cluster configuration.

Here is what I found that worked.

If I set the TMPDIR environment variable in the PBS script to match the
directory in #1, great success!
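For reference, a minimal sketch of the PBS stanza that worked for me (the node
counts, paths, and script name here are made up; adjust them for your cluster):

```shell
#!/bin/bash
#PBS -l nodes=3:ppn=4
# $PBS_O_WORKDIR is the directory qsub was run from -- on our cluster
# that is a shared filesystem, so every rank sees the same path.
export TMPDIR=$PBS_O_WORKDIR   # must agree with tempfile.tempdir (#1)
cd $PBS_O_WORKDIR
mpirun python run_mesh.py      # hypothetical script name
```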

If I comment out #3, I sometimes get an error about not finding the end of file
in the .msh file (I can try to reproduce it and capture the actual message if
you want it). Leaving #3 in gives comfort that all cores start #4 at the same
time.

If I comment out #1 and look at the output of #2, I consistently see 4 of the
12 cores (spread over 3 nodes) print the $TMPDIR directory, while the other
cores print their local /tmp directory. Our cluster nodes do not share /tmp, so
the gettempdir() results did not necessarily refer to the same storage even
when the strings were identical.
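One quirk of stock Python worth knowing here (nothing FiPy-specific): 
tempfile.gettempdir() consults $TMPDIR only the first time it is called, then
caches the answer in tempfile.tempdir. Any rank that computes the default
before the environment variable reaches it will keep using local /tmp. A small
sketch showing the cache being cleared so the env var is picked up:

```python
import os
import tempfile

# gettempdir() checks $TMPDIR (then $TEMP, $TMP) on its first call and
# caches the result in tempfile.tempdir; later environment changes are
# silently ignored unless the cache is cleared.
os.environ["TMPDIR"] = os.getcwd()  # any directory that exists and is writable
tempfile.tempdir = None             # clear the cached value
print(tempfile.gettempdir())        # now reflects $TMPDIR
```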

It took a bit of trial and error to get here. I was surprised that some of the
processes ignored the $TMPDIR environment variable and used the local /tmp
space instead.

Now, with TMPDIR and tempfile.tempdir both set and pointing to the same
directory, I do get a warning from MPI that a network file system may not be
the best choice for temporary files. But in my case it works!

Thanks,

Bill

_______________________________________________
fipy mailing list
[email protected]
http://www.ctcms.nist.gov/fipy
  [ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]