Dear Steven, dear meep users!

We upgraded our meep installation from 0.20.3 (Debian package) to 1.0.3
(self-compiled with MPICH and parallel HDF5, all dependencies from the Debian
package system).

We are working on an Intel cluster with several nodes. All simulation data is on
a file server, and its directories are mounted via NFS on all nodes.
If we start a meep-mpi (1.0.3) simulation on one of these nodes in an
NFS-mounted directory, the simulation hangs as soon as file output (here,
output to an HDF5 file) starts. If we disable file output, the simulation
works well.
If we copy the ctl file to a local directory, the simulation also works well,
even with file output.
We also tried SSH-mounted file systems instead of NFS, but got the same
result.
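For reference, the local-directory workaround described above can be scripted roughly as follows. This is only a sketch under our own assumptions: the function name run_in_local_scratch is hypothetical, it assumes the default temporary directory (TMPDIR or /tmp) is on local disk, and it only covers the single-node case, since the scratch directory exists only on the node that launches the run.

```python
import glob
import os
import shutil
import subprocess
import tempfile


def run_in_local_scratch(ctl_path, command, results_dir, pattern="*.h5"):
    """Hypothetical helper: run a simulation in a local scratch directory.

    Copies ctl_path into a fresh temporary directory (assumed to be on local
    disk), runs `command + [ctl file name]` there, then copies every output
    file matching `pattern` into results_dir (e.g. an NFS-mounted directory).
    Returns the list of copied file paths.
    """
    scratch = tempfile.mkdtemp(prefix="meep-scratch-")
    try:
        local_ctl = shutil.copy(ctl_path, scratch)
        # Working directory is the local scratch dir, so HDF5 files are
        # written to local disk instead of NFS while the run is in progress.
        subprocess.run(command + [os.path.basename(local_ctl)],
                       cwd=scratch, check=True)
        os.makedirs(results_dir, exist_ok=True)
        copied = []
        for path in sorted(glob.glob(os.path.join(scratch, pattern))):
            # Copy finished output files to the shared directory afterwards.
            copied.append(shutil.copy(path, results_dir))
        return copied
    finally:
        # Always clean up the scratch directory, even if the run failed.
        shutil.rmtree(scratch, ignore_errors=True)
```

A call might look like run_in_local_scratch("sim.ctl", ["mpirun", "-np", "4", "meep-mpi"], "/nfs/project/results"), where the file name and paths are placeholders.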

If we compile meep-mpi against serial HDF5 instead, it works for ~50% of the
simulations, but many runs abort at an arbitrary time step with the following
error message:

HDF5-DIAG: Error detected in HDF5 library version: 1.6.6 thread 3062614576.  Back trace follows.
  #000: ../../../src/H5F.c line 2049 in H5Fopen(): unable to open file
    major(04): File interface
    minor(17): Unable to open file
  #001: ../../../src/H5F.c line 1829 in H5F_open(): unable to read superblock
    major(04): File interface
    minor(24): Read failed
  #002: ../../../src/H5Fsuper.c line 312 in H5F_read_superblock(): truncated file
    major(04): File interface
    minor(21): File has been truncated
HDF5-DIAG: Error detected in HDF5 library version: 1.6.6 thread 3062614576.  Back trace follows.
  #000: ../../../src/H5D.c line 1163 in H5Dopen(): not a location
    major(01): Function arguments
    minor(03): Inappropriate type
  #001: ../../../src/H5G.c line 1928 in H5G_loc(): invalid object ID
    major(01): Function arguments
    minor(05): Bad value
HDF5-DIAG: Error detected in HDF5 library version: 1.6.6 thread 3062614576.  Back trace follows.
  #000: ../../../src/H5D.c line 1266 in H5Dget_space(): not a dataset
    major(01): Function arguments
    minor(03): Inappropriate type
HDF5-DIAG: Error detected in HDF5 library version: 1.6.6 thread 3062614576.  Back trace follows.
  #000: ../../../src/H5S.c line 856 in H5Sget_simple_extent_ndims(): not a data space
    major(01): Function arguments
    minor(03): Inappropriate type
meep: error on line 548 of h5file.cpp: file data is inconsistent rank for subsequent extend_data
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_31944:  p4_error: : 1

It seems that outputting field slices with the in-volume command triggers the
problem.

Has anyone seen this before?
How do you access remote file systems when running simulations on several
computers?

Thanks and best regards,
Roman & Paul

_______________________________________________
meep-discuss mailing list
meep-discuss@ab-initio.mit.edu
http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/meep-discuss