On May 09, 2008 09:41 -0400, Phil Dickens wrote:
> I am having similar struggles with locking on MPI-IO. I am doing a
> simple strided write, and it fails because of the locking. I'm a bit
> behind in the discussion, but is there a way to fix (work around) this
> problem? Is this something in my code, or the default driver (this is
> on Lonestar at TACC)? I have even downloaded the most up-to-date
> version of MPICH, which I believe has a new Lustre ADIO driver, but I
> am running into the same issues.
>
> Any thoughts would be greatly appreciated!
One possibility is to mount the clients with "-o localflock", leaving
all of the locking internal to Lustre. This in essence provides
single-node flock (i.e. coherent on that node, but not across all
clients). The other alternative is "-o flock", which is coherent
locking across all clients, but has a noticeable performance impact and
may affect stability, depending on the version of Lustre being used
(newer is better, of course).

I'm not certain about the internals of the MPI-IO code: whether it
depends on flock to provide a barrier across nodes, or whether it uses
flock only to work around e.g. NFS not keeping writes coherent, so that
clients don't clobber the same page when writing. Tom is the expert
here...

> On Thu, 8 May 2008, Tom.Wang wrote:
>
> > Hi
> >
> > Marty Barnaby wrote:
> >> To return to this discussion, in recent testing I have found that
> >> writing to a Lustre FS via a higher-level library, like PnetCDF,
> >> fails because the default value for romio_ds_write is not
> >> "disable". This is set in the MPICH code in the file
> >> src/mpi/romio/adio/common/ad_hints.c.
> >
> > You can use MPI_Info_set to disable romio_ds_write. What is the
> > failure? flock? Data sieving needs flock.
> >
> >> I believe it has something to do with locking issues. I'm not sure
> >> how best to handle this; I'd prefer the data-sieving default be
> >> "disable", though I don't know all the implications there.
> >
> > I agree data sieving should be disabled. Also, it checks for a
> > contiguous buftype or filetype only via the fileview, which is not
> > always sufficient, and so triggers an unnecessary read-modify-write
> > even for contiguous writes (especially for the higher-level
> > libraries, if you choose collective write). Since Lustre has a
> > client cache, and given the overhead of flock and read-modify-write,
> > I doubt the performance improvement we could get from data sieving
> > on Lustre, although I do not have performance data to prove that.
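[Editor's note: the MPI_Info_set workaround Tom describes can be
sketched roughly as below. The hint names are standard ROMIO hints; the
file path is a placeholder, and whether disabling data sieving helps
depends on the access pattern.]

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Disable ROMIO data sieving for writes *before* opening the file,
     * so a strided write does not fall back to the flock-protected
     * read-modify-write cycle that fails without flock support. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_write", "disable");
    /* Optionally disable it for reads as well: */
    MPI_Info_set(info, "romio_ds_read", "disable");

    /* "/mnt/lustre/testfile" is a placeholder path. */
    MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... strided writes via MPI_File_write_at etc. go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

The key point is that the info object must be passed at MPI_File_open
time; hints set after the open may be ignored by the ADIO driver.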
> >> Maybe an ad_lustre_open would be the place where the _ds_ hints
> >> are set to disable.
> >
> > Yes, we should disable this for strided writes in Lustre.
> > ad_lustre_open seems the right place to do this.
> >
> > Thanks
> > WangDi
> >>
> >> Marty Barnaby
> >>
> >> Weikuan Yu wrote:
> >>> Andreas Dilger wrote:
> >>>> On Mar 11, 2008 16:10 -0600, Marty Barnaby wrote:
> >>>>> I'm not actually sure which ROMIO abstract device the multiple
> >>>>> CFS deployments I utilize were defined with. Probably just UFS,
> >>>>> or maybe NFS. Did you have a recommended option yourself?
> >>>>
> >>>> The UFS driver is the one used for Lustre if no other one
> >>>> exists.
> >>>>
> >>>>> Besides the fact that most of the ADIO drivers created over the
> >>>>> years are completely obsolete and could be cleaned out of
> >>>>> ROMIO, what will the new one for Lustre offer? Particularly
> >>>>> with respect to the controls I can already get via the lfs
> >>>>> utility?
> >>>>
> >>>> There is improved collective IO that aligns the IO on Lustre
> >>>> stripe boundaries. Also, the hints given to the MPI-IO layer
> >>>> (before open, not after) result in Lustre picking a better
> >>>> stripe count/size.
> >>>
> >>> In addition, the one integrated into MPICH2-1.0.7 contains direct
> >>> I/O support. Lockless I/O support was removed due to my lack of
> >>> confidence in low-level file system support, but it can be
> >>> revived when possible.
> >>>
> >>> --
> >>> Weikuan Yu <+> 1-865-574-7990
> >>> http://ft.ornl.gov/~wyu/
> >
> > --
> > Regards,
> > Tom Wangdi
> > --
> > Sun Lustre Group
> > System Software Engineer
> > http://www.sun.com

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
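[Editor's note: for reference, the client-side options discussed in
this thread look roughly like the commands below. The MGS node name,
fsname, and paths are placeholders, and the exact lfs flags vary by
Lustre version.]

```shell
# Mount a Lustre client with cluster-wide coherent flock
# (coherent across all clients, but with a performance cost):
mount -t lustre -o flock mgsnode@tcp:/fsname /mnt/lustre

# Or keep flock semantics local to each node (cheap, coherent
# only within a single client node):
mount -t lustre -o localflock mgsnode@tcp:/fsname /mnt/lustre

# Striping can also be controlled directly with the lfs utility,
# e.g. 4 stripes of 1 MB each on an output directory:
lfs setstripe -c 4 -s 1m /mnt/lustre/output
```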