It may be overkill for your use case, but mpiFileUtils is very good for large datasets:
https://github.com/hpc/mpifileutils
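
For a straight tree-to-tree copy, dsync is the relevant tool in that suite.
A rough sketch only, assuming mpiFileUtils is installed on a node that
mounts both file systems and that an MPI launcher is available (the rank
count and hostfile are placeholders):

    # MPI-parallel sync of the source tree into the destination tree
    mpirun -np 32 --hostfile ./nodes dsync /gpfs/home/user /research/project/user

(dcp is the equivalent one-shot parallel copy if you don't need the sync
semantics. A few equally rough sketches of the rsync- and Globus-based
approaches discussed in the thread are appended below the quoted messages.)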

Cheers,
Carl.

On Fri, 19 Oct 2018 at 7:05 am, <[email protected]> wrote:

> Thank you all for the responses. I'm currently using msrsync and things
> appear to be going very well.
>
> The data transfer is contained inside our DC. I'm transferring a user's
> home directory content from one GPFS file system to another. Our IBM
> Spectrum Scale solution consists of 12 I/O nodes connected to IB, and the
> client node I'm using to transfer the data from one file system to the
> other is also connected to IB, with a possible maximum of 2 hops.
>
> [root@client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 /gpfs/home/user/ /research/project/user/
> [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 G/s bw] [monq 0] [jq 62043]
>
> Best,
> Dwayne
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Christopher Black
> Sent: Thursday, October 18, 2018 4:43 PM
> To: gpfsug main discussion list <[email protected]>
> Subject: Re: [gpfsug-discuss] Best way to migrate data
>
> Other tools and approaches that we've found helpful:
>
> msrsync: handles parallelizing rsync within a dir tree and can greatly
> speed up transfers on a single node with both filesystems mounted,
> especially when dealing with many small files.
>
> Globus/GridFTP: set up one or more endpoints on each side; GridFTP will
> auto-parallelize and recover from disruptions.
>
> msrsync is easier to get going but is limited to one parent dir per node.
> We've sometimes done an additional level of parallelization by running
> msrsync with different top-level directories on different HPC nodes
> simultaneously.
>
> Best,
> Chris
>
> Refs:
> https://github.com/jbd/msrsync
> https://www.globus.org/
>
> On 10/18/18, 2:54 PM, "[email protected] on behalf of Sanchez, Paul" <[email protected] on behalf of [email protected]> wrote:
>
> Sharding can also work, if you have a storage-connected compute grid in
> your environment: if you enumerate all of the directories, then use a
> non-recursive rsync for each one, you may be able to parallelize the
> workload by using several clients simultaneously. It may still max out
> the links of these clients (assuming your source read throughput and
> target write throughput bottlenecks aren't encountered first), but it
> may run that way for 1/100th of the time if you can use 100+ machines.
>
> -Paul
>
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Buterbaugh, Kevin L
> Sent: Thursday, October 18, 2018 2:26 PM
> To: gpfsug main discussion list <[email protected]>
> Subject: Re: [gpfsug-discuss] Best way to migrate data
>
> Hi Dwayne,
>
> I'm assuming you can't just let an rsync run, possibly throttled in some
> way? If not, and if you're just tapping out your network, then would it
> be possible to go old school? We have parts of the Medical Center here
> where their network connections are … um, less than robust. So they tar
> stuff up to a portable HD, sneaker-net it to us, and we untar it from an
> NSD server.
>
> HTH, and I really hope that someone has a better idea than that!
>
> Kevin
>
> > On Oct 18, 2018, at 12:19 PM, [email protected] wrote:
> >
> > Hi,
> >
> > Just wondering what the best recipe is for migrating a user's home
> > directory content from one GPFS file system to another, which hosts a
> > larger research GPFS file system? I'm currently using rsync and it has
> > maxed out the client system's IB interface.
> >
> > Best,
> > Dwayne
> > —
> > Dwayne Hart | Systems Administrator IV
> >
> > CHIA, Faculty of Medicine
> > Memorial University of Newfoundland
> > 300 Prince Philip Drive
> > St. John's, Newfoundland | A1B 3V6
> > Craig L Dobbin Building | 4M409
> > T 709 864 6631
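
A couple of rough sketches of the other approaches above, in case they are
useful. For the Globus/GridFTP route Chris describes, a single recursive,
checkpointed transfer with the Globus CLI looks roughly like this; the
endpoint UUIDs and label are placeholders, and it assumes endpoints are
already set up on both sides:

    globus transfer --recursive --label "home-migration" \
        SRC_ENDPOINT_UUID:/gpfs/home/user/ \
        DST_ENDPOINT_UUID:/research/project/user/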
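
The sharded, non-recursive rsync Paul suggests could be outlined like this;
paths and the shard file are placeholders, and in practice you would split
dirs.txt across your compute clients (e.g. with split(1) or a scheduler
array job):

    #!/bin/bash
    # Enumerate every directory once, then let each client copy only the
    # files directly inside the directories listed in its shard.
    SRC=/gpfs/home/user
    DST=/research/project/user

    ( cd "$SRC" && find . -type d ) > dirs.txt     # run once

    while read -r d; do                            # run on each client,
        mkdir -p "$DST/$d"                         # against its own shard
        rsync -a --no-recursive --dirs "$SRC/$d/" "$DST/$d/"
    done < my-shard.txt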
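
And Kevin's two suggestions in one-liner form; the bandwidth cap and the
portable-drive mount point are placeholders (rate suffixes like 200m need a
reasonably recent rsync, otherwise --bwlimit is interpreted as KiB/s):

    # throttled rsync, so the copy doesn't saturate the client's IB link
    rsync -aH --bwlimit=200m /gpfs/home/user/ /research/project/user/

    # old-school sneaker net: tar to a portable drive, untar on an NSD server
    tar -cf /mnt/portable/user.tar -C /gpfs/home user
    tar -xf /mnt/portable/user.tar -C /research/project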

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
