Hi

In the spirit of 'other products do exist', I should also mention Ngenea and APSync, which could meet the technical requirements of these use cases.

Ngenea allows you to bring data in from 'cloud' and, of particular interest for this use case, from POSIX filesystems or filer islands. You can present the remote data locally and then inflate it either on demand or via an enacted process. It is massively parallel, multi-node, and highly threaded, with extremely granular rules-based control. You can also migrate data back to your filer, re-utilising such islands as tiers. You can even use it to 'virtually tier' within GPFS/Scale filesystems, akin to a 'hardlink across independent filesets', or across global WANs for true 24x7 follow-the-sun working practices.

APSync also provides a differently patched version of rsync and builds on top of the 'SnapDiff' technology previously presented at the UG, whereby you don't need to re-scan your entire filesystem for each sync and can instead do incremental syncs of created, modified, and deleted files, and _track moved files_. Handy, and an extreme time saver over regular full runs. Massively parallel, multi-node, highly threaded (a common theme with our tools...).

As I don't do sales: if anyone wants to talk tech nuts and bolts with me about these, or you have challenges (and I love a challenge...), by all means hit me up directly. I like solving people's blockers :-)

Happy Friday ppl,

Jez


On 05/03/2019 21:38, Simon Thompson wrote:
DDN also have a paid-for product for moving data (DataFlow). We found 
out about it after we did a massive data migration...

I can't comment on it beyond being aware of it. I'm sure your local DDN sales 
person can help.

But if only IBM supported some sort of restripe to a new block size, we wouldn't 
have to do this mass migration :-P

Simon
________________________________________
From: [email protected] 
[[email protected]] on behalf of Simon Thompson 
[[email protected]]
Sent: 05 March 2019 16:38
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] suggestions for copying one GPFS file system 
into another

I wrote a patch to mpifileutils which will copy GPFS attributes, but when we 
compared it against rsync, something was obviously still different about the 
attrs from each, so use with care.

Simon
________________________________________
From: [email protected] 
[[email protected]] on behalf of Ratliff, John 
[[email protected]]
Sent: 05 March 2019 16:21
To: [email protected]
Subject: [gpfsug-discuss] suggestions for copying one GPFS file system into     
another

We use a GPFS file system for our computing clusters and we’re working on 
moving to a new SAN.

We originally tried AFM, but it didn't seem to work very well. We tried a 
prefetch on a test policy scan of 100 million files, and after 24 hours it 
hadn't prefetched anything; it wasn't clear what was happening. Some smaller 
tests succeeded, but the NFSv4 ACLs did not seem to be transferred.

Since then we started using rsync with the GPFS attrs patch. We have over 600 
million files and 700 TB. I split up the rsync tasks with lists of files 
generated by the policy engine, and we transferred the original data in about 2 
weeks. Now we're working on final synchronization. I'd like to use one of the 
delete options to remove files that were synced earlier and then deleted, but 
that can't be combined with the --files-from option, so it's harder to break up 
the rsync tasks. Some of the directories I'm running this against have 30-150 
million files each, which can take quite some time with a single rsync process.
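For the initial transfer, the split-and-parallelise approach described above can be sketched roughly as follows. This is a hypothetical sketch, not John's actual script: the paths, chunk count, and the assumption that the policy-engine list holds one relative pathname per line are all mine.

```shell
#!/bin/sh
# Sketch: split a policy-engine file list into N chunks and run one
# rsync per chunk in parallel. All paths below are hypothetical.
SRC=${SRC:-/gpfs/old/}        # source file system (assumed)
DST=${DST:-/gpfs/new/}        # destination file system (assumed)
LIST=${LIST:-/tmp/filelist}   # one pathname per line, relative to $SRC
JOBS=${JOBS:-8}               # number of parallel rsync processes

parallel_rsync() {
    dir=$(mktemp -d)
    # 'split -n l/N' splits by lines, not bytes, so no pathname is
    # ever cut in half across two chunks (GNU coreutils).
    split -n "l/$JOBS" "$LIST" "$dir/chunk."
    for f in "$dir"/chunk.*; do
        rsync -aH --numeric-ids --files-from="$f" "$SRC" "$DST" &
    done
    wait
    rm -rf "$dir"
}
```

Note that --files-from expects paths relative to the source argument, and, as noted above, --delete cannot usefully be combined with it, so deletions on the target still need a separate pass.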

I’m also wondering if any of my rsync options are unnecessary. I was using -avHAXS 
and --numeric-ids. I’m thinking the A (ACLs) and X (xattrs) might be unnecessary 
for GPFS->GPFS, since we’re only using NFSv4 GPFS ACLs and I don’t know whether 
GPFS uses any xattrs that rsync would sync. Removing those two options eliminated 
several system calls per file, which should make it much faster, but I want to 
make sure I’m syncing correctly. Also, there seems to be a problem with the GPFS 
patch for rsync: it always gives an error trying to get GPFS attributes on a 
symlink, which means it doesn’t sync any symlinks when using that option. So you 
can rsync symlinks or GPFS attrs, but not both at the same time. This has led to 
me running two rsyncs, one to get all files and one to get all attributes.
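The two-pass workaround might be sketched like this. The `--gpfs-attrs` flag name is an assumption about what the patched rsync calls its option (check your patch), and the paths are hypothetical.

```shell
#!/bin/sh
# Sketch of the two-pass workaround for the symlink/GPFS-attrs conflict.
SRC=${SRC:-/gpfs/old/}   # hypothetical source path
DST=${DST:-/gpfs/new/}   # hypothetical destination path

two_pass_sync() {
    # Pass 1: file data, hard links, symlinks, owners, perms --
    # no GPFS attributes, so symlinks copy cleanly.
    rsync -avH --numeric-ids "$SRC" "$DST"
    # Pass 2: add GPFS attributes. '--gpfs-attrs' is an assumed flag
    # name from the patch; '--no-links' skips symlinks explicitly
    # rather than relying on the patch erroring out on them.
    rsync -avH --numeric-ids --no-links --gpfs-attrs "$SRC" "$DST"
}
```

The second pass is cheap relative to the first since the file data is already in place and rsync's delta check skips unchanged content.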

Thanks for any ideas or suggestions.

John Ratliff | Pervasive Technology Institute | UITS | Research Storage – 
Indiana University | http://pti.iu.edu

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
*Jez Tucker*
Head of Research and Development, Pixit Media
07764193820 | [email protected]
www.pixitmedia.com | Tw: @PixitMedia

