DDN also have a paid-for product for moving data (DataFlow). We found 
out about it after we did a massive data migration...

I can't comment on it other than being aware of it. I'm sure your local DDN 
sales person can help.

But if only IBM supported some sort of restripe to a new block size, we 
wouldn't have to do this mass migration :-P

Simon 
________________________________________
From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Simon Thompson 
[s.j.thomp...@bham.ac.uk]
Sent: 05 March 2019 16:38
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] suggestions for copying one GPFS file system 
into another

I wrote a patch for mpifileutils that copies GPFS attributes, but when we 
compared its results against rsync's, something was obviously still different 
about the attrs from each, so use it with care.

Simon
________________________________________
From: gpfsug-discuss-boun...@spectrumscale.org 
[gpfsug-discuss-boun...@spectrumscale.org] on behalf of Ratliff, John 
[jdrat...@iu.edu]
Sent: 05 March 2019 16:21
To: gpfsug-discuss@spectrumscale.org
Subject: [gpfsug-discuss] suggestions for copying one GPFS file system into     
another

We use a GPFS file system for our computing clusters and we’re working on 
moving to a new SAN.

We originally tried AFM, but it didn't seem to work very well. We tried a 
prefetch driven by a test policy scan of 100 million files, and after 24 hours 
it hadn't prefetched anything; it wasn't clear what was happening. Some 
smaller tests succeeded, but the NFSv4 ACLs did not seem to be transferred.

Since then we started using rsync with the GPFS attrs patch. We have over 600 
million files and 700 TB. I split up the rsync tasks with lists of files 
generated by the policy engine and we transferred the original data in about 2 
weeks. Now we’re working on final synchronization. I’d like to use one of the 
delete options to remove files that were sync’d earlier and then deleted. This 
can’t be combined with the files-from option, so it’s harder to break up the 
rsync tasks. Some of the directories I’m running this against have 30-150 
million files each. This can take quite some time with a single rsync process.
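For what it's worth, the split-the-list approach above can be sketched roughly like this. Everything here is a placeholder (the file list, paths, chunk count); in real use filelist.txt would come from an mmapplypolicy LIST rule, and the rsync commands are echoed rather than run:

```shell
#!/bin/sh
# Sketch of splitting a policy-engine file list and fanning out rsyncs.
# All paths and the chunk count are hypothetical stand-ins.

LIST=filelist.txt
CHUNKS=chunks
mkdir -p "$CHUNKS"

# Stand-in for a list produced by the policy engine.
printf '%s\n' dir1/a dir1/b dir2/c dir2/d dir3/e dir3/f dir4/g dir4/h > "$LIST"

# Split the list into 4 roughly equal line-based pieces (GNU split).
split -n l/4 "$LIST" "$CHUNKS/part."

# Print (rather than run) one rsync per chunk; for real use, drop the
# echo and launch these in parallel, e.g. via xargs -P or a scheduler.
for part in "$CHUNKS"/part.*; do
    echo rsync -avHS --numeric-ids --files-from="$part" /old_fs/ /new_fs/
done
```

As noted, --files-from can't be combined with the --delete options, so a sketch like this only covers the copy phase, not the final cleanup pass.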

I’m also wondering if any of my rsync options are unnecessary. I was using 
avHAXS and numeric-ids. I’m thinking the A (ACLs) and X (xattrs) might be 
unnecessary for GPFS->GPFS. We’re only using NFSv4 GPFS ACLs, and I don’t know 
whether GPFS uses any xattrs that rsync would sync. Removing those two options 
removed several system calls, which should make it much faster, but I want to 
make sure I’m syncing correctly. Also, there seems to be a problem with the 
GPFS patch for rsync where it always gives an error trying to get GPFS 
attributes on a symlink, which means it doesn’t sync any symlinks when using 
that option. So you can rsync symlinks or GPFS attrs, but not both at the same 
time. This has led to me running two rsyncs, one to get all files and one to 
get all attributes.
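The two-pass workaround might look something like the sketch below: pass 1 copies everything including symlinks without the GPFS attribute option, pass 2 re-runs only over non-symlink entries with it. The tree and paths are stand-ins, --gpfs-attrs is an assumption about the patched rsync's option name (check your build), and both rsync commands are echoed rather than run:

```shell
#!/bin/sh
# Sketch of the two-pass rsync workaround for the symlink/attrs conflict.
# SRC/DEST and the tree contents are hypothetical placeholders.

SRC=src_tree
DEST=dest_tree
mkdir -p "$SRC" "$DEST"

# Stand-in tree: two regular files and a symlink.
printf 'hello\n' > "$SRC/file1"
printf 'world\n' > "$SRC/file2"
ln -sf file1 "$SRC/link1"

# Build a list of non-symlink files for the attribute pass.
( cd "$SRC" && find . -type f ! -type l ) > nolinks.txt

# Pass 1: all files, symlinks included, no GPFS attribute option.
# Pass 2: GPFS attributes, restricted to the non-symlink list.
# Drop the echo for real use; --gpfs-attrs is an assumed flag name.
echo rsync -avHS --numeric-ids "$SRC/" "$DEST/"
echo rsync -avHS --numeric-ids --gpfs-attrs --files-from=nolinks.txt "$SRC/" "$DEST/"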

Thanks for any ideas or suggestions.

John Ratliff | Pervasive Technology Institute | UITS | Research Storage – 
Indiana University | http://pti.iu.edu

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Reply via email to