Hi,

There are various ways to speed up backups from GPFS to TSM (and, by the way, the same problem exists with most other backup solutions as well). But first one needs to find out what the problem actually is, because it can be several completely independent issues, each of which needs a different solution. Let me explain the different issues and what you can do about each of them.
The very first challenge is to find out what data has changed. The way TSM does this is by crawling through your filesystem and looking at the mtime of each file to find out which files have changed; think of an ls -Rl on your filesystem root. Depending on how many files you have, this can take days in a large-scale environment (think hundreds of millions of files). There is very little one can do to speed up this process the way it is done; all you can do is put the metadata on faster disks (e.g. SSD), which will improve the speed of this 'scan phase'.

An alternative is to not do this scan with the TSM client at all, but instead let GPFS find out for TSM which files have changed and then share this information with TSM. The GPFS function/command to do so is called mmbackup: https://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.gpfs100.doc%2Fbl1adm_backupusingmmbackup.htm It essentially traverses the GPFS metadata sequentially, in parallel across all nodes, and filters out the files that need to be backed up. In several customer environments where I was called in to assist with issues like this, this change alone sped up the backup process by multiple orders of magnitude; we had a few customers where it reduced the scan time from days down to minutes. The gain is not always this big, but the scan is usually the largest chunk of the issue.

The second challenge is when you have to back up a very large number (millions) of very small (<32k) files. The main issue here is that for each file TSM issues a random I/O to GPFS, one at a time, so your throughput correlates directly with the size of the files and the latency of a single file read operation. For example, at roughly 5 ms per random read and only one read in flight, you cannot exceed about 200 files per second, which at 32 KB per file is only around 6 MB/sec. If you are not on 3.5 TL3, and/or your files don't fit into the inode, it is actually two random I/Os that are issued, as you need to read the metadata followed by the data block for the file. In this scenario you can only do two things:

1. Parallelism: mmbackup again starts multiple processes in parallel to speed up this phase of the backup (see the invocation sketch below).

2. Use a 'helper' process to prefetch data for a single TSM client, so all data comes out of the cache and the latency of the random reads is eliminated, increasing throughput (see the prefetch sketch below).

Without any of this, seeing only a few MB/sec is not uncommon for customers, but with the changes above you are able to back up very large quantities of data.
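To make the mmbackup suggestion concrete, here is roughly what an invocation looks like (flags as documented for GPFS 3.5; the filesystem path and node names are made-up placeholders, so substitute your own):

    # incremental backup of the filesystem mounted at /gpfs/fs0 to TSM,
    # spreading the metadata scan and backup work across two nodes
    mmbackup /gpfs/fs0 -t incremental -N nsd01,nsd02

And here is a minimal sketch of the prefetch idea, assuming you already have a list of the files due for backup (filelist.txt is a hypothetical file list, e.g. produced by a policy scan, and the reader count is arbitrary). A pool of readers drags the file data into the cache ahead of the single-threaded TSM client, so the client's random reads become cache hits:

    # warm the cache with 8 parallel readers ahead of the TSM client;
    # each cat reads one file and discards the data, leaving it cached
    xargs -a filelist.txt -d '\n' -n 1 -P 8 cat > /dev/null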
Hope this helps.

Sven

------------------------------------------
Sven Oehme
Scalable Storage Research
email: [email protected]
IBM Almaden Research Lab
------------------------------------------

From: Sabuj Pattanayek <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 03/20/2014 06:39 PM
Subject: Re: [gpfsug-discuss] Is TSM/HSM 7.1 compatible with GPFS 3.5.0.12 ?
Sent by: [email protected]

We're using TSM 7.1 with GPFS 3.5.0.11. At some point we do want to enable the HSM features, but we haven't had time to properly configure/set them up yet. I had DMAPI enabled on GPFS but was never able to bring the filesystem up with DMAPI enabled; everything wasn't properly configured at the time and we were missing some pieces (not my post, but the same issue): https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014622591

I'd say that we are having less than optimal performance with TSM, however. We're only able to pull about 4 TB a day; it took us 20 days to back up 80 TB for our initial full. Using rsync, or tar piped to tar, would probably have taken less than a week. We tried various methods, e.g. using a fast intermediate disk pool, going simultaneously to our 6 LTO-6 tape drives, etc., but each "cp" (TSM client) process that TSM would use seemed to be very slow. We tweaked just about every setting to optimize performance, but to really no avail.

When going to the disk pool, this is what should have happened:

    GPFS => relatively fast random I/O (on par with rsync/tar piped to tar) => TSM disk cache => large sequential I/Os for each disk pool volume => tape

This is what really happened:

    GPFS => slow random I/O => TSM disk pool cache => slow random I/O => tape

So instead we did:

    GPFS => slow random I/O (TSM) => tape

...but it was the same speed as going through the TSM disk pool cache. We closely monitored the network, disk, memory, and CPU on the TSM server, and none of the hardware or capabilities of the server were the bottleneck; it was all in TSM. If anyone has seen this sort of behavior and has some pointers/hints at improving performance, I'd be glad to hear them.

Thanks,
Sabuj
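One client-side knob that bears on the single slow "cp" behavior described above is resourceutilization, which allows one TSM client to open multiple parallel sessions to the server. A hedged sketch of the relevant dsm.sys stanza follows; the option names exist in the TSM 7.1 client, but the server name and values here are illustrative only, not taken from this thread:

    SErvername  TSMSRV1
       COMMMethod           TCPip
       TCPServeraddress     tsm.example.com
       * allow up to 10 parallel producer/consumer sessions per client
       RESOURCEUTILIZATION  10
       * larger transactions, so many small files are batched per commit
       TXNBYTELIMIT         2097152

Whether this helps depends on where the serialization actually happens; if the client is latency-bound on single-file reads, the prefetch approach sketched earlier attacks the problem more directly.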
On Thu, Mar 20, 2014 at 5:21 PM, Grace Tsai <[email protected]> wrote:

Hi,

Is TSM/HSM 7.1 compatible with GPFS 3.5.0.12?

Thanks.

Grace

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
