It's also not a problem with the TSM database, which sits on a RAID10 of 4 SSDs. Watching dstat, I'll see it do a large 500MB/s write to the LUN every 30 minutes to an hour and then just sit idle waiting for more data.
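For reference, the sort of dstat invocation I'm watching it with (sdb here is just a placeholder for whatever device backs the DB LUN):

    # per-device disk throughput with timestamps, 5 second samples
    # sdb is a placeholder for the device backing the TSM DB RAID10
    dstat -t -d -D sdb 5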
On Fri, Mar 21, 2014 at 5:16 PM, Sabuj Pattanayek <[email protected]> wrote:

>> the very first challenge is to find what data has changed. The way TSM
>> does this is by crawling through your filesystem, looking at mtime on
>> each file to find out which file has changed. Think about an ls -Rl on
>> your filesystem root.
>
> The mmbackup mmapplypolicy phase is not a problem. It's going to take as
> long as it's going to take. We're using 5 RAID1 SAS SSD NSDs for
> metadata and it takes ~1 hour to do the traversal through 157 million
> files.
>
>> the 2nd challenge is if you have to back up a very large number
>> (millions) of very small (<32k) files. The main issue here is that for
>> each file TSM issues a random I/O to GPFS, one at a time, so your
>> throughput directly correlates with the size of the files and the
>> latency of a single file read operation. If you are not on 3.5 TL3
>> and/or your files don't fit into the inode, it's actually even two
>> random I/Os that are issued, as you need to read the metadata followed
>> by the data block for the file. In this scenario you can only do two
>> things:
>
> The problem here is why a single rsync or tar | tar process is orders of
> magnitude faster than a single TSM client at pulling data off of GPFS
> into the same backup system's disk (e.g. disk pool). It's not a problem
> with GPFS, it's a problem with TSM itself. We tried various things,
> e.g.:
>
> 1) changed commmethod to sharedmem
> 2) increased txnbytelimit to 10G
> 3) increased movesizethresh to the same as txnbytelimit (10240, i.e. 10G)
> 4) increased diskbufsize to 1023KB
> 5) increased txngroupmax to 65000
>
> The next problem is that one would expect backups to tape to do straight
> sequential I/O to tape. Even when the files were staged to the disk pool
> before being moved to tape, it did the same random I/O to tape, even
> with 8GB disk pool chunks. We haven't tried the file pool option yet,
> but we've been told that it'll do the same thing. Tar'ing or dd'ing
> large files to tape is the most efficient way to write them, so why
> doesn't TSM do something similar?
>
>> 1. parallelism - mmbackup again starts multiple processes in parallel
>> to speed up this phase of the backup
>
> ...use multiple clients. This would help, but again I'm trying to get a
> single TSM client to be on par with a single "cp" process.
>
> Thanks,
> Sabuj
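To be concrete about the comparison above: the single-stream test that keeps beating TSM is nothing more exotic than this (the paths are made up):

    # stream a directory tree out of GPFS into the backup server's disk
    # pool filesystem -- one reader/writer pair, no TSM involved
    tar -C /gpfs/fs0/data -cf - . | tar -C /tsmpool/staging -xf -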
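And for anyone who wants to try the same tuning as in the numbered list above, a rough sketch of where those knobs live (option names as documented for the 6.x client/server, so double-check them against your level):

    * dsm.sys (client side, in the server stanza)
    COMMMETHOD     SHAREDMEM
    * the docs spell this one DISKBUFFSIZE; value is in KB, 1023 is the max
    DISKBUFFSIZE   1023
    * 10G assumes a client level that accepts unit suffixes, else give KB
    TXNBYTELIMIT   10G

    * dsmserv.opt (server side)
    TXNGROUPMAX    65000
    * value is in MB, so 10240 = 10G
    MOVESIZETHRESH 10240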
