Hi,

I have done quite similar restores on our mail server. You may also look at the client to see what happens to the restore process: does the CPU sit at 100% for the 'dsmc restore ...' process? Another thing is the file system on the client; check the file system and disk activity (service times) for any weakness that may result from creating that many inodes.
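For example, something like this is what I mean (just a sketch, assuming a Linux client with the sysstat tools installed; the exact commands differ on other platforms):

    # Is the dsmc process itself CPU-bound? A client pegged at 100%
    # suggests the bottleneck is not the server or the tape drives:
    top -p "$(pgrep -f 'dsmc restore' | head -1)"

    # Disk activity / service times on the target file system; constantly
    # high await/svctm while the restore creates inodes is a bad sign:
    iostat -x 5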
I have recently done a lot of mail server restores (always 3.5 million files / 140 GB) using an old TSM server (v5.1.9.5 with 3490 K tapes and a similar configuration to yours: 10 tapes) and observed that this old TSM server in particular was at its limit. Especially the I/O configuration of that old server was very bad: DB, log, and disk cache are mixed on the same disks. This hurts restore performance, especially when other activity (backups at night) is going on.

So we used

    dsmc restore -quiet /mail/ /data2/mail/

with tcpwindowsize 64, tcpbuffsize 32, largecommbuffers no, txnbytelimit 25600, and resourceutilization 3, and finally received the 3.5 million files / 140 GB in 09:53:34. For me that was OK because I know about the bad constitution of that server. The restore time gets much worse if the restore falls into a period when the TSM DB handles a lot of other transactions, like nightly backups. Restoring the same data with only one drive takes 51 hours.

Running the same mail restore test on new hardware (new DB, TSM 5.3, with 3592 drives), using the same restore client, we finally got the 3.5 million files / 150 GB restored in 04:52:00, using just one drive because the data fits on a single 3599 tape.

But here I have experienced a reproducible bug/behaviour (the case is at the moment 'closed' because Solaris 10 is not yet supported): when the restore starts, everything runs fine and fast, with a restore performance of about 1 million files/hour. After some time, maybe at 40% of the total restore time, the CPU of the client rises to 100% and the restore performance (data and files per hour) drops accordingly; no reason for this was found on either the server or the client. Maybe it happens when a very big directory containing a lot of directories is in progress.

In the end I found a workaround: I cancelled the slowed-down restore process running at 100% CPU ('dsmc restore -quiet /mail/ /data2/mail/') with Control-C, let it shut down, and then just restarted the restore with 'dsmc restart restore -quiet'. The restarted restore runs fast again and finishes at the 04:52:00 total time. Without the stop/restart, the same restore finishes in 06:49:09. That is reproducible, and it is quite a big difference (about 30% faster with interrupting and restarting), but maybe it comes from our unsupported TSM version. Has someone else seen this "cpu-crunching" behaviour?
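For reference, the whole sequence as a rough sketch (the option values are the ones listed above, set in our dsm.sys; the paths are from my test, and the interrupt is done by hand when the slowdown hits):

    # Relevant client options (in the dsm.sys server stanza):
    #   TCPWINDOWSIZE        64
    #   TCPBUFFSIZE          32
    #   LARGECOMMBUFFERS     no
    #   TXNBYTELIMIT         25600
    #   RESOURCEUTILIZATION  3

    # 1. Start the restore as usual:
    dsmc restore -quiet /mail/ /data2/mail/

    # 2. When the client sits at 100% CPU (for us at roughly 40% of the
    #    total run time), interrupt it with Control-C and let it shut
    #    down cleanly.

    # 3. Pick up the interrupted restore where it left off:
    dsmc restart restore -quiet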
Greetings
Rainer

Thomas Denier wrote:
> We recently restored a large mail server. We restored about nine million
> files with a total size of about ninety gigabytes. These were read from
> nine 3490 K tapes. The node we were restoring is the only node using the
> storage pool involved. We ran three parallel streams. The restore took
> just over 24 hours.
>
> The client is Intel Linux with 5.2.3.0 client code. The server is
> mainframe Linux with 5.2.2.0 server code.
>
> 'Query session' commands run during the restore showed the sessions in
> 'Run' status most of the time. Accounting records reported the sessions
> in media wait most of the time. We think most of this time was spent
> waiting for movement of tape within a drive, not waiting for tape mounts.
>
> Our analysis has so far turned up only two obvious problems: the
> movebatchsize and movesizethreshold options were smaller than IBM
> recommends. On the face of it, these options affect server housekeeping
> operations rather than restores. Could these options have any sort of
> indirect impact on restore performance? For example, one of my co-workers
> speculated that the option values might be forcing migration to write
> smaller blocks on tape, and that the restore performance might be
> degraded by reading a larger number of blocks.
>
> We are thinking of running a test restore with tracing enabled on the
> client, the server, or both. Which trace classes are likely to be
> informative without adding too much overhead? We are particularly
> interested in information on the server side. The IBM documentation for
> most of the server trace classes seems to be limited to the names of the
> trace classes.

--
------------------------------------------------------------------------
Rainer Wolf                          eMail: [EMAIL PROTECTED]
kiz - Abt. Infrastruktur             Tel/Fax: ++49 731 50-22482/22471
Universität Ulm                      wwweb: http://kiz.uni-ulm.de
