We recently restored a large mail server: about nine million files, totaling roughly ninety gigabytes, read from nine 3490 K tapes. The node we were restoring is the only node using the storage pool involved. We ran three parallel streams, and the restore took just over 24 hours.
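For scale, that works out to only about 1 MB/s aggregate (90 GB / 86,400 seconds), or roughly 0.35 MB/s per stream, and on the order of 100 files per second; the average file is around 10 KB (90 GB / 9 million files). If that arithmetic is right, per-file overhead rather than raw tape bandwidth would seem to be the limiting factor.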
The client is Intel Linux running 5.2.3.0 client code; the server is mainframe Linux running 5.2.2.0 server code. 'Query session' commands issued during the restore showed the sessions in 'Run' status most of the time, yet the accounting records put the sessions in media wait most of the time. We believe most of that time went to tape movement within a drive, not to waiting for tape mounts.

So far our analysis has turned up only two obvious problems: the MOVEBatchsize and MOVESizethresh server options were set smaller than IBM recommends (see the sketch below). On the face of it, these options affect server housekeeping operations rather than restores. Could they have any sort of indirect impact on restore performance? One of my co-workers speculated that the small values might be forcing migration to write smaller blocks on tape, and that the restore might then be degraded by having to read a larger number of blocks.

We are thinking of running a test restore with tracing enabled on the client, the server, or both; the commands we were considering are sketched at the end of this note. Which trace classes are likely to be informative without adding too much overhead? We are particularly interested in the server side, since the IBM documentation for most of the server trace classes seems to be limited to the names of the classes.
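For reference, here is what we are looking at for the options; the recommended maximums of 1000 and 2048 are what I recall from IBM's performance tuning guidance, so please correct me if those are wrong for 5.2:

    From a dsmadmc administrative session, to display the values in effect:

        query option

    In dsmserv.opt (takes effect after a server restart):

        MOVEBatchsize   1000
        MOVESizethresh  2048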

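For the tracing, the shape of what we had in mind is below. The TRACE ENABLE / TRACE BEGIN / TRACE END / TRACE DISABLE syntax is from the Administrator's Reference; the class names, though, are just ones we have seen mentioned on this list (SESSION, BF, PVR), not anything we can vouch for, so better suggestions are exactly what we are asking for:

    On the server, from an administrative session:

        trace enable session bf pvr
        trace begin /tmp/restore.trc
        (run the test restore)
        trace end
        trace disable

    On the client, in dsm.sys:

        TRACEFILE  /tmp/dsmc.trc
        TRACEFLAGS service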