Steve,

Since you're on AIX, you may want to look at the network tunable "tcp_keepidle" for keepalives instead of using libkeepalive. Setting it to something below 7200 (the units are half-seconds, go figure, so 7200 is 60 minutes, the same as your firewall limit) might do the trick for you. It's stock AIX, so it might be easier to get approval for a simple tunable tweak than for libkeepalive.
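Because the half-second units are easy to trip over, here is a small Python sketch of the arithmetic: it turns the firewall's idle limit into a tcp_keepidle value with some headroom and prints the matching `no -o` command for review before anyone runs it. The helper name and the 15-minute margin are illustrative assumptions, not anything AIX defines. As far as I know, the tunable only sets the idle timer for sockets that have already asked for keepalives (SO_KEEPALIVE), so it's worth confirming that the TSM sessions do.

```python
# Hypothetical helper (not part of AIX or TSM): choose a tcp_keepidle value
# so the first keepalive probe goes out before the firewall drops the session.


def keepidle_half_seconds(firewall_timeout_min: int, margin_min: int = 15) -> int:
    """Return a tcp_keepidle value, in half-seconds, that is `margin_min`
    minutes shorter than the firewall's idle timeout."""
    if margin_min >= firewall_timeout_min:
        raise ValueError("margin must be shorter than the firewall timeout")
    idle_seconds = (firewall_timeout_min - margin_min) * 60
    return idle_seconds * 2  # AIX counts this tunable in half-seconds


if __name__ == "__main__":
    value = keepidle_half_seconds(60)  # 60-minute firewall limit from the thread
    print(value)  # 5400 half-seconds, i.e. 45 minutes
    print(f"no -o tcp_keepidle={value}")  # review before applying
```

With no margin at all the result is exactly 7200, which is why anything below that value is the threshold that matters here.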
We've had some success with this for multi-channel TDPO restores across a firewall, where some channels would sit in MediaW for a long time waiting on the same tape already in use by one channel (a multi-channel backup to disk, later migrated to a single tape, then restored multi-channel from that tape).

=Dave

On 05/29/2014 06:08 AM, Steven Harris wrote:
> Thanks for that, Thomas.
>
> libkeepalive appears to be able to compile on AIX, and if I can get it
> past the right people we may have a solution.
>
> Steve.
>
>
> On 28/05/2014 12:03 AM, Thomas Denier wrote:
>> -----Steve Harris wrote: -----
>>
>>> I have a situation that is causing me grief. As part of a V5 to V6
>>> upgrade I have implemented library managers. These live in one part
>>> of the network and the library clients live in another, separated by
>>> a firewall. The customer insists that timeouts be implemented on the
>>> firewall for any session over 60 minutes: it's a security thing for
>>> some reason and is non-negotiable.
>>>
>>> At times I get a lot of mounts queued. In the past, when these were
>>> local mounts, they would eventually resolve themselves, but now they
>>> time out in the firewall, never complete, and I get a cascading
>>> blockage until the whole server grinds to a halt.
>>>
>>> I'm told I can set resourcetimeout to less than the firewall timeout
>>> and that will cause the mounts to fail, but a lot of these are Oracle
>>> and DB2 backups and they won't retry in a reasonable manner.
>>>
>>> Yes, I could use device classes and mount limits to reserve drives,
>>> and I could put some stuff on disk that now goes direct to tape, but
>>> neither of those is palatable.
>>>
>>> Of course the easiest thing would be to have the library clients use
>>> keepalives on their sessions, as was added in recent versions for
>>> NDMP backups.
>>> I have raised an RFE to this effect at
>>>
>>> http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=54030
>>>
>>> and I'd appreciate your votes.
>>>
>>> Does anyone have bright ideas on how to proceed? I have thought
>>> about SSL port forwarding, but apparently bypassing the controls that
>>> way is frowned upon. Even if the RFE gets up, it won't help me, as
>>> half of the clients are still TSM 5.5 for the next six months or so
>>> while we cut them over.
>>
>> If your TSM servers run under Linux you can use libkeepalive to
>> make TCP connections use keepalive packets. We also have firewalls
>> with a one-hour timeout between our library manager and its
>> clients. We had the kind of problems you describe when we first
>> set up our current TSM environment. We have never had any
>> trouble with firewall timeouts since we installed
>> libkeepalive and set the appropriate environment variables
>> for the TSM server processes.
>>
>> Thomas Denier
>> Thomas Jefferson University Hospital

-- 
Hello World.
David Bronder - Systems Architect        Segmentation Fault
ITS-EI, Univ. of Iowa                    Core dumped, disk trashed, quota filled, soda warm.
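For comparison with the tcp_keepidle tunable, the libkeepalive approach Thomas describes works, as I understand it, by preloading a small library that switches on SO_KEEPALIVE (and optionally the per-socket keepalive timers) for every TCP socket a process opens, so the TSM server itself doesn't have to support keepalives. The Python sketch below shows the equivalent socket options on a single socket. It is an illustration only, not how TSM or libkeepalive is implemented; it assumes Linux-style option names (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT), and the helper name enable_keepalive plus the 75-second interval and 5-probe defaults are my own choices.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int,
                     interval_s: int = 75, count: int = 5) -> None:
    """Turn on TCP keepalive for one socket with per-socket timers.

    Hypothetical helper: roughly what a keepalive preload library does for
    every socket a process creates.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants exist on Linux but not on every platform; when one is
    # missing, the system-wide default (e.g. AIX tcp_keepidle) still applies.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # First probe after 45 minutes, well inside a 60-minute firewall window.
    enable_keepalive(s, idle_s=45 * 60)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Either way the requirement is the same: the idle time before the first probe has to be shorter than the firewall's 60-minute limit, or the session is dropped before any keepalive traffic refreshes it.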
