Hi Nick, I don't claim to have slain that particular dragon yet, but I have uncovered some background info on the subject:
In our case we're running a mix of RHEL (RHES?) 4 & 5 (on Intel x86_64) on the TSM servers and a few storage agents (and moving to SLES at some point).

The first item I've found is that the in-kernel (or would that be in-distro?) Emulex driver defaults to running multiple discovery threads when scanning for devices at boot time. (16 threads, IIRC.) This pretty much guarantees that devices won't be discovered in the same order from one boot to the next. According to the Emulex doc, this can be changed with a boot-time setting to the old behaviour of one discovery thread. (Discovery would take longer.)

The second item is that a CDL/EDL emulating a 3584 doesn't look much like a 3584 at the WWNN level. On a 3584, each drive has a unique WWNN that incorporates both the library serial number and information about where the drive is in the library. (Control path drives have a second LUN for communicating with the library.) On CDL/EDL, all virtual tape drives in a given VTL show up as different LUN numbers on the virtual library's one WWNN. (Not optimal for the default udev rules on RHEL?)

The third item is that RHEL ships with a udev rule already populated that could be used to make the CDL/EDL tape drives persistent, by using something other than boot-time-enumerated mt#/rmt# for the naming convention. (Name the drive after some part of the WWNN & LUN number?) Personally I like this approach better than the try-to-update-it-after-the-fact-on-the-library-manager scripts we use now.

When IBM allowed copy-on-write to go into Linux, I wish they'd also have donated cfgmgr. I don't think any distro would take ODM though, so a ported cfgmgr would likely be useless.
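To make the naming idea in the third item concrete, here is a rough sketch in Python. It is not a udev rule, just an illustration of why a name built from WWNN & LUN survives a change in discovery order while mt# numbering does not. The function names, the "tape-<wwnn>-lun<n>" format, and the sample WWNN are all made up for the example.

```python
def persistent_tape_name(wwnn, lun):
    """Build a stable device name from a WWNN and LUN (hypothetical format)."""
    # Normalise the WWNN: lowercase, no colons, no leading 0x.
    digits = wwnn.lower().replace(":", "")
    if digits.startswith("0x"):
        digits = digits[2:]
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("expected a 64-bit WWNN, got %r" % wwnn)
    return "tape-{}-lun{}".format(digits, lun)


def boot_order_names(discovered):
    """Mimic boot-time enumeration: the name depends only on discovery order."""
    return {dev: "mt{}".format(i) for i, dev in enumerate(discovered)}


# A VTL presenting three virtual drives as LUNs 1-3 behind one WWNN.
# The WWNN is an invented value, not taken from any real library.
VTL_WWNN = "50:06:04:8a:00:00:00:01"
drives = [(VTL_WWNN, lun) for lun in (1, 2, 3)]

# Two boots that happen to discover the same drives in a different order.
first_boot = boot_order_names(drives)
second_boot = boot_order_names(list(reversed(drives)))

# Drives whose mt# changed between the two boots.
moved = [d for d in drives if first_boot[d] != second_boot[d]]

# Names derived from WWNN & LUN do not depend on discovery order at all.
stable = {d: persistent_tape_name(*d) for d in drives}

if __name__ == "__main__":
    for d in drives:
        print(d[1], first_boot[d], second_boot[d], stable[d])
```

Because every virtual drive on a CDL/EDL shares the library's one WWNN, the LUN is the part of the name that actually tells the drives apart in this sketch; on a real 3584 the WWNN alone would differ per drive.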
[RC]

From: Nick Laflamme <[email protected]>
To: [email protected]
Date: 03/29/2011 04:28 PM
Subject: [ADSM-L] Linux & SAN Device Interruptions
Sent by: "ADSM: Dist Stor Manager" <[email protected]>

How are those of you who run TSM servers or storage agents on Linux on Intel doing with disruptions with SAN-attached tape devices or the SAN fabric itself?

In my current shop, we run TSM servers on AIX (and MVS, but that's another story), but we have storage agents on AIX, Windows, and Red Hat Linux on Intel. The Linux storage agents are relatively new; they were first deployed about two years ago. AIX and Windows storage agents have been there a bit longer, although I can't say how much longer; I, too, have been there less than two years.

One problem that we've never been able to overcome with our Linux storage agents has been that if a virtual tape library is rebooted or if the SAN fabric gets massively unzoned (it happened about a month ago to us, sigh), the Linux storage agents don't notice the return of the SAN-attached tape devices until we reboot the Linux server. (We never had the Linux servers zoned to real 3584s and real LTO tape devices; they've only ever been zoned up to EMC CLARiiON Disk Libraries and then Data Domains with VTL cards in them.) This has persisted across updates to LINtape, CDL code levels, Data Domain code levels, and TSM storage agent levels.

Needless to say, the application teams are rather steamed with us about this. We have at times had cases open simultaneously with EMC, Red Hat, and IBM, to no avail.

If you have Linux TSM servers or storage agents that gracefully recover from disruptions on your tape SAN, can you share with me (and the rest of the list, if you want) RHEL level, device driver levels, HBA configuration, and whatever else you think might be relevant?

Thanks,
Nick
