On Thu, 10 Jul 2003, Ted Manos wrote:

> (cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])
>
> Hello all (particularly Alan, Romney and crew!),
>
> We have been doing testing with a new development version of SAS V9 for
> Linux390 for a couple months now, and had not run into any major issues
> until just recently. We are near the end of our "Proof of Concept", and
> just ran into this problem which is a major stumbling block for us. It
> appears to be an NFS locking issue, and not due to SAS. However, I learned
> many, many moons ago, back when most of my hair wasn't grey (or I even
> *had* most of my hair, for that matter!) to never rule out ANYTHING until a
> problem/issue is resolved.
>
> The problem is that when we try to write a SAS-format dataset from
> SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS. If we write
> the SAS dataset to a local Linux directory, everything is fine. If we have
> SAS read and/or write a flat-file to the NFS-mounted SFS directory, things
> are fine. If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS
> directory (the dataset was created on SAS/Unix then Ftp'd to SFS),
> everything is fine. But if we try to either update or create a new SAS
> dataset on the NFS-mounted SFS directory, it hangs things up tight. (Note
> that we are *NOT* trying to read or write the SAS datasets from SAS in CMS
> or anywhere else... just use the NFS-mounted SFS directory space as a
> "remote storage pool".)
>
> When it "hangs", the only way to get rid of all the remaining spawn zombies
> is to re-IPL the Linux guest. The kill command will terminate most of the
> processes, but not all of them. (Yes, I tried killing them from root...
> every way I knew how... but am always open to new ideas/suggestions!) I
> have no idea at this stage where the hang-up is occurring -- in the Linux
> NFS software, the Linux kernel itself, the VM/CMS NFS server software, one
> of the IP stacks, SAS, or someplace else. I'm not even sure at this stage
> how to go about tracking it down, since there are a number of parts/pieces
> that all come into play at various stages (I can function fairly well in
> Linux, but I'm no real geek Linux hacker!).
>
> By "hung", I mean that all I/O (at least as far as I can tell) between the
> SAS program running on Linux, the Linux NFS client representing the
> particular Linux mount point/directory being used, and the VMNFS NFS server
> had ceased to occur. Also, any further attempts to initiate I/O to that
> NFS mount point, from any other ID/process also hang. Even root is no
> longer able to do a simple directory on the mount point (e.g. ls -l
> /terry), it hangs. It appears to be hung due to some kind of lock, or
> pending some condition/state. That I can readily ascertain, there is no
> CPU or I/O being burned in a loop.
>
> I do not believe that the problem is SFS, or that SFS is hung. SFS
> continues to function perfectly normally when accessed from CMS. I also
> don't *think* that it is the VMNFS server, as that appears to continue to
> function normally for any/all other mount points it is serving, just not
> the one that has hung.
>
> When I kill the originating process, and finally get it and all of its
> spawn killed off, there still remain two of its spawn which I can not kill,
> even from root, no matter what signal I try to use. The only way I am able
> to reset everything to that mount point, so it can again be made
> operational, is to completely shutdown and re-IPL that Linux instance, and
> then re-mount all the NFS mount points. I do NOT have to do *anything*
> whatever to VM, SFS or the VMNFS server.
>
> Does that absolve them completely?? LOL... not in MY lifetime! I've been
> doing this stuff WAYYY too long to believe that until it is PROVEN to me.
> It is certainly possible that the hang is being caused by some bad/goofy
> "permission" within Linux, NFS, VMNFS or SFS itself, or even VMSECURE or
> the ESM... or any other part or piece that may come into play. But, I do
> tend to *doubt* it, since everything else continues to function as it
> should, and the Linux NFS mount point comes back and functions "normally"
> after Linux has been recycled and the NFS mounts re-issued.
>
> Unless I am missing something somewhere, it is my belief that an
> NFS-mounted SFS directory should not appear any differently to Linux/Unix
> than any other type of file system structure (with the exception of the 8.8
> filename limitation), since it is a hierarchical tree directory structure
> and supports very large records. Record format, record length and blocking
> (if any) shouldn't really be a factor if the file is just being written
> from Linux and read by Linux, with nothing else coming along in between and
> mucking with things. The NFS-mounted SFS should just look like any other
> Linux/Unix directory/filesystem -- just a pool of disk space available to
> use until you've hit your quota.
>
> I am running 31-bit Linux "2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT
> 2001 s390" under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.
>
> Below is some related file information which may help someone to show me
> the error of my ways and set me on the straight path to righteous NFSing!
>
> SAMPLE SAS7BDAT E1 is a SAS dataset which was created using SAS/Unix
> under AIX and then Ftp'd over to the SFS directory.
> [EMAIL PROTECTED] AS7BDA#N E1 appears to be the start of a new copy of the
> aforementioned dataset, created when we try to update it from SAS/Linux390.
> I am not absolutely certain what ##NFS## #NAMES# E1 is, but it appears to
> be an index/directory of files in that particular SFS directory which have
> been/are being accessed/updated via NFS, and that are in a "locked" state.
> And I'm not certain what ##NFS## #VHIST# E1 is, but it appears that it
> might just be a history/listing of any/all pending NFS requests/operations
> against files in that particular SFS directory. Alan/Romney... am I
> correct, or goofy? (Hey now, guys... be nice! I meant in regards to what
> I was thinking!! <g>)
>
> Help/comments/suggestions??? Many, many advance thanks!!
In this case I've chosen not to prune ;-)

Two things to try:

1. Mount on Linux with '-o nolock'. This is for diagnostic purposes, and I
   don't recommend running a production workload like this. If it's a
   locking problem, this should circumvent it.

2. Try mounting a filesystem from another Linux client and see whether you
   have the problem with that. Be sure you have no firewalls between the
   two: while it's possible to push NFS through a firewall, this is not the
   time to do it.

3. Ah yes. Check there are no firewalls in the way ;-)

--
Cheers
John.

Join the "Linux Support by Small Businesses" list at
http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Copyright John Summerfield. Reproduction prohibited.
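
To go with the '-o nolock' test, here is a minimal sketch of a lock probe in Python. It is an illustration only, not something from the original report. It assumes SAS takes ordinary POSIX (fcntl) locks on its dataset files, which nobody in this thread has confirmed. The names probe_lock and PROBE are made up for the example. The lock attempt runs in a child process so that a hang stalls the child rather than your shell; the parent gives up after a timeout.

```python
import subprocess
import sys

# Child program: open the file for append, take an exclusive POSIX lock,
# then release it. On an NFS mount without 'nolock' this request goes
# through the NFS lock manager.
PROBE = """
import fcntl, sys
with open(sys.argv[1], "a") as f:
    fcntl.lockf(f, fcntl.LOCK_EX)
    fcntl.lockf(f, fcntl.LOCK_UN)
"""


def probe_lock(path, timeout=10.0):
    """Try to lock `path` in a child process.

    Returns "ok" if the lock was taken and released, "error" if the child
    failed (e.g. the file could not be opened), or "timeout" if the child
    did not finish within `timeout` seconds.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked inside the kernel may ignore
        # the signal, so we deliberately do not wait for it again.
        proc.kill()
        return "timeout"
    return "ok" if rc == 0 else "error"
```

For example, probe_lock("/terry/probe.lck") against the mount point from the report. If it returns "timeout" on a normal mount but "ok" when the same export is mounted with '-o nolock', that points at locking. Be aware that a probe which hangs may leave one more unkillable process behind, so try it on a guest you are prepared to re-IPL.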
