Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

John Summerfield Fri, 11 Jul 2003 06:06:43 -0700

On Thu, 10 Jul 2003, Ted Manos wrote:

> (cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])
>
>
> Hello all (particularly Alan, Romney and crew!),
>
>
> We have been doing testing with a new development version of SAS V9 for
> Linux390 for a couple months now, and had not run into any major issues
> until just recently.  We are near the end of our "Proof of Concept", and
> just ran into this problem which is a major stumbling block for us.  It
> appears to be an NFS locking issue, and not due to SAS.  However, I learned
> many, many moons ago, back when most of my hair wasn't grey (or I even
> *had* most of my hair, for that matter!) to never rule out ANYTHING until a
> problem/issue is resolved.
>
>
> The problem is that when we try to write a SAS-format dataset from
> SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS.  If we write
> the SAS dataset to a local Linux directory, everything is fine.  If we have
> SAS read and/or write a flat-file to the NFS-mounted SFS directory, things
> are fine.  If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS
> directory (the dataset was created on SAS/Unix then Ftp'd to SFS),
> everything is fine.  But if we try to either update or create a new SAS
> dataset on the NFS-mounted SFS directory, it hangs things up tight.  (Note
> that we are *NOT* trying to read or write the SAS datasets from SAS in CMS
> or anywhere else... just use the NFS-mounted SFS directory space as a
> "remote storage pool".)
>
>
> When it "hangs", the only way to get rid of all the remaining spawn zombies
> is to re-IPL the Linux guest.  The kill command will terminate most of the
> processes, but not all of them.   (Yes, I tried killing them from root...
> every way I knew how... but am always open to new ideas/suggestions!)  I
> have no idea at this stage where the hang-up is occurring -- in the Linux
> NFS software, the Linux kernel itself, the VM/CMS NFS server software, one
> of the IP stacks, SAS, or someplace else.  I'm not even sure at this stage
> how to go about tracking it down, since there are a number of parts/pieces
> that all come into play at various stages (I can function fairly well in
> Linux, but I'm no real geek Linux hacker!).
>
> By "hung", I mean that all I/O (at least as far as I can tell) between the
> SAS program running on Linux, the Linux NFS client representing the
> particular Linux mount point/directory being used, and the VMNFS NFS server
> has ceased to occur.  Also, any further attempts to initiate I/O to that
> NFS mount point, from any other ID/process also hang.  Even root is no
> longer able to do a simple directory on the mount point (e.g. ls -l
> /terry), it hangs.  It appears to be hung due to some kind of lock, or
> pending some condition/state.  As far as I can readily ascertain, there is
> no CPU or I/O being burned in a loop.
>
> I do not believe that the problem is SFS, or that SFS is hung.  SFS
> continues to function perfectly normally when accessed from CMS.  I also
> don't *think* that it is the VMNFS server, as that appears to continue to
> function normally for any/all other mount points it is serving, just not
> the one that has hung.
>
> When I kill the originating process, and finally get it and all of its
> spawn killed off, there still remain two of its spawn which I can not kill,
> even from root, no matter what signal I try to use.  The only way I am able
> to reset everything to that mount point, so it can again be made
> operational, is to completely shut down and re-IPL that Linux instance, and
> then re-mount all the NFS mount points.  I do NOT have to do *anything*
> whatever to VM, SFS or the VMNFS server.
>
> Does that absolve them completely??  LOL... not in MY lifetime!  I've been
> doing this stuff WAYYY too long to believe that until it is PROVEN to me.
> It is certainly possible that the hang is being caused by some bad/goofy
> "permission" within Linux, NFS, VMNFS or SFS itself, or even VMSECURE or
> the ESM... or any other part or piece that may come into play.  But, I do
> tend to *doubt* it, since everything else continues to function as it
> should, and the Linux NFS mount point comes back and functions "normally"
> after Linux has been recycled and the NFS mounts re-issued.
>
> Unless I am missing something somewhere, it is my belief that an
> NFS-mounted SFS directory should not appear any differently to Linux/Unix
> than any other type of file system structure (with the exception of the 8.8
> filename limitation), since it is a hierarchical tree directory structure
> and supports very large records.  Record format, record length and blocking
> (if any) shouldn't really be a factor if the file is just being written
> from Linux and read by Linux, with nothing else coming along in between and
> mucking with things.  The NFS-mounted SFS should just look like any other
> Linux/Unix directory/filesystem -- just a pool of disk space available to
> use until you've hit your quota.
>
>
> I am running 31-bit Linux "2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT
> 2001 s390"  under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.
>
>
> Below is some related file information which may help someone to show me
> the error of my ways and set me on the straight path to righteous NFSing!
>
>
> SAMPLE   SAS7BDAT E1  is a SAS dataset which was created using SAS/Unix
> under AIX and then Ftp'd over to the SFS directory.
> [EMAIL PROTECTED] AS7BDA#N E1 appears to be the start of a new copy of the
> aforementioned dataset, created when we try to update it from SAS/Linux390.
> I am not absolutely certain what ##NFS## #NAMES# E1 is, but it appears to
> be an index/directory of files in that particular SFS directory which have
> been/are being accessed/updated via NFS, and that are in a "locked" state.
> And I'm not certain what ##NFS##  #VHIST#  E1 is, but it appears that it
> might just be a history/listing of any/all pending NFS requests/operations
> against files in that particular SFS directory.  Alan/Romney... am I
> correct, or goofy?  (Hey now, guys... be nice!  I meant in regards to what
> I was thinking!! <g>)
>
>
> Help/comments/suggestions???  Many, many advance thanks!!


In this case I've chosen not to prune;-)

Two things to try:
1. Mount on Linux with '-o nolock'. This is for diagnostic purposes, and
I don't recommend running a production workload like this. If it's a
locking problem, this should circumvent it.

2. Try mounting a filesystem from another Linux client and see whether
you have the problem with that. Be sure you have no firewalls between
the two: while it's possible to push NFS through a firewall, this is not
the time to do it.

3. Ah yes. Check there are no firewalls in the way;-)
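
To tell whether it is the lock request itself that stalls, one rough approach is to try a POSIX lock on a scratch file inside the mount point from a child process, and give up after a timeout. The sketch below is only an illustration: the function name probe_lock, the scratch-file prefix and the 10-second timeout are made up for the example, and it assumes a Python interpreter is available on the Linux guest. On Linux, lockf() requests on an NFS mount are passed to the server's lock manager unless the filesystem was mounted with nolock, so comparing the result with and without that option should be informative.

```python
import subprocess
import sys

# Code run in the child: create a scratch file in the target directory,
# then take and release an exclusive POSIX lock on it.
_CHILD = """
import fcntl, os, sys, tempfile
fd, path = tempfile.mkstemp(dir=sys.argv[1], prefix="lockprobe-")
try:
    fcntl.lockf(fd, fcntl.LOCK_EX)
    fcntl.lockf(fd, fcntl.LOCK_UN)
finally:
    os.close(fd)
    os.unlink(path)
"""


def probe_lock(directory, timeout=10.0):
    """Return 'ok', 'error', or 'hung' for a lock attempt inside `directory`."""
    child = subprocess.Popen(
        [sys.executable, "-c", _CHILD, directory],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        status = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child again: a process stuck
        # in NFS I/O may not act on the signal, and waiting on it would
        # hang this probe as well.
        child.kill()
        return "hung"
    return "ok" if status == 0 else "error"
```

A result of "hung" on the normal mount and "ok" on a nolock mount would point at locking. Note that the probe cannot tell a stalled lock from a mount point that is already hung for all I/O, so it is best run right after a fresh mount.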

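On the processes that survive every signal: a process blocked inside the kernel in uninterruptible sleep shows up with state D in ps, and it does not act on signals (SIGKILL included) until the operation it is waiting on returns. Whether the two leftover processes here are in that state is an assumption, not something the report confirms. A minimal sketch for listing such processes from /proc follows; the function names are made up for the example.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    command = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, command, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, command) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were looking
        if state == "D":
            stuck.append((pid, command))
    return stuck
```

If the unkillable processes appear in the list returned by uninterruptible_processes(), they are most likely waiting on the NFS server rather than ignoring the kill, which would fit the behaviour described.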


--


Cheers
John.

Join the "Linux Support by Small Businesses" list at
http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Copyright John Summerfield. Reproduction prohibited.
