Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

Ashley Chaloner Fri, 11 Jul 2003 02:03:36 -0700

Ted,

I have had a similar situation mounting an NFS volume (from a Sun) to
either a RH-7.2 or a RH-RawHide VM.


The processes in question seem to be sleeping in the function "down" or
"wait_on_inode". That looks like uninterruptible sleep: a process in
that state does not act on signals (not even SIGKILL) until it is woken,
so it never receives its termination signal.
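
To see which processes are stuck and where, something like the
following may help. It is only a sketch: it assumes a /proc layout
with per-process "stat" and "wchan" files (the latter exists on
reasonably recent kernels and may be missing on older ones), and the
function names are my own.

```python
import os

def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so locate the last ')' instead of splitting on spaces.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state

def list_blocked(proc_root="/proc"):
    """Return (pid, command, wchan) for every process in state D."""
    blocked = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return blocked  # no /proc here; nothing to report
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or entry unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"  # assumption: wchan may be absent or restricted
        blocked.append((pid, comm, wchan))
    return blocked

if __name__ == "__main__":
    for pid, comm, wchan in list_blocked():
        print(pid, comm, wchan)
```

If the wchan column shows names such as "down" or "wait_on_inode" for
the unkillable processes, that would match what I am seeing here.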

The problem occurs with automount and ordinary mount, but much more
often with automount. If a server goes down and the "hard" option is
specified, the client's process(es) rightly hang until the server
comes back. However, in this case, NFS handles seem to get lost/broken
in such a way that the client's processes think the server is down
when it isn't, so they hang.
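
To check which NFS mounts are actually "hard", the options can be read
back from /proc/mounts. A rough sketch follows; the sample line in the
comment is hypothetical, the exact option names shown by the kernel
vary between versions, and mount points containing spaces (which
/proc/mounts escapes) are not handled.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point in /proc/mounts text to its option set."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fstype, options = fields[:4]
        if fstype in ("nfs", "nfs4"):
            result[mount_point] = set(options.split(","))
    return result

def describe(options):
    """Classify a mount as 'hard', 'soft' or 'unknown' from its options."""
    if "soft" in options:
        return "soft"
    if "hard" in options:
        return "hard"
    return "unknown"  # assumption: the kernel did not list either flag

if __name__ == "__main__":
    # Hypothetical example of a line this expects:
    #   sun1:/export/home /home nfs rw,v3,hard,intr,addr=10.0.0.1 0 0
    try:
        with open("/proc/mounts") as f:
            mounts = nfs_mount_options(f.read())
    except OSError:
        mounts = {}
    for mount_point, options in sorted(mounts.items()):
        print(mount_point, describe(options), "intr" in options)
```

The last column reports whether "intr" is set, which on kernels of
this vintage was meant to let signals interrupt a pending request on a
hard mount. I have not verified that it makes any difference to the
hang described here.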

(Also, a particular annoyance is that processes in uninterruptible
sleep are counted in the load average so there is a high load average
without any load on the processor.)
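
A quick way to see that mismatch is to compare the load averages with
the number of runnable tasks, both of which are in /proc/loadavg. This
is a minimal sketch that assumes the usual five-field format of that
file; the function name is mine.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg text into (1min, 5min, 15min, runnable, total)."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    # The fourth field looks like "runnable/total".
    runnable, total = (int(x) for x in fields[3].split("/"))
    return one, five, fifteen, runnable, total

if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as f:
            one, five, fifteen, runnable, total = parse_loadavg(f.read())
    except (OSError, ValueError, IndexError):
        print("could not read /proc/loadavg")
    else:
        print("load averages:", one, five, fifteen)
        print("runnable now:", runnable, "of", total)
```

A load average that stays well above the runnable count while the CPU
is idle is consistent with tasks sitting in uninterruptible sleep,
though it does not prove it.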

Conclusion (guessed): The problem is in the kernel NFS code. It may be
worth searching the source for wait_on_inode (in fs/inode.c),
nfs_wait_on_inode (in fs/nfs/inode.c), and down (in asm/semaphore.h).
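
For the search itself, a small script along these lines would do (plain
grep works just as well). It is a sketch only: the source tree path is
passed on the command line, and the function name is my own.

```python
import os
import re
import sys

def find_identifier(root, names):
    """Return (path, line_number, name) for whole-word matches in .c/.h files."""
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in sorted(files):
            if not fname.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, fname)
            try:
                with open(path, errors="replace") as f:
                    for lineno, line in enumerate(f, 1):
                        match = pattern.search(line)
                        if match:
                            hits.append((path, lineno, match.group(1)))
            except OSError:
                continue  # unreadable file; skip it
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_ident.py <path-to-kernel-source>
    wanted = ["wait_on_inode", "nfs_wait_on_inode", "down"]
    for path, lineno, name in find_identifier(sys.argv[1], wanted):
        print("%s:%d: %s" % (path, lineno, name))
```

Note that "down" is a very common word, so expect a lot of noise for
that one.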

I hope this helps. (It looks at least like you're closer to absolving
the VMNFS side of things :-)

Ashley Chaloner.

----
DCS,UoW,UK.
http://www.dcs.warwick.ac.uk/~csuwf/
----

On Thu, 10 Jul 2003, Ted Manos wrote:
> Date: Thu, 10 Jul 2003 21:21:15 -0500
> From: Ted Manos <[EMAIL PROTECTED]>
> Reply-To: Linux on 390 Port <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
>
> (cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])
>
>
> Hello all (particularly Alan, Romney and crew!),
>
>
> We have been doing testing with a new development version of SAS V9 for
> Linux390 for a couple months now, and had not run into any major issues
> until just recently.  We are near the end of our "Proof of Concept", and
> just ran into this problem which is a major stumbling block for us.  It
> appears to be an NFS locking issue, and not due to SAS.  However, I learned
> many, many moons ago, back when most of my hair wasn't grey (or I even
> *had* most of my hair, for that matter!) to never rule out ANYTHING until a
> problem/issue is resolved.
>
>
> The problem is that when we try to write a SAS-format dataset from
> SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS.  If we write
> the SAS dataset to a local Linux directory, everything is fine.  If we have
> SAS read and/or write a flat-file to the NFS-mounted SFS directory, things
> are fine.  If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS
> directory (the dataset was created on SAS/Unix then Ftp'd to SFS),
> everything is fine.  But if we try to either update or create a new SAS
> dataset on the NFS-mounted SFS directory, it hangs things up tight.  (Note
> that we are *NOT* trying to read or write the SAS datasets from SAS in CMS
> or anywhere else... just use the NFS-mounted SFS directory space as a
> "remote storage pool".)
>
>
> When it "hangs", the only way to get rid of all the remaining spawn zombies
> is to re-IPL the Linux guest.  The kill command will terminate most of the
> processes, but not all of them.   (Yes, I tried killing them from root...
> every way I knew how... but am always open to new ideas/suggestions!)  I
> have no idea at this stage where the hang-up is occurring -- in the Linux
> NFS software, the Linux kernel itself, the VM/CMS NFS server software, one
> of the IP stacks, SAS, or someplace else.  I'm not even sure at this stage
> how to go about tracking it down, since there are a number of parts/pieces
> that all come into play at various stages (I can function fairly well in
> Linux, but I'm no real geek Linux hacker!).
>
> By "hung", I mean that all I/O (at least as far as I can tell) between the
> SAS program running on Linux, the Linux NFS client representing the
> particular Linux mount point/directory being used, and the VMNFS NFS server
> has ceased to occur.  Also, any further attempts to initiate I/O to that
> NFS mount point, from any other ID/process, also hang.  Even root is no
> longer able to do a simple directory listing on the mount point (e.g. ls -l
> /terry); it hangs.  It appears to be hung due to some kind of lock, or
> pending some condition/state.  As far as I can readily ascertain, there is
> no CPU or I/O being burned in a loop.
>
> I do not believe that the problem is SFS, or that SFS is hung.  SFS
> continues to function perfectly normally when accessed from CMS.  I also
> don't *think* that it is the VMNFS server, as that appears to continue to
> function normally for any/all other mount points it is serving, just not
> the one that has hung.
>
> When I kill the originating process, and finally get it and all of its
> spawn killed off, there still remain two of its spawn which I can not kill,
> even from root, no matter what signal I try to use.  The only way I am able
> to reset everything to that mount point, so it can again be made
> operational, is to completely shut down and re-IPL that Linux instance, and
> then re-mount all the NFS mount points.  I do NOT have to do *anything*
> whatever to VM, SFS or the VMNFS server.
>
> Does that absolve them completely??  LOL... not in MY lifetime!  I've been
> doing this stuff WAYYY too long to believe that until it is PROVEN to me.
> It is certainly possible that the hang is being caused by some bad/goofy
> "permission" within Linux, NFS, VMNFS or SFS itself, or even VMSECURE or
> the ESM... or any other part or piece that may come into play.  But, I do
> tend to *doubt* it, since everything else continues to function as it
> should, and the Linux NFS mount point comes back and functions "normally"
> after Linux has been recycled and the NFS mounts re-issued.
>
> Unless I am missing something somewhere, it is my belief that an
> NFS-mounted SFS directory should not appear any differently to Linux/Unix
> than any other type of file system structure (with the exception of the 8.8
> filename limitation), since it is a hierarchical tree directory structure
> and supports very large records.  Record format, record length and blocking
> (if any) shouldn't really be a factor if the file is just being written
> from Linux and read by Linux, with nothing else coming along in between and
> mucking with things.  The NFS-mounted SFS should just look like any other
> Linux/Unix directory/filesystem -- just a pool of disk space available to
> use until you've hit your quota.
>
>
> I am running 31-bit Linux "2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT
> 2001 s390"  under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.
>
>
> Below is some related file information which may help someone to show me
> the error of my ways and set me on the straight path to righteous NFSing!
>
>
> SAMPLE   SAS7BDAT E1  is a SAS dataset which was created using SAS/Unix
> under AIX and then Ftp'd over to the SFS directory.
> [EMAIL PROTECTED] AS7BDA#N E1 appears to be the start of a new copy of the
> aforementioned dataset, created when we try to update it from SAS/Linux390.
> I am not absolutely certain what ##NFS## #NAMES# E1 is, but it appears to
> be an index/directory of files in that particular SFS directory which have
> been/are being accessed/updated via NFS, and that are in a "locked" state.
> And I'm not certain what ##NFS##  #VHIST#  E1 is, but it appears that it
> might just be a history/listing of any/all pending NFS requests/operations
> against files in that particular SFS directory.  Alan/Romney... am I
> correct, or goofy?  (Hey now, guys... be nice!  I meant in regards to what
> I was thinking!! <g>)
>
>
> Help/comments/suggestions???  Many, many advance thanks!!
>
>
> -Ted
>

<snip CMS stuff />
