(cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])
Hello all (particularly Alan, Romney and crew!),
We have been doing testing with a new development version of SAS V9 for
Linux390 for a couple months now, and had not run into any major issues
until just recently. We are near the end of our "Proof of Concept", and
just ran into this problem which is a major stumbling block for us. It
appears to be an NFS locking issue, and not due to SAS. However, I learned
many, many moons ago, back when most of my hair wasn't grey (or I even
*had* most of my hair, for that matter!) to never rule out ANYTHING until a
problem/issue is resolved.
The problem is that when we try to write a SAS-format dataset from
SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS. If we write
the SAS dataset to a local Linux directory, everything is fine. If we have
SAS read and/or write a flat-file to the NFS-mounted SFS directory, things
are fine. If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS
directory (the dataset was created on SAS/Unix then Ftp'd to SFS),
everything is fine. But if we try to either update or create a new SAS
dataset on the NFS-mounted SFS directory, it hangs things up tight. (Note
that we are *NOT* trying to read or write the SAS datasets from SAS in CMS
or anywhere else... just use the NFS-mounted SFS directory space as a
"remote storage pool".)
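
One way to narrow this down without SAS in the picture is a tiny lock probe. The sketch below assumes the hang involves POSIX advisory (fcntl-style) locks, which nothing above confirms, and assumes a modern Python 3 on the Linux guest. The function name try_lock and any file name you pass to it are hypothetical. On a mount that is already wedged the open itself may hang, so it only makes sense on a freshly re-mounted directory.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take, then release, an exclusive POSIX lock on path without waiting."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB asks the kernel to fail at once if the lock is held elsewhere.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # lock refused or not supported on this file system
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Probe a scratch file in the local temp directory first, as a baseline.
    scratch = os.path.join(tempfile.gettempdir(), "lockprobe.tmp")
    print("local lock ok:", try_lock(scratch))
    os.remove(scratch)
```

If the local baseline succeeds and the same call against a scratch file under the NFS mount point never returns, that would point at the lock path between the Linux NFS client and VMNFS rather than at SAS.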
When it "hangs", the only way to get rid of all the remaining spawn zombies
is to re-IPL the Linux guest. The kill command will terminate most of the
processes, but not all of them. (Yes, I tried killing them from root...
every way I knew how... but am always open to new ideas/suggestions!) I
have no idea at this stage where the hang-up is occurring -- in the Linux
NFS software, the Linux kernel itself, the VM/CMS NFS server software, one
of the IP stacks, SAS, or someplace else. I'm not even sure at this stage
how to go about tracking it down, since there are a number of parts/pieces
that all come into play at various stages (I can function fairly well in
Linux, but I'm no real geek Linux hacker!).
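
A possible first step in tracking it down: processes that ignore every signal, including SIGKILL, are often sitting in uninterruptible sleep (state "D"), which is where a process waiting on a hard-mounted NFS server can end up. The sketch below lists such processes by reading /proc/<pid>/stat. It assumes a modern Python 3 and a mounted /proc; the function names are made up for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the LAST closing parenthesis.
    head, _, tail = text.rpartition(")")
    pid_text, _, command = head.partition("(")
    return int(pid_text), command, tail.split()[0]


def uninterruptible(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no /proc here, nothing to report
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, command))
    return found


if __name__ == "__main__":
    for pid, command in uninterruptible():
        print(pid, command)
```

If the two unkillable leftovers show up here, they are waiting inside the kernel (most likely on the NFS client) and no signal will be delivered until that wait ends.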
By "hung", I mean that all I/O (at least as far as I can tell) between the
SAS program running on Linux, the Linux NFS client representing the
particular Linux mount point/directory being used, and the VMNFS NFS server
has ceased. Also, any further attempts to initiate I/O to that
NFS mount point, from any other ID/process, also hang. Even root is no
longer able to do a simple directory listing on the mount point (e.g. ls -l
/terry); it hangs. It appears to be hung due to some kind of lock, or
pending some condition/state. As far as I can readily ascertain, no CPU or
I/O is being burned in a loop.
I do not believe that the problem is SFS, or that SFS is hung. SFS
continues to function perfectly normally when accessed from CMS. I also
don't *think* that it is the VMNFS server, as that appears to continue to
function normally for any/all other mount points it is serving, just not
the one that has hung.
When I kill the originating process, and finally get it and all of its
spawn killed off, there still remain two of its spawn which I cannot kill,
even from root, no matter what signal I try to use. The only way I am able
to reset everything to that mount point, so it can again be made
operational, is to completely shut down and re-IPL that Linux instance, and
then re-mount all the NFS mount points. I do NOT have to do *anything*
whatever to VM, SFS or the VMNFS server.
Does that absolve them completely?? LOL... not in MY lifetime! I've been
doing this stuff WAYYY too long to believe that until it is PROVEN to me.
It is certainly possible that the hang is being caused by some bad/goofy
"permission" within Linux, NFS, VMNFS or SFS itself, or even VMSECURE or
the ESM... or any other part or piece that may come into play. But, I do
tend to *doubt* it, since everything else continues to function as it
should, and the Linux NFS mount point comes back and functions "normally"
after Linux has been recycled and the NFS mounts re-issued.
Unless I am missing something somewhere, it is my belief that an
NFS-mounted SFS directory should not appear any differently to Linux/Unix
than any other type of file system structure (with the exception of the 8.8
filename limitation), since it is a hierarchical tree directory structure
and supports very large records. Record format, record length and blocking
(if any) shouldn't really be a factor if the file is just being written
from Linux and read by Linux, with nothing else coming along in between and
mucking with things. The NFS-mounted SFS should just look like any other
Linux/Unix directory/filesystem -- just a pool of disk space available to
use until you've hit your quota.
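
The 8.8 limitation may matter more than it looks. The long names recorded in the ##NFS## #NAMES# file shown further down (for example sample.sas7bdat.lck) do not fit the 8.8 pattern, while the dataset names themselves do. The sketch below is a deliberately simplified check: it only counts characters on either side of a single dot, ignores which characters CMS accepts, and treats names without a dot as not fitting. The function name fits_8_8 is made up for illustration.

```python
def fits_8_8(name):
    """Return True if name looks like <1-8 chars>.<1-8 chars>.

    Simplified assumption: only lengths and the number of dots are checked.
    """
    parts = name.split(".")
    if len(parts) != 2:
        return False
    return all(1 <= len(part) <= 8 for part in parts)


if __name__ == "__main__":
    for candidate in ("sample.sas7bdat", "sample.sas7bdat.lck", "husker2.data"):
        print(candidate, fits_8_8(candidate))
```

Whether the names that fail this check have anything to do with the hang is not established here; it is only one visible difference between the flat-file case that works and the SAS dataset case that does not.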
I am running 31-bit Linux "2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT
2001 s390" under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.
Below is some related file information which may help someone to show me
the error of my ways and set me on the straight path to righteous NFSing!
SAMPLE SAS7BDAT E1 is a SAS dataset which was created using SAS/Unix
under AIX and then Ftp'd over to the SFS directory.
[EMAIL PROTECTED] AS7BDA#N E1 appears to be the start of a new copy of the
aforementioned dataset, created when we try to update it from SAS/Linux390.
I am not absolutely certain what ##NFS## #NAMES# E1 is, but it appears to
be an index/directory of files in that particular SFS directory which have
been/are being accessed/updated via NFS, and that are in a "locked" state.
And I'm not certain what ##NFS## #VHIST# E1 is, but it appears that it
might just be a history/listing of any/all pending NFS requests/operations
against files in that particular SFS directory. Alan/Romney... am I
correct, or goofy? (Hey now, guys... be nice! I meant in regards to what
I was thinking!! <g>)
Help/comments/suggestions??? Many, many advance thanks!!
-Ted
DVG895 FILELIST A0 V 169 Trunc=169 Size=306 Line=204 Col=1 Alt=632
Directory = VMSYSU:DTDC37.
Cmd Filename Filetype Fm Format Lrecl Records Blocks Date    Time
    SAMPLE   SAS7BDAT E1 V      48387      23     31 1/20/03 14:25:49
    [EMAIL PROTECTED] AS7BDA#N E1 V 512 1 1 7/09/03 15:08:54
Cmd Filename Filetype Fm Format Lrecl Records Blocks Date    Time
    ##NFS##  #NAMES#  E1 V         59       3      1 7/09/03 15:08:54
    ##NFS##  #VHIST#  E1 F         16       4      1 7/09/03 15:08:54
    [EMAIL PROTECTED] AS7BDA#N E1 V 512 1 1 7/09/03 15:08:54
    HUSKER2  DATA     E1 V         52      78      2 7/09/03 14:57:15
##NFS## #NAMES# E1 V 80 Trunc=80 Size=3 Line=1 Col=1 Alt=0
====>
|...+....1....+....2....+....3....+....4....+....5....+....6....+....7....+....>
00001 [EMAIL PROTECTED]@#N r �sq: golf.sas7bdat.lck
00002 [EMAIL PROTECTED] r �s>O gdata.sas7bdat.lck
00003 [EMAIL PROTECTED] r ��C sample.sas7bdat.lck
00004 * * * End of File * * *
C7D6D3C6 7CE2C1E2 F7C2C4C1 E37C7BD5 00020000 00001099 00000001 00000001
B9A2987A 00000011 87969386 4BA281A2 F7828481 A34B9383 9240
G O L F @ S A S 7 B D A T @ # N r
s q : g o l f . s a s 7 b d a t . l c k
4040 40404040 ...
C7C4C1E3 C17CE2C1 E2F7C2C4 C1E37BD5 00020000 00001099 00000001 00000001
B9A26ED6 00000012 878481A3 814BA281 A2F78284 81A34B93 8392
G D A T A @ S A S 7 B D A T # N r
s > O g d a t a . s a s 7 b d a t . l c k
4040 40404040 ...
E2C1D4D7 D3C57CE2 C1E2F7C2 C4C17BD5 00020000 00001099 00000001 00000001
B9B1C317 00000013 A2819497 93854BA2 81A2F782 8481A34B 9383
S A M P L E @ S A S 7 B D A # N r
C s a m p l e . s a s 7 b d a t . l c
9240 40404040 ...
k
EOF:
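
The hex above can be decoded mechanically, which makes the record easier to read than the spaced-out character lines. The sketch below assumes a layout inferred only from this dump: a 16-byte EBCDIC name, five 4-byte words whose meaning is unknown, a 4-byte big-endian length, then a name of that length. It also assumes EBCDIC code page 037 (Python's "cp037" codec). The function name decode_names_record is made up for illustration.

```python
import struct

# First two records of ##NFS## #NAMES#, copied from the dump above.
GOLF_HEX = (
    "C7D6D3C6 7CE2C1E2 F7C2C4C1 E37C7BD5 00020000 00001099 00000001 00000001 "
    "B9A2987A 00000011 87969386 4BA281A2 F7828481 A34B9383 9240"
)
GDATA_HEX = (
    "C7C4C1E3 C17CE2C1 E2F7C2C4 C1E37BD5 00020000 00001099 00000001 00000001 "
    "B9A26ED6 00000012 878481A3 814BA281 A2F78284 81A34B93 8392"
)


def decode_names_record(hex_text):
    """Return (short_name, long_name) under the assumed record layout."""
    raw = bytes.fromhex(hex_text)          # spaces in the dump are ignored
    short_name = raw[:16].decode("cp037")  # assumed: 16-byte EBCDIC name field
    (name_len,) = struct.unpack(">I", raw[36:40])  # assumed: length word
    long_name = raw[40:40 + name_len].decode("cp037")
    return short_name, long_name


if __name__ == "__main__":
    for record in (GOLF_HEX, GDATA_HEX):
        print(decode_names_record(record))
```

For the first record this gives GOLF@SAS7BDAT@#N paired with golf.sas7bdat.lck, and the length word (hex 11, decimal 17) matches the 17 characters of the long name. That is at least consistent with the file pairing a generated short name with a long NFS-side name, though it does not say whether the entries represent locks.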
##NFS## #VHIST# E1 F 16 Trunc=16 Size=4 Line=1 Col=1 Alt=0
====>
|...+....1....+>
00001 �
00002 �
00003 �
00004 �
00005 * * * End of File * * *
00020000 000625B6 0000001A 00000001
00020000 000625B2 00000010 00000001
00020000 00063B54 00000001 00000001
00020000 00063B55 00000001 00000001
EOF: