Hello,
        I have discovered and fixed a bug in the Solaris version of
afsmonitor found in JPL's AFS cell:
/afs/jpl.nasa.gov/system/sun4m_54/usr/afsws/bin/afsmonitor

The bug is serious: afsmonitor thinks it cannot resolve any hostname
except localhost, rendering it useless:

% afsmonitor -fs oscar.transarc.com
[ parse_hostEntry ] Unable to resolve hostname oscar.transarc.com

Note that other AFS tools work, which isolates the problem to afsmonitor:
% bos status oscar.transarc.com -noauth
Instance runntp, currently running normally.
...[stuff deleted]

I don't know whether Transarc is already aware of this problem (it is
an obvious one), whether a patch exists, whether it has been released,
or, if so, who has it. But I do know how to fix it :-) The command-line
fix, using Sun's supplied debugger adb, follows shortly for those
interested only in fixing it. A more detailed discussion at the network
programming/assembly level comes after that for those still interested.

Here are several ways of identifying the version of afsmonitor I'm
working with, so you can make sure you're working with the same one:

% sum /usr/afsws/bin/afsmonitor 
18649 896 /usr/afsws/bin/afsmonitor

% what /usr/afsws/bin/afsmonitor | head -3
/usr/afsws/bin/afsmonitor:
        Base configuration afs3.4 4.30
        $Id: rx.c,v 2.312 1995/10/20 19:50:15 zumach Exp $

% md5 /usr/afsws/bin/afsmonitor
4d82a0220fc1fc51a4e5aee550b1e8ac        /usr/afsws/bin/afsmonitor

Assume you have a *copy* of afsmonitor in the current directory that
you are going to futz with: (we *always* work with a backup copy,
right? ;)

---> ******************************************************** <---
---> * The Magick Command Line Fix (c) Daniel Bromberg 1996 * <---
---> ******************************************************** <---

% echo '1a068?W 94102004' | adb -w afsmonitor
0x1a068:        0x94102020      =       0x94102004

% echo '1a078?W 92102004' | adb -w afsmonitor
0x1a078:        0x92102020      =       0x92102004

The lines below the command lines are of course what adb should spew
back at you. afsmonitor should now work normally.

--------------------

OK, for the hackers:

The problem is in the procedure GetHostByName(), afsmonitor's wrapper
that coordinates the calls to gethostbyname() and gethostbyaddr().
Before this dump, what's happened is gethostbyname() has already been
called on the given hostname, and returned the proper result in the
hostent structure pointed at by [ %fp - 36 ].
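
For anyone who wants the shape of such a wrapper without reading
assembly, here is a minimal Python sketch of the forward-then-reverse
lookup pattern. It is only an illustration, not afsmonitor's source:
the name get_host_by_name and the injectable forward/reverse parameters
are hypothetical, chosen so the logic can be exercised without a live
resolver. It also reports the packed length of the address, which is
the length a C caller would hand to gethostbyaddr().

```python
import socket


def get_host_by_name(name,
                     forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Forward-resolve name, then reverse-resolve the resulting address.

    Hypothetical helper for illustration only. Returns a tuple
    (canonical_name, packed_address_length), or None when the reverse
    lookup fails.
    """
    addr = forward(name)              # dotted-quad string, e.g. "192.0.2.1"
    packed = socket.inet_aton(addr)   # the same address in network byte order
    addr_len = len(packed)            # 4 bytes for IPv4, i.e. 32 *bits*
    try:
        canonical, _aliases, _addresses = reverse(addr)
    except socket.herror:
        # The reverse lookup found no host for this address.
        return None
    return canonical, addr_len
```

Called as get_host_by_name("oscar.transarc.com") with the default
resolvers, this performs real lookups; passing stand-in functions for
forward and reverse lets you try it offline.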

The buggy assembly is here:
0x1a064 <GetHostByName+68>:     add  %fp, -36, %o1
0x1a068 <GetHostByName+72>:     mov  0x20, %o2          <--ERROR
0x1a06c <GetHostByName+76>:     call  0x542ec <bcopy>
0x1a070 <GetHostByName+80>:     nop 
0x1a074 <GetHostByName+84>:     add  %fp, -36, %o0
0x1a078 <GetHostByName+88>:     mov  0x20, %o1          <--ERROR
0x1a07c <GetHostByName+92>:     mov  2, %o2
0x1a080 <GetHostByName+96>:     call  0x73510 <gethostbyaddr>
0x1a084 <GetHostByName+100>:    nop 
0x1a088 <GetHostByName+104>:    st  %o0, [ %fp + -4 ]
0x1a08c <GetHostByName+108>:    ld  [ %fp + -4 ], %o0
0x1a090 <GetHostByName+112>:    b  0x1a098 <GetHostByName+120>
0x1a094 <GetHostByName+116>:    nop 
0x1a098 <GetHostByName+120>:    mov  %o0, %i0
0x1a09c <GetHostByName+124>:    ret 
0x1a0a0 <GetHostByName+128>:    restore 

Currently, the resolved address is stored 36 bytes below the frame
pointer, %fp. We can see the top two lines are setting up the call to
bcopy. (For some reason we make a copy of the address before calling
gethostbyaddr() on it.) However, bcopy is being told to copy 0x20 (32)
bytes of data!  Yet the address, as all IP addresses are, is only 4
bytes, albeit 32 *bits*. I think we can guess how someone's thinking
went awry here. :-) This error alone might not be fatal if the
clobbered 28 bytes were not important, but the next error definitely
is. This time, at 0x1a078, we're setting up the call to gethostbyaddr(),
and telling it we're giving it a 32-byte address once again! Of course
gethostbyaddr() fails, returning 0 (NULL) in %o0 rather than a pointer
to a hostent structure (see /usr/include/netdb.h). So, all we do is
re-assemble these instructions and change them:

(gdb) x 0x1a068
0x1a068 <GetHostByName+72>:     0x94102020
(gdb) x 0x1a078
0x1a078 <GetHostByName+88>:     0x92102020

Immediate constants in these SPARC instructions occupy the low 13 bits
of the word, so the least significant byte, 0x20, is all we need to
change, to 0x04, hence the adb command lines.
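
To double-check the patched words by hand, the following Python sketch
pulls apart the fields of a SPARC V8 format-3 instruction (op, rd, op3,
rs1, the i bit, and the 13-bit immediate) and rebuilds the word with a
new immediate. The helper names decode_format3, reg_name and
with_immediate are made up for this illustration, and with_immediate
only handles small non-negative constants, which is all this fix needs.

```python
def decode_format3(word):
    """Split a 32-bit SPARC V8 format-3 instruction word into its fields."""
    return {
        "op": (word >> 30) & 0x3,      # 2 selects the arithmetic/logical group
        "rd": (word >> 25) & 0x1F,     # destination register number
        "op3": (word >> 19) & 0x3F,    # 0x02 is "or"; "mov" is or with %g0
        "rs1": (word >> 14) & 0x1F,    # first source register number
        "i": (word >> 13) & 0x1,       # 1 means the second operand is simm13
        "simm13": word & 0x1FFF,       # raw 13-bit immediate field
    }


def reg_name(number):
    """Map a register number 0..31 to its %g/%o/%l/%i name."""
    return "%" + "goli"[number >> 3] + str(number & 7)


def with_immediate(word, value):
    """Return word with its immediate field replaced by value."""
    if not 0 <= value <= 0xFFF:
        raise ValueError("sketch only handles non-negative 12-bit values")
    return (word & ~0x1FFF) | value


for old in (0x94102020, 0x92102020):
    fields = decode_format3(old)
    new = with_immediate(old, 4)
    print("mov 0x%x, %s: %#010x -> %#010x"
          % (fields["simm13"], reg_name(fields["rd"]), old, new))
```

The two rebuilt words are the values handed to adb's W command in the
command-line fix.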

Voila. Any comments/additions/follow-up, please notify me! Thanks and
good monitoring. Respectfully submitted,

                                        Daniel Bromberg
                                        [EMAIL PROTECTED]
                                        Co-op, Jet Propulsion Laboratory
                                        M/S 126-130
                                        (818) 393-3872
