We are experiencing some rather odd bugs, and we are beginning to wonder if
anyone else running AIX3.2 and AFS3.2 (transarc) is using the "machines on
acl's" feature provided with AFS3.2.

Our environment: both clients and servers are IBM RS6000's running AIX3.2
plus the usual ungodly number of AIX patches with no meaningful level
information associated. The last set claiming a version number was 3.2.2.

We have, at present, two servers and about 30 clients. In order to make OS
and application software available to the machines and a large number of
users who are not yet in AFS, we are using the "machines on acl's" feature.
This allows us to make the software available to our machines and non-
authenticated users without making it illegally available to the entire
world.
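
For anyone who has not used the feature, here is a rough sketch of the kind
of setup involved. It is not a transcript of our configuration: the IP
address, group name, and directory below are placeholders, and the script
only prints the commands unless RUN=1 is set (running them for real needs
AFS admin tokens).

```shell
#!/bin/sh
# Dry-run sketch of putting a machine on an acl.
# HOST_IP, GROUP, and DIR are illustrative placeholders, not our real values.
HOST_IP="192.0.2.10"
GROUP="swclients"    # prefix-less name; assumes the creator is an administrator
DIR="/afs/example.edu/software"

# Print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_machine_acl() {
    # A machine is entered in the protection database under its IP address.
    run pts createuser -name "$HOST_IP"
    # Machines are collected in a group; the group is what goes on the acl.
    run pts creategroup -name "$GROUP"
    run pts adduser -user "$HOST_IP" -group "$GROUP"
    # Grant the group read and lookup rights on the software directory.
    run fs setacl -dir "$DIR" -acl "$GROUP" rl
}

setup_machine_acl
```

With this in place, unauthenticated users on the listed machines get the
group's rights, while system:anyuser elsewhere does not.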

This is good; it does what we want. The problem is that three times in about
a month of running this system, one of the client machines (not the same one
each time) has lost its access to one or more volumes that have the group
containing the machine on their acl. The affected client can typically still
access system:anyuser protected things, and commonly can access several
volumes protected exactly the same way as the ones it cannot access.
Authenticating as a user on the acls of the problem volumes allows access
from the client that cannot access them as a machine.

As near as I can tell from the 3 instances, either restarting all servers
(bos restart <server> -bosserver) or salvaging and then restarting has solved
the problem. That is: in one instance I don't know when it cleared; in one I
tried restarting the servers, that did not seem to cut it, and I salvaged and
then restarted, which cleared it; and in the latest I cleared it by restarting
all servers. Other clients which are also machines on acls have no problems
accessing the same volumes.
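
The two recovery attempts described above can be sketched roughly as
follows. The server names are placeholders, and the exact salvage invocation
(a whole-server salvage with -all) is an assumption for illustration rather
than a record of what was typed. As written the script only prints the
commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps; prints commands unless RUN=1.
# SERVERS is a placeholder list, not our real hostnames.
SERVERS="fs1.example.edu fs2.example.edu"

# Print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# First attempt: restart every server, including the bosserver itself.
restart_all() {
    for s in $SERVERS; do
        run bos restart "$s" -bosserver
    done
}

# Second attempt: salvage each server, then restart as above.
salvage_then_restart() {
    for s in $SERVERS; do
        run bos salvage "$s" -all    # assumption: salvage all partitions
    done
    restart_all
}

restart_all
```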

The other thing which seems to be the case, but is not certain, is that this
problem appears to originate when the fileservers restart themselves, given
that the servers restart themselves Sunday morning and we have had the
problem reported Monday morning and Sunday night.

We are working with Transarc to try and debug the problem, but we are led
to wonder, and thus inquire, if anyone else running rs_aix32 AFS3.2 code on
servers and clients is using the "machines on acl's" feature. If so, for how
long, and on how many machines? And has this happened to you, or are we
somehow unique? We may want to get into a detailed comparison of patches
applied to AIX if there is a working installation using this feature.

For that matter, are people on other server platforms using this feature
without problems? We are rather curious. We _really_ like the function
and features of AFS; it makes administering our workstations _much_
easier. This bug, however, has big teeth and bites hard. We'd like to squash
it. Not using the feature is somewhere between painful and "just won't do
what we need".

-- 
-Lawrence Smith, MSC Computing Staff - Cornell Univ., Ithaca, NY
 [EMAIL PROTECTED] [EMAIL PROTECTED] (607)255-6064
-Cats, Coffee, Chocolate... Vices to live by. 
