We are experiencing some rather odd bugs, and we are beginning to wonder whether anyone else running AIX 3.2 and AFS 3.2 (Transarc) is using the "machines on ACLs" feature provided with AFS 3.2.

Our environment: both clients and servers are IBM RS/6000s running AIX 3.2, plus the usual ungodly number of AIX patches with no meaningful level information associated with them. The last set claiming a version number was 3.2.2. We have, at present, two servers and about 30 clients.

To make OS and application software available to these machines, and to the large number of users who are not yet in AFS, we are using the "machines on ACLs" feature. This lets us make the software available to our machines and to non-authenticated users without making it illegally available to the entire world. This is good; it does what we want.

The problem is that three times in about a month of running this system, one of the client machines (not the same one each time) has lost access to one or more volumes whose ACLs include the group containing that machine. The affected client can typically still access directories protected by system:anyuser, and often can still access several volumes protected exactly the same way as the ones it cannot reach. From the troubled client, authenticating as a user who is on the ACLs of the problem volumes restores access, even though access as a machine fails. Other clients that are also on ACLs as machines have no problems accessing the same volumes.

As near as I can tell, restarting all servers (bos restart -bosserver <server>), or salvaging and then restarting, has cleared the problem. Of the three instances: in one, I don't know when it cleared; in another, restarting the servers alone did not seem to cut it, but salvaging and then restarting did; and in the latest, restarting all servers cleared it.

One other observation, which seems likely but is not certain: the problem appears to originate when the fileservers restart themselves. Our servers restart themselves on Sunday morning, and the problem has been reported on Sunday night and on Monday morning.

We are working with Transarc to try to debug the problem, but we are led to wonder, and thus inquire, whether anyone else running the rs_aix32 AFS 3.2 code on servers and clients is using the "machines on ACLs" feature. If so, for how long, and on how many machines? Has this happened to you, or are we somehow unique? If there is a working installation using this feature, we may want to get into a detailed comparison of the AIX patches applied. For that matter, are people running other server platforms using this feature without problems?

We are rather curious. We _really_ like the function and features of AFS; it makes administering our workstations _much_ easier. This bug, however, has big teeth and bites hard, and we'd like to squash it. Not using the feature is somewhere between painful and "just won't do what we need".

--
-Lawrence Smith, MSC Computing Staff - Cornell Univ., Ithaca, NY
[EMAIL PROTECTED] [EMAIL PROTECTED] (607)255-6064
-Cats, Coffee, Chocolate... Vices to live by.
