Re: [OpenAFS] Changing host- and domainname
> Yes, that helped, see also my previous message. Our problem was that
> we didn't look for the NetInfo file in /var/lib/openafs/local/, but
> only in /etc/openafs.

Hint from the trenches: If you want different services to bind to different interfaces (for example, the fileserver to one IP and the database servers to another interface), that can be done by having the binaries look for differently named NetInfo files containing different IP numbers and using -rxbind. For example NetI.db and NetI.fs. These example file names happen to have the same length as NetInfo.

Btw, it is bad practice to patch binaries without documenting it, and I would be happier if there were another way to do it, but I never got around to writing a patch for that. Just saying ;-)

Harald.
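A minimal sketch of that setup, assuming Debian-style paths and the (hypothetical, locally patched) binaries that read NetI.fs and NetI.db instead of NetInfo; the 192.0.2.x addresses are placeholders:

# echo "192.0.2.10" > /var/lib/openafs/local/NetI.fs   # address the fileserver should use
# echo "192.0.2.20" > /var/lib/openafs/local/NetI.db   # address the db servers should use
# then start each service with -rxbind so it binds only its own address,
# e.g. a BosConfig parm line: /usr/lib/openafs/dafileserver -rxbind ...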
Re: [OpenAFS] Changing host- and domainname
Does it help to add the IP addr you want to the NetInfo file (creating one in the right place if it does not exist)?

Harald.
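Something like this, assuming the Debian path mentioned elsewhere in this thread and a placeholder address:

# echo "192.0.2.10" > /var/lib/openafs/local/NetInfo
# then restart the AFS processes so the file is read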
Re: [OpenAFS] Re: openafs versus systemd
I think a step-by-step guide on how to run an Ubuntu 22.04 LTS or 23.04 desktop along with OpenAFS would be very much appreciated, because I hear that folks are struggling with this, and the claim that it "is not possible" is used to argue "then we can not run AFS - period".

Harald.
Re: [OpenAFS] aklog: unknown RPC error (-1765328370) while getting AFS tickets
> Good to know, in my case I am setting up a new kerberos realm and new
> OpenAFS cells just for testing. This ambiguous afs principal is good
> for me, but maybe not enough for other people.

Use the afs/cell-name form. It has worked for me for years in different setups. It's better. Listen to Jeff (if not to me ;-)

Harald.
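For the archives, creating such a principal with MIT kadmin looks roughly like this (cell and realm names are placeholders):

kadmin: addprinc -randkey afs/test.example.com@TEST.EXAMPLE.COM

Then export the key to a keytab and install it on the AFS servers as usual.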
Re: [OpenAFS] Limiting mount point to known cells
I would look for the AFSDB RR DNS lookup in the code and somehow prevent names without a dot in the middle from being looked up - just fail them. But there are folks who are much more familiar with the code than me.

Harald.
Re: [OpenAFS] Limiting mount point to known cells
> I seem to remember seeing many paths of the form /afs/cs/ or /afs/ece/
> where the full cell names were cs.cmu.edu or ece.cmu.edu.

But probably "ece" was entered into CellServDB and not into DNS.

Harald.
Re: [OpenAFS] Limiting mount point to known cells
> In the same thread, a blacklist (or whitelist) of cell names was
> suggested to prevent afsdb queries for troublesome domains but it
> seems it never got implemented.

If the blacklist specification is visible and not hidden in some new magic file, I think that would be good. My suggestion would be to add the possibility to specify this in CellServDB:

>git BLACKLIST

or something like that. Because then anyone who wants a cell named "git" (you never know the users' wishes) would see this when looking through CellServDB to determine why it does not work as expected. I am normally not for blacklists, but what can you do?

But wait a moment... Can't we assume that all cell names that we look up in DNS contain at least one dot "." in the middle? I doubt that there are AFS cells named without a dot that we need to resolve with DNS. What do you think about that?

Harald.
Re: [OpenAFS] Networking AFS related problem
> I actually solved the problem in a very dumb way. Turns out I had never
> rebooted my phone or cycled my mobile connection since the problem
> appeared. I just rebooted and the problem was gone.

Well, such things happen. At some point in time I had a "home router" provided by my ISP which never got any firmware updates, as the ISP relied on router reboots for that to happen. And mine was on a UPS and never rebooted, so I saw errors which had already been fixed but had not made it into my box. Took a while to find that one.

> Anyway, thank you everyone for all your help! It turned out to be kind
> of a waste of time, sorry for that.

Well, I hope we all learned a little more about AFS UDP transport in the process (OK, Jeff, you already know everything ;-) ;-) ). I don't feel that you wasted my time, it's a give and take.

Harald.
Re: [OpenAFS] Networking AFS related problem
Hi Jeff!

> It is unlikely that an ISP is blocking UDP traffic.

For some value of "ISP". I have been to Karolinska Institutet, which supplied Internet through the same "eduroam" cooperation as my home university. However, the "AFS experience" was totally different, as in "non-existent", on that "eduroam", because they had implemented ...

> The most likely causes are a poorly implemented firewall

...firewall rules which blocked most of the UDP ports.

> It has been reported[1] that more than half of the web browser
> connections from Chrome browsers to Google's servers are performed
> using the UDP based QUIC protocol.

But I bet Chrome will work around the "no UDP" situation to a much greater extent than AFS.

Greetings, Harald.
Re: [OpenAFS] Networking AFS related problem
Yes, the first packets are quite small until you start to actually transfer chunks of files. If you don't see any traffic coming back, then something is blocking it. On the server (the VLDB server(s) would be the first ones to be contacted, as you see) you can check whether outgoing or incoming traffic is blocked. If NAT is involved, it can be broken NAT as well. I guess your IP provider lives in the IT world of 2022 where "Internet service" consists of mostly TCP/HTTPS and definitely not UDP ;-)

Harald.
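To watch for that on a VLDB server, something like this should do (7003 is the vlserver's UDP port; the interface name is a placeholder):

# tcpdump -n -i eth0 udp port 7003
# you should see the client's queries arrive and the answers leave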
Re: [OpenAFS] Networking AFS related problem
> Since AFS is working perfectly as soon as I change my WiFi connection
> to other connections, and the cell is perfectly reachable via a remote
> machine, I think the problem is with my ISP, but I don't have enough
> knowledge about AFS or networking to pinpoint the problem precisely.
> (Of course, "normal" web browsing is perfectly okay even with this
> problematic connection.)

When this occurs, I am always reminded that AFS uses UDP over its own ports, and not TCP with http or https.

> I would be very grateful if anyone can help me gather some more
> debugging info about this problem so I can give it to my ISP for them
> to fix it.

I would start with wireshark and filter on UDP ports 7000 and 7001, then look whether the packets are going out, whether you get answers back from the file servers, and, if you do, whether they are complete. Sometimes bad networks drop or truncate UDP packets (either in or out), and if they only truncate them, it can help to make your MTU size smaller, so that not all of the default 1500 bytes are used. I would test values like 1400, 1200, ... Restart afsd with the additional parameter -rxmaxmtu SIZE.

Good luck, Harald.
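For reference, the capture and the client tweak could look like this (interface name and MTU value are placeholders):

$ tcpdump -n -i wlan0 'udp port 7000 or udp port 7001'
# 7000 is fileserver traffic, 7001 is the client's callback port

# then, if truncation is suspected, add to the afsd options:
afsd ... -rxmaxmtu 1400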
Re: [OpenAFS] Fwd: CRITICAL: RHEL7/CentOS7/SL7 client systems - AuriStorFS v2021.05-10 released > OpenAFS versions?
> Any version of OpenAFS cache manager configured with a disk cache ...

But is a memory cache OK?

Regards, Harald.
Re: [OpenAFS] Redux: Linux: systemctl --user vs. AFS
> We have been using Kerberos for a LONG time; over 20 years.

Hi Ken! Nice to hear from you :-)

> A long time ago we ran into issues with widespread Kerberos ticket theft
> from attackers, due to the quite-common usage at that time of Kerberos
> tickets being stored in files.

So why is storage in files so much more dangerous than storage in memory? If one happens to get a process which can read the files in local /tmp, why could that process not modify any of /proc/<pid>/mem on the same computer to get at the ticket cache anyway? OK, one benefit of memory is that it is automatically destroyed when no process accesses it any more. But other than that?

Harald.

PS: Currently I'm dealing again with the "uid is security enough" people, who show up every time one buys a product bundled with its software (the vendor does not offer a "kerberos" feature, bla bla bla ...)
Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation
> now         : 3x 1.6.22.1
> intermediate: 2x 1.6.22.1 + 1x 1.8.7
> intermediate: 1x 1.6.22.1 + 2x 1.8.7
> final       : 3x 1.8.7

I have upgraded db servers in this manner for many years, so I expect that it works this time as well. Actually we are currently running a mix of 1.8 and 1.6 on our DBs without problems. I don't think I ever had problems which were related to different versions.

I try to do it in an order which here would make the 1.8.7 server the new syncsite in step 2. Then test that everything works as expected, because from that point you can still back out and get quorum again with 2x 1.6.22. If there have been problems with db server upgrades, it has been because of errors between chair and keyboard, like typos in config files.

Harald.
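To watch the quorum while stepping through this, udebug against each db server is enough; a sketch (hostnames are placeholders):

$ udebug db1.example.com 7003   # vlserver; look for "I am sync site" and the recovery state
$ udebug db1.example.com 7002   # the same check for the ptserver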
vos/bos and the kernel (Re: [OpenAFS] 2020 AFS Technologies Workshop Cancelled.. kafs update)
(I did cut down the cc list a little.)

> The OpenAFS bos, vos and backup commands can be run from the client too, I
> think, since they don't require any interaction with the afs kernel module.

They use the token. I think they _could_ make their own token from the ticket, but they don't. Another thing vos is unaware of is the status of the servers over time. Even if the kernel knows that certain servers are down, vos does not and will still try to contact them and time out. And the next invocation of vos will do so _again_.

Harald.
Re: [OpenAFS] vos release not as "smart" as I thought - or is this a not implemented feature?
> https://lists.openafs.org/pipermail/openafs-info/2019-September/042865.html

Hi Ben!

I read it previously but probably did not understand. Does this mean that it always pulls everything from "last update" minus 15 minutes, even if the last pull was long _after_ the creation date of the readonly? Does that make sense? Here I have:

RO dates:
    Creation    Mon Nov 25 14:11:46 2019
    Copy        Mon Nov 25 10:51:37 2019
    Backup      Mon Nov 25 01:17:02 2019
    Last Access Mon Nov 25 10:57:40 2019
    Last Update Mon Nov 25 10:57:40 2019

RW last update:
    Last Update Mon Nov 25 10:57:40 2019

But the logic does not seem to look at the RO creation time?

$ vos rele H.haba.android -verbose -c stacken.kth.se
H.haba.android
    RWrite: 536916074     ROnly: 536916075     Backup: 536916076
    number of sites -> 3
       server bananshake.stacken.kth.se partition /vicepb RW Site
       server bananshake.stacken.kth.se partition /vicepb RO Site
       server vaniljshake.stacken.kth.se partition /vicepb RO Site
This is a complete release of volume 536916074
Re-cloning permanent RO volume 536916075 ... done
Getting status of parent volume 536916074... done
Starting transaction on RO clone volume 536916075... done
Setting volume flags for volume 536916075... done
Ending transaction on volume 536916075... done
Replacing VLDB entry for H.haba.android... done
Starting transaction on cloned volume 536916075... done
Updating existing ro volume 536916075 on vaniljshake.stacken.kth.se ...
Starting ForwardMulti from 536916075 to 536916075 on vaniljshake.stacken.kth.se (as of Mon Nov 25 10:57:38 2019).
updating VLDB ... done
Released volume H.haba.android successfully

And afterwards we got a new creation date yet further in the future (compared to the Last Update date):

H.haba.android.readonly           536916075 RO    3062302 K  On-line
    vaniljshake.stacken.kth.se /vicepb
    RWrite  536916074 ROnly          0 Backup          0
    MaxQuota       5000 K
    Creation    Thu Nov 28 09:56:40 2019
    Copy        Mon Nov 25 10:51:37 2019
    Backup      Thu Nov 28 01:17:02 2019
    Last Access Mon Nov 25 10:57:40 2019
    Last Update Mon Nov 25 10:57:40 2019

Does that mean I should look at the horrific source code of vos again?

Cheers, Harald.

PS:
Jeff> The behavior you are expecting is implemented
Jeff> by AuriStorFS vos

And for various "reasons" (beer often helps to understand these kinds of reasons) PDC/KTH could not be convinced to buy that (yet?). But that would not have helped for the Stacken cell anyway.
Re: [OpenAFS] Question regarding vos release and volume
> root.afs
>     RWrite: 536870915     ROnly: 536870916
>     number of sites -> 2
>        server server1.mydomain.dmz partition /vicepa RW Site
>        server server2.mydomain.dmz partition /vicepa RO Site

If you want added redundancy, and not all accesses to the RO going only to server2, you need to add a RO volume like this:

> root.afs
>     RWrite: 536870915     ROnly: 536870916
>     number of sites -> 3
>        server server1.mydomain.dmz partition /vicepa RW Site
>        server server1.mydomain.dmz partition /vicepa RO Site
>        server server2.mydomain.dmz partition /vicepa RO Site

This will not use much extra space, as the RO and RW on server1 will share data as long as it is not modified.

Then to your problem... What does vos listvol say about vicepf on both servers? Do you have the /vicepf/V0*.vol files? Can you vos dump the volumes from the specific server and partition?

Harald.
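Adding that second RO site and releasing is, roughly (server and partition names as in your listing):

$ vos addsite server1.mydomain.dmz a root.afs
$ vos release root.afs -verbose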
Re: [OpenAFS] auristor client with AFS servers, timeout at aklog
Hi Jeff, hi Måns!

> Create separate IPv4 A records to refer to your hosts and list those in
> the DNS SRV records instead of the hostname that includes both A and
> AAAA records.

That's of course how it should be.

> The AuriStorFS rx stack will terminate calls within one second if an
> ICMP6 port unreachable response is received. I wonder if the 10 second
> delay is due to ICMP6 packets being firewalled.

I know that Måns can operate tcpdump to determine that ;-)

Harald.
Re: [OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)
> However is it still "safe" and "advised" (outside of these
> disadvantages) to run the old `fileserver` component?

I would recommend that everyone migrate to "da" and not start with anything old. For obvious reasons, all the big installations will migrate to "da", and you don't want to be running a different codebase, do you?

Harald.
Re: [OpenAFS] kafs client bugs
> From just the filenames, I don't see what some of the tests are meant to do -
> take "discon-create" for example. This seems to be using some feature of Arla
> that isn't in OpenAFS.

You found the test of the Arla disconnected-mode AFS feature (which was not 100% finished, but you could tell Arla that it was running in disconnected mode, and then, if you were lucky enough that your files were in cache, you could continue to read them).

> Would it be possible to import some of these tests into my kafs-client
> package? I'm not clear what the licence on them is.

I think some of the tests are so trivial (write1) that everyone would have done them like that. Others which are more elaborate, like write-6G-file.c, have a BSD-style license like most of Arla, except the Linux kernel module, which is GPL/BSD dual-licensed. So you probably should not distribute the BSD-licensed stuff under the GPL, but nothing forbids separate distribution and usage of the tests for other file systems. I have even found local file system problems with these tests. And as usual: I am not a lawyer, so this is how I interpret what's in the source ;-)

I would just run all of the tests to start with, and for the ones that fail decide one of:

* fix AFS
* fix the test
* delete the test

Harald.
Re: [OpenAFS] kafs client bugs
Hi David, remember Arla? There are a lot of tests there, of which many may be useful to run on kafs as well. Some tests for the command line tools may be different, though.

Harald.
Re: [OpenAFS] Upgrade of server by moving drives
Hi Mike and Michael! You have the chance to confuse everyone if you write your emails From: Mike and sign below with Michael, and the other way around. :-) ;-)

MM> FYI, The 'vos remaddrs' command has been available since OpenAFS 1.6.16 and can
MM> be used to remove unreferenced server entries in the VLDB.

You're right. Time flies, and it made it into 1.6.16, which by now should be widespread enough. Good. However, I admit, I still have systems where the client is based on older AFS releases, where it's hidden in changeaddr -remove.

MK> It works as i expected (...)
MK> Nice piece of Software.

You're welcome.

Harald.
Re: [OpenAFS] Upgrade of server by moving drives
> The "vos syncserv" and/or "vos syncvldb" commands can help with this
> (but I always have to read the docs to remember which to use).

If the volumes on the server are the real world and the VLDB is the map to find them, then

    syncvldb adjusts the map to match the world
    syncserv adjusts the real world to match the map

I mostly use syncvldb and delentry to adjust the VLDB and never found a good use for syncserv. If there are old duplicates of volumes not in the VLDB which I really want to remove, that can be done with vos zap.

And I forgot to mention that step 7, "Clean up the temp IP/sysid in the VLDB", is done like this: that command is not named vos removeaddr but is hidden in

    vos changeaddr -oldaddr A.B.C.D -remove

Harald.
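The typical cleanup sequence, with placeholder names (vos zap wants the numeric volume id):

$ vos syncvldb -server fs1.example.com -verbose     # map -> world
$ vos delentry -name some.stale.volume              # drop a stale VLDB entry
$ vos zap -server fs1.example.com -partition a -id 536870999   # remove an on-disk leftover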
Re: [OpenAFS] Upgrade of server by moving drives
> We would prefer to move the array containing our vicep* partitions from
> the old to the new server and bring that one up.

Yes, that is possible. Nothing on the /vicep* says what server it was on; that info is only in the VLDB(s) on the DB servers. The identity of the file server is the uuid (vos listaddr -printuuid). A fileserver then has one or more IPs. The uuid is stored on the fileserver in the sysid file.

If you have the DB and file servers on separate computers, it makes the procedure shown below easier because you can do one thing at a time. As long as you have another DB server up and running, it should work as well with a DB server on the same computer, but I have not tested this, as we always keep fileservers and DB servers separate.

This is how I upgrade a fileserver without moves (backups taken and verified before you start, yada-yada-yada):

1. Make a new fileserver with a temporary IP and test it with a temporary /vicepa
2. Shut down the fileserver process (but keep the computer up) and remove the test /vicepa
3. Copy (overwrite) the sysid file from the old fileserver (most likely /var/lib/openafs/local/sysid) to the new one
4. Shut down the old fileserver process and remove the sysid file and IP numbers so I don't start a duplicate by accident. Then shut down the OS.
5. Connect the storage to the new fileserver and make it visible as exactly the same /vicep* as on the old one. This may need a reboot depending on the storage. Move the IP numbers to the new server.
6. Start the new fileserver process with the same sysid and IP as the old one. The new fileserver now looks like the old one to the VLDB.
7. Clean up the temp IP/sysid in the VLDB.
8. Enjoy (you had downtime from 4 to 6, but that can be done quite fast)

If you screw up with the sysid, or want to connect different /vicep* to different future servers (that can not be done with the sysid trick), that can be repaired with the vos syncvldb command.

DB servers you upgrade by having several and then upgrading the one that is not the current syncsite. When starting empty, that non-syncsite will then be recloned automatically. The udebug command is your friend there to monitor what's going on.

Harald.
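A sketch of steps 2-6 as shell commands, assuming Debian-style paths and the placeholder hostnames old-fs/new-fs:

new-fs# bos shutdown new-fs -localauth -wait                 # step 2
new-fs# scp root@old-fs:/var/lib/openafs/local/sysid /var/lib/openafs/local/sysid   # step 3
old-fs# bos shutdown old-fs -localauth -wait                 # step 4
old-fs# mv /var/lib/openafs/local/sysid /root/sysid.retired
# ... move storage and IP numbers (step 5) ...
new-fs# bos start new-fs dafs -localauth                     # step 6
new-fs# vos listaddr -printuuid                              # verify the uuid moved with it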
Re: [OpenAFS] server crash / moving volumes without vos move
> So my idea is to mount the vicepa (from external SAN) from the crashed
> file server to the working fileserver as /vicepb so i could get the
> volumes and register the volumes to the working afs fileserver.

Yes. That will need a fileserver restart to make the working fileserver look for the new partition. Then you can see all the volumes with vos listvol. Then read the vos syncvldb man page and continue from there; you can restrict the syncvldb to partition b.

Good luck, Harald.
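That restricted run would be something like (hostname is a placeholder):

$ vos listvol workingfs.example.com b
$ vos syncvldb -server workingfs.example.com -partition b -verbose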
Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8
> While getting tokens at login works with this setup, things start to fail
> once the user's $HOME is set to be in /afs. While simple scenarios like
> pure shell/console logins work, graphical desktop environments have lots
> of problems. XFCE4 doesn't even start, Plasma works to some degree after
> presenting lots of error dialogs to the user.

Is this a problem due to AFS, or due to the startup of the graphical environment, which nowadays may involve systemd --user services instead of running all processes in the same session?

> Seems there's still some work to do until this becomes an alternative
> for the standard OpenAFS client.

It may qualify as a "beta", so yes.

> So I wonder why RH customers would want that?

RH decided that their customers wanted systemd ;->

Harald.
Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8
Hi Jeff, hi David!

Has it been 17 years? Well, we are all getting - mature ;-) Obviously a file system is ready for use if it's old enough to buy liquor (which differs a little between countries).

> When opening a support case please specify:
>
> Product: Red Hat Enterprise Linux
> Version: 8.0 Beta
> Case Type: Feature / Enhancement Request
> Hostname: hostname of the system on which RHEL8 beta was installed

We have a chicken-and-egg problem here: Why would I install 8.0b if it does not have kafs? Install a Fedora test system, sure, but RHEL 8.0b?

> If you are eligible, please attempt to open a support request by
> December 11th.

3 workdays. Optimistic.

> As part of the Linux kernel source tree, kafs is indirectly supported by
> the entire Linux kernel development community. All of the automated
> testing performed against the mainline kernel is also performed against
> kafs.

But the automated testing probably does not (yet) fetch a single file from an AFS server. (Compare how the gssapi-key-exchange features in ssh are never tested in the ssh/scp shipped with distributions, as the testing never fetched a single file with that feature - with known results.) Testing that requires infrastructure is a lot of work to automate.

Sorry, I may sound much more pessimistic here than I actually am. This _might_ fly. I wish :-)

Season's greetings, Harald.
Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
> Am I safe to assume that on Linux only the "new" partition layout is
> used? (Is there a way to check which layout I am using?)

Yes (under the old layout, you don't even see the files, as they are not hooked into directories).

Harald.
Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
> Can I safely `rsync` the files from the old partition to the new one?

For Linux (the "new" server partition layout): if the file tree really is exactly copied (including all permissions and chmod bits), then you have everything you need. This was not true for the old file system layout, for example on SunOS UFS.

I would copy to a not-yet-used partition, mount it then as /vicepY (where Y is a new unused letter), and then, as the first thing when starting the server, run a salvage with the options -orphans attach -salvagedirs -force.

Harald.

PS: I like zfs
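A sketch of the copy and the salvage, with placeholder names; on a demand-attach server you would add -forceDAFS to bos salvage, as in other posts in this archive:

# rsync -aHAX --numeric-ids /vicepX/ /mnt/newdisk/
# mount /dev/newdisk /vicepY
# bos salvage -server fs.example.com -partition /vicepY -salvagedirs -orphans attach -localauth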
Re: [OpenAFS] problems with ubuntu 18.04 client
>> Are you in the same or different PAG?
>
> Hmm, i think that's not the reason,
> if i login into the same Computer, the tree /afs/desy.de/user is also
> missing for me ...

Yes, but a new login probably gives you a new PAG and a new security context. The question is whether the old PAG and security context still continue to work, and if the answer is yes, why that case differs from a new PAG. Any complaints in the fileserver log about the client in question?

Harald.
Re: [OpenAFS] problems with ubuntu 18.04 client
> In an old terminal (where afs was running well) everything seems to be
> ok (create files, folders, pwd ... etc.)

Are you in the same or different PAG?

Harald.
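A quick way to compare the two terminals; on newer Linux clients the PAG lives in the session keyring:

$ tokens        # run in both terminals and compare the token/cell lines
$ keyctl show   # an OpenAFS PAG shows up as a "_pag" keyring here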
Re: [OpenAFS] 1.8.0: Cache inconsistent after vos release
> I was able to reproduce this today and have submitted a patch for review:
> https://gerrit.openafs.org/#/c/13090/
> This patch works in my testing.

Mark, that's just great.

Harald.
[OpenAFS] 1.8.0: Cache inconsistent after vos release
We see problems when doing a vos release: the 1.8.0 clients "miss" that the ro volume has changed. We think this is repeatable. Our setup is:

* server version OpenAFS 1.6.22 2017-12-07 on CentOS
* client version OpenAFS 1.8.0 2018-05-08 on both CentOS and Ubuntu

As it works with older AFS _clients_ and the same server, we suspect that this is a 1.8.0 issue. An fs flushv fixes it. I am not aware of any firewalls or NAT between server and client.

Before digging deeper: Has anyone of you seen this as well? If not, you might want to test this if you think about moving to 1.8 or already have.

Thanks, Harald.
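The workaround mentioned above, for a path in the affected volume (path is a placeholder):

$ fs flushvolume /afs/cell/some/path    # "flushv" for short; discards cached data for that volume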
Re: [OpenAFS] question about authentication with kerberos and Default principal
> Does heimdal-klist use /etc/krb5.conf or does it use some other
> configuration file? I'm worried I did not set up a config file.

It should use /etc/krb5.conf as well, unless KRB5_CONFIG is set. You should have something like:

[libdefaults]
    default_realm = YOURDOMAIN

in there.

> [gsgatlin@localhost ~]$ /usr/bin/heimdal-kinit gsgatlin

or use

$ /usr/bin/heimdal-kinit gsgatlin@YOURDOMAIN

> Also, going back to the krb5 kinit, how can you specify a FILE: ticket
> cache type ?

Both MIT kinit and Heimdal kinit honor the KRB5CCNAME environment variable, which has the form TYPE:location. Thus a typical way to set your FILE cache is:

export KRB5CCNAME=FILE:/tmp/krb5cc_`id -u`

Btw: As FILE: is the oldest ticket cache type and the default, any file name will do. For example:

export KRB5CCNAME=/tmp/whatever

will set it to /tmp/whatever.

Greetings, Harald.
Re: [OpenAFS] question about authentication with kerberos and Default principal
Hm. If I remember correctly, at least parts of the kerberos ticket in the ticket cache are endian-dependent. As the principal name seems to be broken to start with, maybe the error is there. Do you have the same problems if you use the FILE: ticket cache type, or the kinit and afslog from Heimdal, to handle tickets and tokens?

Harald.
Re: [OpenAFS] permission issue when trying to switch kerberos realms.
I wrote

>> I actually don't know how high a kvno can be but up to 32767 (2^15-1)
>> "feels" safe.

That was probably WRONG, as Sergio pointed out to me. Sergio wrote:

> It doesn't feel all that safe to me. True, RFC 4120 specifies the kvno as
> UInt32, but https://k5wiki.kerberos.org/wiki/Projects/Larger_key_versions
> makes interesting reading. Version 1.14 isn't all that old; Debian 8 only
> has version 1.12.
>
> Maybe if one requires rxkad-k5 it's OK to have kvno>255, but back in
> Kerberos 4 days it definitely wasn't. The OpenAFS code base still contains
> things like
>     if (kvno > 255)
>         return KAANSWERTOOLONG;
> (in src/kauth/krb_udp.c) and
>     @t(kvno)@\is a @b(one byte) key identifier associated with the key. It
>     will be included in any ticket created by the AuthServer encrypted with
>     this key.
> (in src/kauth/AuthServer.mss).

One byte. Ouch. So until rxkad-k5 (around the corner - just kidding) we are probably stuck with that. So if you want to divide your KVNO space into two parts, around 100 for each is what you get :-(

Harald.
Re: [OpenAFS] permission issue when trying to switch kerberos realms.
Do the two service tickets have different kvnos (otherwise AFS can not tell them apart), and are they both installed on your AFS servers? I would start with a kvno of 1000 or so for the new cell, which hopefully leaves enough headroom for keying the old cell if necessary. I actually don't know how high a kvno can be, but up to 32767 (2^15-1) "feels" safe.

Harald.
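Checking which kvno a client actually gets (cell and realm names are placeholders):

$ kvno afs/cell.example.com@NEW.REALM                    # MIT
$ kgetcred afs/cell.example.com@NEW.REALM && klist -v    # Heimdal; klist -v prints the kvno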
Re: [OpenAFS] Security Advisory 2016-003 and 'bos salvage' questions
Is there any reason why -salvagedirs requires -all? We run dafs. To minimize downtime, I'd like to use this per volume or, if that is not possible, at least per partition, so I don't need to shut down the complete fileserver for this. OK, I can move one volume at a time to a dedicated salvage fileserver and then out again, but that is tedious.

# bos salvage -server sterlet -partition a -volume M.probe.sterlet.a -forceDAFS -salvagedirs -orphans attach -localauth
-salvagedirs only possible with -all.

This is our fileserver config:

# cat /usr/afs/local/BosConfig
restrictmode 0
restarttime 16 0 0 0 0
checkbintime 16 0 0 0 0
bnode dafs dafs 1
parm /usr/afs/bin/dafileserver -udpsize 131071 -sendsize 131071 -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 100 -b 480 -vc 2400
parm /usr/afs/bin/davolserver
parm /usr/afs/bin/salvageserver -datelogs -parallel all8 -orphans attach
parm /usr/afs/bin/dasalvager -datelogs -parallel all8 -orphans attach
end

Harald.
Re: [OpenAFS] vos dumps to S3 via S3 Storage Gateway?
adsmpipe replacement: /afs/hpc2n.umu.se/lap/tsmpipe/x.x/src/

Used with some scripts to put vos dumps into TSM archives. This is the current backup solution for at least 3 AFS cells I know about.

Harald.
Re: [OpenAFS] Check free space on AFS share before login
I think the problem is well known, and what one would need to do is to make (at every traversal of an AFS mount point) the OS aware that the AFS volume in question is a separate "device". Then make the statfs syscall on that path return the quota info from AFS. This has of course to happen dynamically as you make your way through the AFS space. This would make every volume look like a separate file system. There are pros and cons to that approach. I think no one has written the code (for Unix/Linux) yet, but the Windows client might do this; I'm by no means someone who knows something about AFS on Windows ;-)

At our site, so far, it has been cheaper to multiply all quotas by 2 whenever the problem arose again.

Bye, Harald.
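To see the mismatch in practice (path is a placeholder):

$ fs listquota /afs/cell/home/user   # AFS's per-volume quota and usage
$ df /afs/cell/home/user             # today: one big fake /afs "device", not the volume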
Re: [OpenAFS] Additonal question about the OpenAFS Security Advisory 2016-003
> AFS file servers store directory information in a flat file that
> consists of a header, a hash table and a fixed number of directory entry
> blocks.

That I was aware of.

> When a volume dump is constructed for a volume move, a volume
> release, a volume backup, etc. the contents of the directory files
> are copied into the dump stream exactly as they are stored on disk
> by the file server.

That was the part I was unsure about. Thanks!

> I hope this information is helpful.

Yes, indeed.

Harald.
[OpenAFS] Additonal question about the OpenAFS Security Advisory 2016-003
The security advisory says:

> We further recommend that administrators salvage all volumes with the
> -salvagedirs option, in order to remove existing leaks.

Is moving the volume to another server enough to fix this as well, or does the leak move with the volume?

Thanks for the help, Harald.
Re: [OpenAFS] [MacOsX] Issues with saving/removing files
It would not be the first time that applications on MacOSX try to determine for themselves whether they have file system permission (by issuing stat(2) and looking at the owner and unix bits) instead of using access(2). So fix your directory owner and/or permissions so that it _looks_ correct UNIX-wise. Test chmod 777 (it will of course not make any real change security-wise, as the AFS permissions are in effect, but it may fool the Finder).

Harald.
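That is, something like this on the affected directory (path is a placeholder):

$ chmod 777 /afs/cell/some/dir    # cosmetic: the ACL, not the mode bits, controls access in AFS
$ fs listacl /afs/cell/some/dir   # this is what actually decides who may do what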
Re: [OpenAFS] Request for Assistance with OpenAFS
...but if the cell is still on kaserver (without even v4 tickets), then only the old AFS tools will do the trick. That was really a long time ago.

Harald.
Re: [OpenAFS] Request for Assistance with OpenAFS
> ... I have tried various
> options in the /etc/krb5.conf file with no luck yet. Any help is much
> appreciated.

You might want to try

[libdefaults]
    allow_weak_crypto = yes

in krb5.conf, but if the realm is v4 _only_, then you may need a krb.conf and the old v4 tools. In that case, I would try good old Heimdal 0.7.2 from source.

Harald.
Re: [OpenAFS] Cross-platform DFS
Do you have any special reason that your storage is spread out over several OSes? It looks like a quite small installation to be so spread out.

> we have SSD disks spread across both windows and linux machines that we
> wanted to put together into a pool of storage, and ideally, mount it as NFS
> in all servers, so our software can access the NFS mount point and
> write/read data.

No, not as NFS. AFS mounts as AFS with its own kernel module (on Windows called a "driver", I think).

> Ideally, we wanted to build a pool of storage with 42TB (combining all
> windows and linux servers), but without changing the windows servers to
> linux

What are you especially fond of in these that makes you want to keep them?

Harald.
[OpenAFS] Volume corruption or salvage error?
I have a very odd thing about at least one volume. The volume is ancient:

# vos exa home.levitte.Mail
home.levitte.Mail                 536904039 RW     530871 K  On-line
    beef.stacken.kth.se /vicepa
    RWrite  536904039 ROnly          0 Backup  536905115
    MaxQuota         55 K
    Creation    Sun Mar  6 05:07:47 2011
    Copy        Sat Jul 18 16:22:24 2015
    Backup      Mon Oct 12 01:25:24 2015
    Last Access Thu Oct  8 10:10:38 2015
    Last Update Wed Apr 17 14:27:05 2002
    0 accesses in the past day (i.e., vnode references)

    RWrite: 536904039     Backup: 536905115
    number of sites -> 1
       server beef.stacken.kth.se partition /vicepa RW Site

There does not seem to be any change going on from $USER. The volume was salvaged (a routine salvage after a vos move) where some errors were corrected in August:

SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:27 dispatching child to salvage volume 536904039...
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:29 SALVAGING VOLUME 536904039.
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:29 home.levitte.Mail (536904039) updated 04/17/2002 14:27
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:30 Salvaged home.levitte.Mail (536904039): 100258 files, 530871 blocks

Since then, the volume has been cloned to a backup volume very often without error:

VolserLog:Mon Aug 24 01:24:39 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Tue Aug 25 01:23:11 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Wed Aug 26 01:24:53 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Thu Aug 27 01:22:00 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Fri Aug 28 01:24:22 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
...
VolserLog:Sat Oct 10 01:25:23 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Sun Oct 11 01:25:28 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115
VolserLog:Mon Oct 12 01:25:24 2015 1 Volser: Clone: Recloning volume 536904039 to volume 536905115

However, when the backup volume should be dumped for backup:

Wed Sep  9 01:25:05 2015 DumpVnode: volume 536905115 vnode 1 has inconsistent length (index 6144 disk 26624); aborting dump
Sun Oct 11 01:25:28 2015 DumpVnode: volume 536905115 vnode 1 has inconsistent length (index 6144 disk 26624); aborting dump

The rw clone can not be dumped either:

Mon Oct 12 11:20:31 2015 DumpVnode: volume 536904039 vnode 1 has inconsistent length (index 6144 disk 26624); aborting dump

So something is wrong, despite the salvage being "successful" and the volume header saying that the volume has not changed since the salvage. Has the salvage "forgotten" to update the size of vnode 1, where it created entries for orphaned files (according to the salvage log)? I have two other volumes that are similar :-(

The data is on zfs, which has not reported any problems with data loss. The version is 1.6.9-2+deb8u2-debian (OK, a bit old), but I have not seen anything in the release notes saying that this is known or fixed.

Harald.
Re: [OpenAFS] Apache2 and OpenAFS
> Another (unsecure) way is to use IP-users in OpenAFS. That IP gains
> access to the path and the whole system under that IP can access the data.

Hi Lars, long time no see. And all systems faking that IP will get access as well. For that answer you should be stuck behind double NAT for at least a week. ;-) ;-)

Harald.
Re: [OpenAFS] Apache2 and OpenAFS
We run our web server authenticated from a keytab. The keytab contains:

# /usr/heimdal/sbin/ktutil --keytab=/etc/krb5.keytab.web-daemon list
Vno  Type                     Principal
  0  des3-cbc-sha1            web-daemon/scat.pdc.kth...@nada.kth.se
  0  aes128-cts-hmac-sha1-96  web-daemon/scat.pdc.kth...@nada.kth.se
  0  arcfour-hmac-md5         web-daemon/scat.pdc.kth...@nada.kth.se

Then the webserver is started with Heimdal kinit (which does all the pagsh and renew magic) with that keytab:

# ps auxgwww | grep kinit
root 31751 0.0 0.0 39880 2100 ? S Jul04 0:04 /usr/heimdal/bin/kinit --no-forward --no-renew --keytab=/etc/krb5.keytab.web-daemon --afslog web-daemon/scat.pdc.kth...@nada.kth.se /usr/sbin/httpd -DNO_DETACH -D DEFAULT_VHOST -D SSL_DEFAULT_VHOST -D INFO -D LANGUAGE -D SSL -D CACHE -D MEM_CACHE -D DAV -D STATUS -D AUTH_DIGEST -D PROXY -D USERDIR -D REWRITE -k start

The web-daemon/scat.pdc.kth...@nada.kth.se principal maps to this PTS identity (due to historical reasons the "/" is replaced with a "." in the OpenAFS pts-to-principal name mapping; there are folks on this list who happen to know exactly why):

$ pts exa web-daemon.scat.pdc.kth.se -c pdc.kth.se
Name: web-daemon.scat.pdc.kth.se, id: 65531, owner: system:administrators,
creator: haba.admin, membership: 4, flags: S, group quota: 20.

Then all the web-daemon.x.y.z identities are members of this group:

$ pts mem web-daemons -c pdc.kth.se
Members of web-daemons (id: -32225) are:
  web-daemon.wrasse.pdc.kth.se
  web-daemon.schelly.pdc.kth.se
  web-daemon.scat.pdc.kth.se

Then you give web-daemons the appropriate permissions in the file system.

Harald.
Re: [OpenAFS] backup - order of entries in volume set
Volume set userbackup:
    Entry   1: server .*, partition .*, volumes: user\..*\.backup
    Entry   2: server .*, partition .*, volumes: user\.backup

It's not clear to me whether the specification should include the .backup and/or .readonly suffixes, or whether the backup program adds the .backup automatically if you specify for example user. I think the reason for this is a strncmp in src/bucoord/commands.c line 311:

    for (tavols = ps->dupvdlist; add && tavols; tavols = tavols->next) {
        if (strncmp(tavols->name, entries[e].name, l) == 0) {
            if ((strcmp(&entries[e].name[l], ".backup") == 0)
                || (strcmp(&entries[e].name[l], ".readonly") == 0)
                || (strcmp(&entries[e].name[l], "") == 0))
                add = 0;
        }
    }

Yes, this looks like a bug. There is no point in even comparing the strings if they are of different lengths.

Harald.
[OpenAFS] Error in volume - anything that can be salvaged?
Something is really fishy with this volume. It was salvaged recently after some power failures and a scripted vos move. So during the vos move it was good enough to be moved. After that it was salvaged (forceDAFS); now it's broken beyond vos dump. See below.

07/18/2015 11:43:44 dispatching child to salvage volume 536889218...
07/18/2015 11:43:44 1 nVolumesInInodeFile 32
07/18/2015 11:43:44 SALVAGING VOLUME 536889218.
07/18/2015 11:43:44 ftp.free.doc (536889218) updated 02/28/2011 15:00
07/18/2015 11:43:44 totalInodes 3596
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25540.13495
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25542.13496
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25544.13497
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25546.13498
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25548.13499
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25550.13500
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25552.13501
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25554.13502
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25556.13503
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25558.13504
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25560.13505
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25562.13506
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25564.13507
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25566.13508
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25568.13509
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25570.13510
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25572.13511
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25574.13512
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25576.13513
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25578.13514
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25580.13515
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25582.13516
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25584.13517
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25586.13518
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25588.13519
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25590.13520
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25592.13521
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25594.13522
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25596.13523
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25598.13524
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25600.13525
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25602.13526
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25604.13527
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25606.13528
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25608.13529
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25610.13530
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25612.13531
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25614.13532
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25616.13533
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25618.13534
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25620.13535
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25622.13536
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25624.13537
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25626.13538
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25628.13539
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as __ORPHANFILE__.25630.13540
07/18/2015 11:43:44 Vnode 25540: link
Re: [OpenAFS] Dafileserver does not manage to salvage volume
Last night there was a power outage again, and the file server and the salvager did not like each other again. Unfortunately the FileLog is not dated like the SalvageLog. The reboot was at 2:20. Then I restarted around 2:35. Then I went home at around 04:05. Then I got a fileserver core at 6:24 and issued a restart at 9:41, which resulted in a core from salvageserver. Then I get alarming messages like

Wed Jul 15 10:10:25 2015 Scheduling salvage for volume 536903735 on part /vicepa over SALVSYNC
Wed Jul 15 10:10:40 2015 nUsers == 0, but header not on LRU

but that does not seem to hinder salvage:

07/15/2015 10:10:25 dispatching child to salvage volume 536903735...
07/15/2015 10:10:26 SALVAGING VOLUME 536903735.
07/15/2015 10:10:26 home.eb (536903735) updated 02/15/2015 13:01
07/15/2015 10:10:26 totalInodes 9
07/15/2015 10:10:26 Salvaged home.eb (536903735): 2 files, 3 blocks

All this is a little disturbing. And now I have to run off and do other things :( Well, we will see if I have new core files when I return.

Harald.
Re: [OpenAFS] Dafileserver does not manage to salvage volume
It looks to me like, if the dafileserver cores while a volume salvage is in progress, the bosserver does not manage to figure out the correct actions, so the dafileserver - salvageserver - dasalvager chain and the SALVSYNC mechanism break.

Harald.
[OpenAFS] Dafileserver does not manage to salvage volume
# dpkg -l | grep openafs-f
ii  openafs-fileserver  1.6.1-3+deb7u1  amd64  AFS distributed filesystem file server

After a power outage, the salvager has not salvaged everything. In SalsrvLog:

07/07/2015 00:57:12 Salvaged stackware (536870963): 3401 files, 71676 blocks

and later

07/07/2015 06:08:17 dispatching child to salvage volume 536870963...

Eh, why again? In VolserLog, zillions of

Wed Jul  8 13:22:06 2015 Scheduling salvage for volume 536870963 on part /vicepa over FSSYNC

and one

Wed Jul  8 13:22:06 2015 1 Volser: GetVolInfo: Could not attach volume 536870963 (/vicepa:V0536870963.vol) error=101

# file /vicepa/V0536870963.vol
/vicepa/V0536870963.vol: data

FileLog says:

Wed Jul  8 02:16:08 2015 denying offline request for volume 536870963; volume is in salvaging state

vos listvol gives:

bogus.536872474    536872474 RW    0 K Off-line
Could not attach volume 536870963
Total volumes onLine 3217 ; Total volumes offLine 2 ; Total busy 0

The bogus.536872474 is, I think, a remnant of another volume that did not salvage and attach, which I forcibly took back from backup as I needed the data. Something like (vos dump vol.backup | vos restore vol).

So, how should I force a salvage of 536870963? Is -forceDAFS the right next step?

# bos salvage mount-kilimanjaro.stacken.kth.se a 536870963 -showlog
This is a demand attach fileserver. Are you sure you want to proceed with a manual salvage?
must specify -forceDAFS flag in order to proceed.

Harald.
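Presumably the next step then looks like this (modeled on the command above, not tested here):

# bos salvage mount-kilimanjaro.stacken.kth.se a 536870963 -forceDAFS -showlog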
Re: [OpenAFS] OpenAFS still in development?
> I do not believe that the OpenAFS mailing lists are an appropriate forum
> to discuss AuriStor. My response to Michael provided details on AuriStor
> because I felt it was necessary in order to properly answer the implied
> questions.

From what I've learned so far, it looks like AuriStor could be a replacement for OpenAFS on the platforms it's available on. And it can do more, as Jeff tells us. Whether that strategy is good advertising depends on cultural background.

> The question of supported platforms is an interesting one because it is
> very unclear what it means for OpenAFS to support a platform. What are
> the criteria? Is it sufficient to say that if you can build OpenAFS on
> the OS and hardware architecture that it is supported?

Sorry, "supported" was probably a bad choice of word. But I don't know if "available" or "runnable" or "it builds, it ships" would be better.

> I am quite sure there are other criteria that could be added to the mix.

I know that you take "supported" very seriously. I would be happy if other software vendors (which are not into file systems) would do that as well.

> * Linux
>   . Red Hat Enterprise Linux (YFSI is a Red Hat Technology Partner)
>   . Fedora
>   . Debian
>   . Ubuntu
> * Microsoft Windows
> * Apple OSX and iOS
> * Oracle Solaris
> * IBM AIX
> * Android
>
> Servers are supported everywhere but on Windows, iOS and Android but the
> performance varies significantly based upon the OS release, processor
> architecture, and underlying hardware so there are combinations that we
> recommend and those we do not. The failure to list an OS family or Linux
> distribution does not imply that YFSI will not support AuriStor on that
> platform. It only implies that there has been insufficient customer
> interest to this point for YFSI to expend the necessary resources on
> development, testing and certification (where applicable.)

Thanks for the list. I guess mainly on amd64 hardware for most of the OSes above. Both at work and privately I run OpenAFS on platforms that are not on the list and that even in the future will not have much customer interest.

> In the end software development has to be a partnership between those
> that build and those that deploy. If those that deploy do not fund those
> that build there will not be sufficient development hours and talent to
> build the solutions those that deploy require.

I see that this partnership has stopped working in many places. It makes me sad.

> P.S. My apologies for the long reply.

You don't need to apologise.

Harald.
Re: [OpenAFS] OpenAFS still in development?
> 4. Is AuriStor a replacement for OpenAFS?
>
> AuriStor is designed to be a general purpose, platform independent,
> secure, distributed file system that can be successfully deployed
> internally, across the Internet, and within public cloud services.
> AuriStor is an IBM AFS(R) and OpenAFS compatible file system that (...)

... and more. On https://www.your-file-system.com/openafs/auristor-comparison I have not found a supported-platforms (client and server) comparison for OpenAFS and AuriStor. Because if it does not run on the platform, it's not a replacement. I have not found a price list of the products and what they contain, either. However, there is a lot of "we are better" advertisement. That's not something that works well in all cultures.

My conclusion about the current landscape is that if you have chosen a closed source operating system that you pay money for, in the future you'll have to pay money for a decent file system as well, either directly to the OS vendor or to a third party. That will in the future be necessary for any other function enhancement as well. That trend will continue and sooner or later include every app(lication) that you want to run on that platform.

Harald.
Re: [OpenAFS] Max number of groups owned by user
> Is there a setting somewhere for how many groups a user can create?

See the others' explanations. From what template is that value taken when the user is created?

Btw: Groups have quotas for the max number of members as well.

Harald.
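The per-user value can at least be inspected and changed after the fact (user name is a placeholder):

$ pts examine someuser                    # shows "group quota: NN"
$ pts setfields someuser -groupquota 40   # raise the group-creation quota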
Re: [OpenAFS] Metrics on the cell
I think that you are able to turn on some logging (which was probably intended for debugging) to get out _some_ data. Others might know which exact knobs to turn.

> I have been asked by senior leadership to gather data on the usage of
> our cell.

One very important fact that you have to give to the ones asking is that the following case, measured on the server:

$ cat /nfs/server/file > /dev/null
$ cat /nfs/server/file > /dev/null
$ cat /nfs/server/file > /dev/null

gives a usage of approx 3*bytes(file) in x seconds, but

$ cat /afs/cell/file > /dev/null
$ cat /afs/cell/file > /dev/null
$ cat /afs/cell/file > /dev/null

gives a usage of approx 1*bytes(file) in x/3 seconds, because the AFS client cache serves the repeated reads. So how much the service is worth to all the users can only be seen if you have control over all the clients. That's often not easy to get for leadership.

Good luck, Harald.
Re: [OpenAFS] single OpenAFS cell and multiple/different kerberos realms
> In order for user@B to obtain afs/cellname@A there must be a cross-realm
> relationship between A and B. The other way to obtain a token for
> cellname is to add a service principal afs/cellname@B to realm B and
> then export the key and add it in addition to the key from
> afs/cellname@A to the AFS cell.

That summarizes it quite well. I think you must at least put the krbtgt/A@B into B, which means that A trusts B, or the afs/a@B into B, which means that the AFS servers in cell a trust B. If you can only get user (and not service) principals into B, you lose.

Harald.
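A rough sketch of the second variant with Heimdal tooling and OpenAFS 1.8-style asetkey; names, kvno and the enctype number (18 = aes256-cts-hmac-sha1-96) are placeholders:

$ kadmin -l add --random-key afs/cell.example.com@B
$ kadmin -l ext_keytab --keytab=/tmp/afs-b.keytab afs/cell.example.com@B
# on each AFS server, load it next to the existing afs/cell.example.com@A key:
# asetkey add rxkad_krb5 <kvno> 18 /tmp/afs-b.keytab afs/cell.example.com@B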
Re: [OpenAFS] backup strategy
Last time I heard about tsmafs (the file-level thing from Luleå), there were just some small adjustments necessary before the code could be published. That was a while ago. Hint hint ;-) Harald. PS: Do all places from which code for AFS-TSM backup integration gets published end in å? ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] backup strategy
Do you have an existing backup system (for example for things other than AFS) and are you allowed to put data into that? In that case, what kind of system do you have? We do in effect (hiding all the gory stuff):

... Hilarious script ...
... Hilarious script ...
... Hilarious script ...
vos dump VOLUMENAME | tsmpipe /some-prefix/VOLUMENAME.DUMPLEVEL.DATE

which makes files in TSM that in effect are volume dumps.
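The dump level part works roughly like this (volume name and dates made up, and tsmpipe's own options left out because they vary):

    # level 0: full dump
    vos dump -id home.someuser -time 0 | tsmpipe /prefix/home.someuser.0.20140617
    # higher level: only what changed since the given date
    vos dump -id home.someuser -time "06/10/2014" | tsmpipe /prefix/home.someuser.1.20140617

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info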
Re: [OpenAFS] backup strategy
> and with NetWorker, it was (it probably still is) impossible to save from a pipe, requiring the use of temporary disk for the purpose.

TSM as shipped by IBM can't either, but there is a TSM API. Then tsmpipe was written in Umeå. Thanks again!

# fs newcell hpc2n.umu.se chosan.hpc2n.umu.se grande.hpc2n.umu.se mamba.hpc2n.umu.se
$ ls /afs/hpc2n.umu.se/lap/tsmpipe/x.x/src/dist/

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Minor question on moving AFS db servers
> I just mention this because I don't think there's any way to avoid this one. Other userspace clients will not notice because they are short-lived processes, but anything that's long running, we don't have a way to notify of CellServDB changes.

The userspace clients could leave hints (for example in a file) that a cell has a bad DB server, and then future userspace clients could do something intelligent with those hints. Another, more aggressive way would be to contact more than one DB server at a time, for example pick two at random and use the info from the one that answers first. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Red Hat RPM packaging
> Cool, but would it be easy for an OpenAFS user to migrate to these, and is the in-kernel AFS nearly as complete as OpenAFS? My immediate guess would be no on both counts, but I'd love to hear otherwise :)

Probably Arla (in spite of being unmaintained for some years) is still the second most complete AFS kernel module. And it's GPL. There are several reasons why Arla nowadays is unmaintained, but one is that the amount of work needed to track the Linux kernel changes, compared to other more constructive work, was very tiresome. Then one chooses to do other things; Arla was hacked on a great deal in spare time. Yet another AFS kernel module, from David Howells back in the day, did not help either. But maybe this was just a wing flap from the past ;-) Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Help: Volume inaccessible after incident
> $ vos examine html.milicchio
> Could not fetch the information about volume 536871913 from the server : No such device
> Volume does not exist on server aux.dia.uniroma3.it as indicated by the VLDB

If you do a vos listvol against aux.dia.uniroma3.it you see that you a) only have html.milicchio.backup but not html.milicchio any more, and b) have a lot of volumes in the offline state on that server. Did the server run any salvage, and is there anything in the log about the outcome?

If you have any backup volumes that are online, you can vos dump them, vos restore them to another server (preferably first under another name) and check the contents out. Especially if you do not have any backup of the contents on another medium.

vos listvol servername | grep volid   will give you what the server thinks is there
vos listvldb -name volid              will give you what the DB thinks is there
vos exa volid                         gives you a combination of the two above (if available)
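The dump-and-restore-under-another-name part would look roughly like this (server, partition and names made up):

    vos dump -id html.milicchio.backup -file /tmp/html.dump
    vos restore -server otherserver -partition /vicepa -name html.milicchio.recovered -file /tmp/html.dump
    fs mkmount /afs/.yourcell/recovered html.milicchio.recovered   # mount and inspect before touching the original name

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info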
Re: [OpenAFS] System resources requirements and performance tuning for AFS file servers
> Finally, when I upgrade my DB servers, I know that the “right” way is to shut everything down, copy over the databases, (...)

Nah.

> ..., do you think I’d be safe to take down one database server at a time, bring up a new RHEL VM with the same IP address, start the AFS processes, and wait for the database to propagate to the new box?

That's how I have done most upgrades. Not with VMs, but with real boxes.
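Before moving on to the next server, I would check that the new box has rejoined the quorum, roughly like this (host name made up; 7002 is the ptserver port, 7003 the vlserver port):

    udebug newdb.example.org 7002
    udebug newdb.example.org 7003
    # look for a recent "last vote rcvd" and a local db version
    # that matches the sync site's db version

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info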
Re: [OpenAFS] Apache2 and openafs
Another program which can start subprocesses in a PAG and authenticate from a keytab is Heimdal's kinit.

Harald.

From the manpage:

SYNOPSIS
     kinit [--afslog] [-c cachename | --cache=cachename]
           [-f | --no-forwardable] [-t keytabname | --keytab=keytabname]
           [-l time | --lifetime=time] [-p | --proxiable] [-R | --renew]
           [--renewable] [-r time | --renewable-life=time]
           [-S principal | --server=principal] [-s time | --start-time=time]
           [-k | --use-keytab] [-v | --validate]
           [-e enctypes | --enctypes=enctypes]
           [-a addresses | --extra-addresses=addresses]
           [--password-file=filename] [--fcache-version=version-number]
           [-A | --no-addresses] [--anonymous] [--enterprise]
           [--version] [--help] [principal [command]]
                                            ^^^
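For example, something like this (keytab, principal and command are made up):

    kinit --afslog -t /etc/some.keytab service/host@YOUR.REALM /usr/local/bin/yourdaemon

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info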
Re: [OpenAFS] Two realms and one cell
> A little question. We have one AFS cell myrealm.fr and a Kerberos realm myrealm.fr. We must use our AFS cell with another realm named otherrealm.fr. There are no trust relations between myrealm.fr and otherrealm.fr. Is it possible?

If you don't trust otherrealm.fr enough to establish cross-realm, you probably don't trust otherrealm.fr enough to give them a set of AFS service keys for your servers. Then users from otherrealm.fr must have ident...@myrealm.fr, or identities in any other realm you trust. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Cannot create directory with Nautilus
> At work, I can create a directory in my $HOME with Nautilus (Debian Wheezy). My problem is with my personal computer (Ubuntu 12.04). I can't create a directory in my $HOME on the same OpenAFS server. Menus are disabled. I must create directories with mkdir. And you, how do you do it?

I think Nautilus is to blame: it thinks it cannot do it because of UID and Unix permissions and therefore does not even try.

What is your UID at work? (guess: not 1000)
What is your UID on your home computer? (guess: 1000)

I would change the UID on the home computer to match the UID on the work computer to work around the problem. I think we had this discussion before, a long time ago. Did we get any feedback from the Nautilus folks?
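To check, and to do the workaround (UID made up, and remember to chown your existing local files afterwards):

    $ id -u                       # compare on both machines
    $ fs listacl $HOME            # the ACL is what AFS actually enforces
    # usermod -u 1234 yourlogin   # on the home computer, as root, while not logged in

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info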
Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump
> On master, this looks like
>     opr_Verify(afs_dir_Delete(dh, "..") == 0);
>     opr_Verify(afs_dir_Create(dh, "..", &pa) == 0);

I guess that opr_Verify is some new fancy name for Assert? Same patch for master as for 1.6.9 needed? Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Salvageserver 1.6.1-3+deb7u1 core dump
> All directories are supposed to have a .. entry by this point

I disagree. Not if something has removed "..". For example a buggy salvager like 1.6.1. Or bitrot. Or whatever. As this is a salvage program, which should not core dump even if unexpected things happen, I think it is very very very hard to justify such an assert in production code for the salvager. Especially because the next layer that calls the salvager does not handle the salvager asserting. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump
More from the core:

(gdb) bt
#0  0x7fbe95040475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fbe950436f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0042f162 in osi_Panic (
    msg=msg@entry=0x4635f0 "assertion failed: %s, file: %s, line: %d\n")
    at ./../rx/rx_user.c:251
#3  0x0042f17d in osi_AssertFailU (
    expr=expr@entry=0x4586bb "Delete(dh, \"..\") == 0",
    file=file@entry=0x458337 "../vol/vol-salvage.c", line=line@entry=3997)
    at ./../rx/rx_user.c:261
#4  0x00408078 in SalvageVolume (
    salvinfo=salvinfo@entry=0x7fff12474f80, rwIsp=rwIsp@entry=0x1bd68b0,
    alinkH=0x1bd44a0) at ../vol/vol-salvage.c:3997
#5  0x0040af8d in DoSalvageVolumeGroup (salvinfo=salvinfo@entry=0x0,
    isp=0x1bd68b0, nVols=nVols@entry=2) at ../vol/vol-salvage.c:2092
#6  0x0040c391 in SalvageFileSys1 (partP=partP@entry=0x1bca880,
    singleVolumeNumber=536904480) at ../vol/vol-salvage.c:937
#7  0x004041a9 in DoSalvageVolume (slot=<optimized out>, node=0x1bd4030)
    at ../vol/salvaged.c:640
#8  SalvageServer (argv=<optimized out>, argc=<optimized out>)
    at ../vol/salvaged.c:574
#9  handleit (as=<optimized out>, arock=<optimized out>)
    at ../vol/salvaged.c:299
#10 0x004572a4 in cmd_Dispatch (argc=7, argc@entry=6, argv=0x1bc1c20,
    argv@entry=0x7fff12475768) at cmd.c:905
#11 0x00404c67 in main (argc=6, argv=0x7fff12475768)
    at ../vol/salvaged.c:418

So this is bailing out at the vol-salvage.c opr_Verify(afs_dir_Delete(dh, "..") == 0), which looks a lot like http://git.openafs.org/?p=openafs.git;a=commitdiff;h=e8faeae6dcae0e566de2b21d53d3f78f3cc44e3f

    Improve JudgeEntry() detection of orphaned directories to prevent
    unintentional deletion of their '.' and '..' entries. This in turn
    prevents a later assert (opr_Verify) when we try to delete and
    re-add '..' in order to attach the orphan.
    ...

So well, now I only need to find something that contains that patch (1.6.9 I suppose) for wheezy, correct? Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump
> This change went into 1.6.6, so 1.6.7 would do as well.

Thanks. Built myself a 1.6.9 from the source deb from unstable. But unfortunately, the volume in question breaks the 1.6.9 salvageserver as well :( Gdb tells me:

(gdb) up
#1  0x7f82c1a436f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#2  0x7f82c2424bb8 in osi_Panic (
    msg=msg@entry=0x7f82c245c1d0 "assertion failed: %s, file: %s, line: %d\n")
    at ./../rx/rx_user.c:251
251         afs_abort();
(gdb) up
#3  0x7f82c2424bd5 in osi_AssertFailU (
    expr=expr@entry=0x7f82c2450c13 "Delete(dh, \"..\") == 0",
    file=file@entry=0x7f82c245088f "../vol/vol-salvage.c",
    line=line@entry=4127) at ./../rx/rx_user.c:261
261         osi_Panic("assertion failed: %s, file: %s, line: %d\n", expr,
(gdb) up
#4  0x7f82c23f97b7 in SalvageVolume (
    salvinfo=salvinfo@entry=0x7fffcf109530, rwIsp=rwIsp@entry=0x7f82c2dba290,
    alinkH=0x7f82c2db7e80) at ../vol/vol-salvage.c:4127
4127        osi_Assert(Delete(dh, "..") == 0);
(gdb) list
4122        SetSalvageDirHandle(dh, vid, salvinfo->fileSysDevice,
4123                            salvinfo->vnodeInfo[class].inodes[v],
4124                            &salvinfo->VolumeChanged);
4125        pa.Vnode = LFVnode;
4126        pa.Unique = LFUnique;
4127        osi_Assert(Delete(dh, "..") == 0);
4128        osi_Assert(Create(dh, "..", &pa) == 0);
4129
4130        /* The original parent's link count was decremented above.
4131         * Here we increment the new parent's link count.
(gdb) p dh
$1 = {dirh_volume = 536904480, dirh_device = 0, dirh_inode = 173619825082415,
  dirh_handle = 0x7f82c2db8980, dirh_cacheCheck = 44,
  volumeChanged = 0x7fffcf109570}
(gdb) p &dh
$2 = (DirHandle *) 0x7fffcf108c00
(gdb) p pa
$3 = {Volume = 4294967295, Vnode = 1, Unique = 1}

Should we not just make a ".." in this situation? And now lunch. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump
Well, I did add a patch like:

Index: openafs-1.6.9/src/vol/vol-salvage.c
===================================================================
--- openafs-1.6.9.orig/src/vol/vol-salvage.c    2014-06-12 08:30:48.000000000 +0000
+++ openafs-1.6.9/src/vol/vol-salvage.c 2014-06-17 10:34:23.857444175 +0000
@@ -4124,7 +4124,8 @@
                             &salvinfo->VolumeChanged);
         pa.Vnode = LFVnode;
         pa.Unique = LFUnique;
-        osi_Assert(Delete(dh, "..") == 0);
+        if (Delete(dh, "..") != 0)
+            Log("Delete of .. failed, but will try to recreate it anyway\n");
         osi_Assert(Create(dh, "..", &pa) == 0);
 
         /* The original parent's link count was decremented above.

Which created two empty __ORPHANDIR__* directories in the volume. Then I have the following logs from my backup script, which tried a vos backup home.katy:

Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Could not start a transaction on the volume 536904474
Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Volume needs to be salvaged
Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Error in vos backup command.
Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Volume needs to be salvaged

However, my salvage log says that the volume was salvaged OK:

06/16/2014 23:39:37 Salvaged home.katy (536904474): 23897 files, 732753 blocks

and that the salvage ended 06/16/2014 23:57:23, which is several hours before. When I did a vos backup home.katy recently, everything went fine. What's going on here?

Followup question: Should I now run a salvage over all volumes? How do I do that manually, with as little impact as possible? Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump
Is this a known bug of openafs-fileserver 1.6.1-3+deb7u1 (Debian)?

06/16/2014 23:42:11 dispatching child to salvage volume 536904480...
06/16/2014 23:42:23 SYNC_getCom: error receiving command
06/16/2014 23:42:23 SALVSYNC_com: read failed; dropping connection (cnt=190)
06/16/2014 23:42:23 salvageserver core dumped!
06/16/2014 23:42:23 salvageserver (pid=3622) terminated abnormally!

06/16/2014 23:42:12 2 nVolumesInInodeFile 64
06/16/2014 23:42:12 CHECKING CLONED VOLUME 536904573.
06/16/2014 23:42:12 home.maxz.backup (536904573) updated 11/20/2013 09:20
06/16/2014 23:42:12 SALVAGING VOLUME 536904480.
06/16/2014 23:42:12 home.maxz (536904480) updated 11/20/2013 09:20
06/16/2014 23:42:12 totalInodes 1238
06/16/2014 23:42:13 dir vnode 47: invalid entry deleted: ??/.. (vnode 249, unique 38300)
06/16/2014 23:42:13 dir vnode 51: invalid entry deleted: ??/.. (vnode 249, unique 38300)

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] OpenAFS and windows/unix versioning
In sync would have been nice, but as "in sync" has been problematic in the past and I don't expect that to change, I suggest going with the last suggestion. I would call them "marketing numbers", and these should have another range so that the two version numberings clearly differ (like the 5.x example from Andrew). Or call the Windows versions 14.x this year and 15.x next year. Then we will never reach the feared OfW-13 version, for sure ;-) Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] OpenAFS and windows/unix versioning
> But overall not a major issue for us.

Unfortunately, we have to _guess_ a lot about this, because many of the issues are probably not issues for the folks here on openafs-info. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] vos move operation stalls
> I have installed a new fileserver, running Gentoo Linux, kernel 3.8.13, OpenAFS 1.6.5 (also tried 1.6.6). The server has two /vicep partitions, each with an XFS filesystem:
> * ~800GB /vicepa on the system disk - mdadm mirror with two SATA disks
> * 24TB /vicepb on an external RAID system, connected via iSCSI
> Moves to the /vicepa partition work as expected. Moves to /vicepb start, but after some time (usually 10-20 sec) there is no progress anymore:
> - no disk i/o
> - packet counters (vos status/rxdebug) stay constant forever
> - transactions do not time out within 3-4 days

That's very strange, as the AFS server processes access the file system as any other process would. So there is nothing special about them. We have used XFS/Linux for AFS server /vicep* partitions for a long time (however not on iSCSI) and did not have any problems with it. We have since moved to ZFS on local HD.

Can you strace the hanging process and see what syscall it's hanging on? (See the sketch at the end of this message.)

> Other fileserver/volserver operations on other volumes seem to be unaffected.

At least something.

> Also, access to the iSCSI partition from the OS is still possible, there are no disk/iSCSI problems reported.

Does a find /vicepb/ run through completely? Can you write files to /vicepb/TEST/ or something like that?

> Has anyone seen similar problems before? Does anyone have suggestions what I could try to debug the problem?

Have you tried something other than XFS? Have you tried to put the log part of XFS somewhere else? For performance, you might want it on mirrored local HD.
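The strace part would be something like this (assuming it is the volserver that hangs):

    strace -f -tt -p $(pidof volserver) 2>&1 | tail
    # if it sits in a write() or fsync() against /vicepb forever,
    # the problem is below AFS, in XFS or the iSCSI layer

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info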
Re: [OpenAFS] Re: OpenAFS client cache overrun?
> Thanks also for the mention of AFS cache bypass, I think that may be a BIG help with this problem.

> 'Cache bypass' I don't believe is considered the most stable of features. It could indeed maybe help here, but I'd be looking out for kernel panics.

I have not investigated further, but I suspect the cache bypass feature to be at least partly responsible for a panic. This was however not on a normal system (Cray's version of SuSE ES 11 SP1). In addition, that system is running memcache. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools
> The problem is that you want the client to scan quickly to find a server that is up, but because networks are not perfectly reliable and drop packets all the time, it cannot know that a server is not up until that server has failed to respond to multiple retransmissions of the request. Those retransmissions cannot be sent quickly; in fact, they _must_ be sent with exponentially-increasing backoff times. Otherwise, when your network becomes congested, the retransmission of dropped packets will act as a runaway positive feedback loop, making the congestion worse and saturating the network.

You are completely right if one must talk to that particular server. But I think that AFS/RX sometimes hangs too long waiting for one server instead of trying the next one, for example for questions that could be answered by any VLDB server. I'm thinking of operations like group membership and volume location. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools
> I have long thought that we should be using multi for vldb lookups, specifically to avoid the problems with down database servers.

The situation is a little bit different for cache managers, which can remember which servers are down, and command line tools, which normally discover how the world looks on each startup. If the 'we ask everyone' strategy is not used all the time but only on startup, it will not happen that frequently. Probably not frequently enough to cause problems for the scalability folks. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Extract files from /vicepa
> If I understood correctly, the easiest way to restore the files is to setup another afs server and just overwrite the /vicepa folder with the one I have. Is this correct?

Yes, I think that's still correct. The easiest way to set up an AFS server is probably to take a Linux distro which has pre-packaged binaries for the AFS client and server. Debian for example.

> I don't understand the part about the salvager deleting the data.

I think the ownership and mode bits contain the information about whether the file in question is active. The salvager may delete inactive data. But prior to copying your old data into your new /vicepa/ you can remove the salvager from BosConfig and then run the salvager by hand with -nowrite, which will tell you what the salvager would have done.

> I have the recovered /vicepa folder on a ntfs partition. I'm trying to recover again the folder but to an ext4 partition trying to preserve ownership and modes ...

Good if you can do that. Zip and tar archives can be told to preserve ownership as well.
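Roughly like this (paths made up; on Debian the salvager lives under /usr/lib/openafs/, and the exact salvager syntax differs between versions):

    # copy while keeping owners and modes
    tar -C /mnt/recovered/vicepa -cpf - . | tar -C /vicepa --same-owner -xpf -
    # dry run: report what the salvager would do, without writing anything
    /usr/lib/openafs/salvager -partition /vicepa -nowrite

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info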
Re: [OpenAFS] Re: Ubik trouble
> - On a client-mode connection, the source address is always ignored. This actually should have the effect of making small requests like votes always work.

But for some reason it doesn't. That's my observation as well.

> The practical effect of this is that it is possible for voting to work fine, because that's a single-round-trip operation, while larger calls such as transferring a database update fare not so well (or consistently).

I don't know exactly what went wrong, but when you udebug the problem, the "last vote rcvd X secs ago" counter goes up and up and up... Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Ubik trouble
> This turned out to be a subtle network problem.

This reminds me of another Ubik problem that I had, which was because of (1) a config error and (2) something about Ubik that I still do not understand.

(1) I had an old NetInfo file with a wrong IP address lying around. This did _not_ prevent the server from starting, nor did it prevent sync completely. The protection server synced fine and the volume location server refused.

(2) I have a machine where the database server is known as X.Y.Z.43 but the machine's primary IP is X.Y.Z.46. This seems to work well until something somewhere checks the source address of the traffic when sync is tried. Result: the protection server synced fine and the volume location server refused.

After I corrected the NetInfo on machine (1) and added -rxbind to the vlserver on machine (2), everything synced fine.

I am still puzzled why, when -rxbind is _not_ given, ubik tells me it first worked fine for 13 hours (866491 secs ago, see below) but not after that.

$ udebug 130.237.234.3 7002
Host's addresses are: 130.237.234.3
Host's 130.237.234.3 time is Mon Jan 13 14:39:58 2014
Local time is Mon Jan 13 14:39:58 2014 (time differential 0 secs)
Last yes vote for 130.237.234.3 was 8 secs ago (sync site);
Last vote started 8 secs ago (at Mon Jan 13 14:39:50 2014)
Local db version is 1369329952.4
I am sync site until 52 secs from now (at Mon Jan 13 14:40:50 2014) (3 servers)
Recovery state 1f
The last trans I handled was 0.360296
Sync site's db version is 1369329952.4
0 locked pages, 0 of them for write

Server (130.237.234.101): (db 1369329952.4)
    last vote rcvd 8 secs ago (at Mon Jan 13 14:39:50 2014),
    last beacon sent 8 secs ago (at Mon Jan 13 14:39:50 2014), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (130.237.234.43): (db 1369329952.4)
    last vote rcvd 866491 secs ago (at Fri Jan 3 13:58:27 2014),
    last beacon sent 866474 secs ago (at Fri Jan 3 13:58:44 2014), last vote was yes
    dbcurrent=1, up=0 beaconSince=0

On 130.237.234.43:

# ps augxww | grep ptserv
root 22308 0.0 0.3 9216 7264 ? S    Jan03 0:23 /usr/lib/openafs/ptserver

Log looks fine:

# cat /var/log/openafs/PtLog
Fri Jan 3 13:58:28 2014 ubik: primary address 130.237.234.46 does not exist
Fri Jan 3 13:58:28 2014 Using 130.237.234.43 as my primary address
Fri Jan 3 13:58:28 2014 Starting AFS ptserver 1.1 (/usr/lib/openafs/ptserver)

# cat /var/lib/openafs/local/NetInfo
130.237.234.43

So how should I know that this would cease to work after 13 hours? Or some other odd number of hours? Now restarting everything on 130.237.234.43. After that: this seems not to be self-healing either. On the sync site:

Fri Jan 3 13:58:44 2014 assuming distant vote time 19270408 from 130.237.234.43 is an error; marking host down
Mon Jan 13 14:48:42 2014 ubik: A Remote Server has addresses: ...

Looks like I have to restart the server on the syncsite as well (so it forgets the bad vote time). And I'm not sure what 19270408 actually means. 223 days ago?
Well, after that restart:

$ udebug 130.237.234.3 7002
Host's addresses are: 130.237.234.3
Host's 130.237.234.3 time is Mon Jan 13 14:59:03 2014
Local time is Mon Jan 13 14:59:07 2014 (time differential 4 secs)
Last yes vote for 130.237.234.3 was 5 secs ago (sync site);
Last vote started 5 secs ago (at Mon Jan 13 14:59:02 2014)
Local db version is 1369329952.4
I am sync site until 51 secs from now (at Mon Jan 13 14:59:58 2014) (3 servers)
Recovery state 1f
The last trans I handled was 0.0
Sync site's db version is 1369329952.4
0 locked pages, 0 of them for write

Server (130.237.234.101): (db 1369329952.4)
    last vote rcvd 9 secs ago (at Mon Jan 13 14:58:58 2014),
    last beacon sent 5 secs ago (at Mon Jan 13 14:59:02 2014), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

Server (130.237.234.43): (db 1369329952.4)
    last vote rcvd 8 secs ago (at Mon Jan 13 14:58:59 2014),
    last beacon sent 5 secs ago (at Mon Jan 13 14:59:02 2014), last vote was yes
    dbcurrent=1, up=1 beaconSince=1

But it was a lot of hassle to get there... Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Ubik trouble
> (...) This includes cases where one server has multiple addresses or interfaces on the same subnet.

Yes, that was the case here. The puzzling thing was that it worked at first, which probably means that the source address of B's outgoing replies changed over time (urk). No, I don't have tcpdumps from back when it worked. Well, from the kernel's point of view, any source address is good enough.

> The sad truth is that in order to properly support multi-homed hosts, Rx needs to be fixed so that it identifies all available interfaces, binds a separate socket for each interface, and keeps track of to which interface an incoming connection belongs, so that it can send responses out the same interface.

I don't know if the response has to be sent out that interface (normally it probably will be), but the responses need to have that source address, if I understand it right. Currently I have no multihomed Ubik servers (besides the one that should not have been, see above) and very few multihomed file servers. So I can not say if the "rx breaks when routing is asymmetric" behaviour has given us any trouble for fileservers. At least nothing as notable as this Ubik problem, where I have shot myself in the foot real good ;-) Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?
My test:

# cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   bananshake.stacken.kth.se bananshake

# bos restart bananshake -local -all
# cat FileLog
...
Fri Dec 13 14:38:18 2013 Getting FileServer address...
Fri Dec 13 14:38:18 2013 FileServer bananshake.stacken.kth.se has address 127.0.1.1 (0x101007f or 0x7f000101 in host byte order)
Fri Dec 13 14:38:18 2013 File Server started Fri Dec 13 14:38:18 2013
...

So the server thinks somewhere that it is 127.0.1.1, but is this message bogus? It would be more interesting to know which addresses the file server actually registered in the DB. When I check, it has not registered it in the address list; bananshake is still there with only one IP under UUID 007cadf8-b425-124e-91-2e-e8eded82aa77:

# vos listaddr -local -printuuid -nores -noauth -c stacken.kth.se
UUID: 000a40c4-cfeb-1228-b1-12-0101007faa77
130.237.234.220
UUID: 00230816-42a1-1361-ae-c3-2ceaed82aa77
130.237.234.101
UUID: 00438a50-e3e5-115d-8f-56-d8eaed82aa77
130.237.234.216
UUID: 0047a130-223f-1244-9b-0b-0101007faa77
130.237.234.151
UUID: 003f1f1a-0189-106e-b6-45-0101007faa77
130.237.234.150
UUID: 007cadf8-b425-124e-91-2e-e8eded82aa77
130.237.237.232

Nevertheless you can create volumes:

# vos create bananshake a -name broken.volume -local -verbose
Volume broken.volume 536911797 created and brought online
Created the VLDB entry for the volume broken.volume 536911797
Volume 536911797 created on partition /vicepa of bananshake

# vos listvldb -server bananshake.stacken.kth.se -nores
vsu_ClientInit: Could not get afs tokens, running unauthenticated.
VLDB entries for server bananshake.stacken.kth.se

broken.volume
    RWrite: 536911797
    number of sites -> 1
       server 127.0.1.1 partition /vicepa RW Site

Total entries: 1

To clean up I did a vos remove -id 536911797 -local, as I knew that it was a throwaway volume, which makes it easier than if you want to remove only one replica.

Btw, this is:

# rxdebug localhost -v
Trying 127.0.0.1 (port 7000):
AFS version: OpenAFS 1.6.1-3+deb7u1-debian built 2013-07-25

And I have not found where this filtering of 127/something actually takes place. Pointers welcome. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Question about how to use vos shadow
I am experimenting with vos shadow, and

# vos shadow H.haba.test.alanine -fromserver beef.stacken.kth.se -frompartition c -toserver bananshake.stacken.kth.se -topartition a -local -verbose

works as expected:

# vos listvol bananshake
Total number of volumes on server bananshake partition /vicepa: 1
H.haba.test.alanine 536901865 RO 1544 K On-line

(and nothing in the VLDB about it). However, when I try to make shadow readonly vols, or to shadow vols which are readonly, I'm not as successful:

# vos shadow H.haba.test.alanine -fromserver beef.stacken.kth.se -frompartition c -toserver bananshake.stacken.kth.se -topartition a -toname H.haba.test.alanine.readonly -readonly -local -verbose
vos: the name of the root volume H.haba.test.alanine.readonly exceeds the size limit of 22

(I would have liked this to give the same result as if I had done an addsite, release, remsite, which would leave a stranded unknown readonly copy with the .readonly suffix on the added server.)

Then am I right that -toname in this kind of usage has to be used together with -toid, because vos shadow can not make up an ID in the VLDB for something that then should not exist in the VLDB?

# vos shadow H.haba.test.alanine -fromserver beef.stacken.kth.se -frompartition c -toserver bananshake.stacken.kth.se -topartition a -toname X.haba.test.alanine -readonly -local -verbose
VLDB: no such entry

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?
> All of the places in the code tree that filter eventually test each address with rx_IsLoopbackAddr() which is defined in rx.h.

When I look at vos.c, I find GetServer, which uses rx_IsLoopbackAddr. Now let's assume we feed it something that resolves to loopback. That will be detected at #1, and as a second route we look up the local hostname we are on. But if that at #2 is STILL loopback, it goes through at #3...

GetServer(char *aname)
{
    struct hostent *th;
    afs_uint32 addr;    /* in network byte order */
    afs_int32 code;
    char hostname[MAXHOSTCHARS];

    if ((addr = GetServerNoresolve(aname)) == 0) {
        th = gethostbyname(aname);
        if (!th)
            return 0;
        memcpy(&addr, th->h_addr, sizeof(addr));
    }

    if (rx_IsLoopbackAddr(ntohl(addr))) {    /* local host */      /* #1 */
        code = gethostname(hostname, MAXHOSTCHARS);
        if (code)
            return 0;
        th = gethostbyname(hostname);                              /* #2 */
        if (!th)
            return 0;
        memcpy(&addr, th->h_addr, sizeof(addr));                   /* #3 */
    }

    return (addr);
}

I think there should be an "is this still $#%^* a loopback addr" test just before return(addr). Does that sound correct?
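Something like this, just before the return (untested sketch, not committed code):

    if (rx_IsLoopbackAddr(ntohl(addr)))
        /* the local hostname ALSO resolved to loopback, for example a
         * Debian-style 127.0.1.1 /etc/hosts entry; give up rather than
         * hand a loopback address to the caller */
        return 0;
    return (addr);

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info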
Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly?
> $ more hosts
> 127.0.0.1   localhost
> 127.0.1.1   peter.cae.uwm.edu peter

> I know various Linux distributions do this by default, ...

So at least the AFS server installed from the .debs for these distros really should cope with that fact.

> but that's only because they don't have a way of knowing what the real IP is for the system, and they want the hostname to be able to resolve. For a real server system, you'd want the hostname to resolve to the actual public IP you use for it (either put a real IP in there, or rely on DNS). Otherwise, various tools where you specify the name 'peter' will get resolved to 127/8, and that's not good if we're storing that in a distributed database like for AFS.

And not good at all if _all_ your servers think they are 127.0.1.1 and have that volume... If this should be caught by the avoid-127/16 code path, then there must be a bug somewhere. My first guess, before even looking at the code, would be host vs network byte order. I'll see if I can find a test server that I can break... Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?
> The reason it doesn't work is because 'vos' has logic to convert any localhost-y address to the address the local hostname resolves to, to try and avoid people from adding localhost addresses into the vldb.

> However, you shouldn't need to do this, and I'm a little confused as to how you got the vldb in this state. If you know what fileserver it is that registered the 127.0.1.1 address, and you NetRestrict it, when you bring the fileserver up, it should register its addresses properly in the vldb, and you wouldn't see that entry for 127.0.1.1 again.

Not in the listaddrs list, but will it go away from the volume location list?

> But you also shouldn't need to NetRestrict that address, since the code for detecting the local addresses should ignore loopback-y addresses when the fileserver registers its addresses. Is there any more information you can provide on the server that did this, or how you got the vldb in this state?

I suspect that 127.0.1.1 is not loopback-y enough for the code, which might only detect 127.0.0.x or 127.0.0.1. Then there might be different tests for loopbackness in different parts of OpenAFS. This seems to bite everyone who installs the Debian or Ubuntu packages on a non-modified server which has

127.0.0.1 localhost
127.0.1.1 myhostname

in /etc/hosts and then does a vos create myhostname or vos addsite myhostname on the same server (which is a natural thing to do). So if you do the remsite on the server which has the 127.0.1.1 in /etc/hosts, it should work. I think the AFS code should reject everything that smells of loopback, which means the whole 127.0.0.0/8.
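For reference, the NetRestrict workaround is just a one-line file on the fileserver (Debian path shown; Transarc-style installs use /usr/afs/local/NetRestrict):

    # echo 127.0.1.1 >> /var/lib/openafs/local/NetRestrict
    # bos restart myhostname -all -localauth    # so it re-registers its addresses

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info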
Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?
>> Not in the listaddrs list but will it go away from the volume location list?
> I don't understand this sentence.

Is the 127.0.1.1 deleted or replaced as a replication site location if the server registers as 129.89.38.124 when previously known as 127.0.1.1?

> No, OpenAFS 1.6.5 detects 127/16; that is, 127.0.x.x. See src/rx/rx.h, rx_IsLoopbackAddr (in this specific case, called from GetServer in src/volser/vos.c). This is a kind of balance (or an argument compromise :) between trying to catch all loopback addresses an administrator is likely to accidentally specify, but allowing a range of 127/8 addresses if you actually do want to use loopback for testing or something.

Then something in the server startup should probably check the same range. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Help migrating to ubuntu from Solaris
> Always before, I copied the contents of /usr/afs/etc to the new machine

My guess is that you have copied it to the wrong place. It's not the same. The deb should contain info about the correct locations, probably /etc/openafs/server/.
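One way to check where the deb puts things (package name as in Debian):

    $ dpkg -L openafs-fileserver | grep etc
    # the Transarc path /usr/afs/etc corresponds to /etc/openafs/server/ here

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info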
Re: [OpenAFS] ZFS-on-Linux on production fileservers?
> * Are you using ZFS-on-Linux in production for file servers?

Yes.

> * If not, and you looked into it, what stopped you?

For a long time there was fear and doubt, but the (lack of) quality of HW-RAID solutions and the hassle of Linux SW-RAID convinced us that it could not be worse with ZFS.

> * If you are, how is it working out for you?

It does work. The are-the-zpools-OK reporting could be more comfortable, and the zpool status output is not compatible with the old one. But compared to the problems we had before, that's a no-brainer. Don't be surprised if raidz needs CPU power to calculate the checksums, so you need to get the balance between I/O and CPU/cores right for the raidz level you want.

ext3/ext4 people: What is your fsck strategy?

Before that we used XFS on HW- and SW-RAID. We had no problems with the XFS part of it. However, we felt all the time that the possible max log sizes were 1990-ish.
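For the are-the-zpools-OK reporting, something simple like this can be put in cron:

    zpool status -x                 # says "all pools are healthy" if nothing is wrong
    zpool list -H -o name,health    # machine-readable health per pool

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info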
Re: [OpenAFS] Questions about multihoming servers
Thanks Jeffrey, I think that was a good summary.

> 2. Resiliency to network interface or switch failure

If you want that for your DB servers, you'll have to do something like this (see the sketch at the end of this message):

1. Assign a routed IP address (/32) to a loopback interface.
2. Make sure that this IP is reachable even if you have the failures you want to cover with this setup.
3. Configure your server (NetInfo/NetRestrict) to only use that address.

That will give you redundancy at layer 3. I will probably do that as well for some of my fileservers, as I trust OSPF more to do the right thing than fileserver and cache manager (sorry ;)

Then there are a lot of other ways to give you redundancy at layer 2 (which I do not fancy ;)

Related: I think it would be nice if there were some caching, so that vos would not need to figure out at every invocation that it can not reach a particular server. Has anyone already written such code?
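A sketch of steps 1 and 3 on Linux (addresses made up; making the /32 reachable is your routing protocol's job):

    # 1. put the service address on the loopback interface
    ip addr add 192.0.2.10/32 dev lo
    # 3. tell the DB server to use only that address
    echo 192.0.2.10 > /var/lib/openafs/local/NetInfo
    # and list the other interface addresses in NetRestrict

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info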
Re: [OpenAFS] file server crashed
> Sep 2 13:47:28 afsfs03 kernel: fileserver[8563]: segfault at 106c46f8 ip 0043f448 sp 7fd0f75fde10 error 4 in fileserver[40+b3000]
> Sep 2 13:47:29 afsfs03 abrt[12333]: Saved core dump of pid 8534 (/usr/afs/bin/fileserver) to /var/spool/abrt/ccpp-2013-09-02-13:47:28-8534 (211034112 bytes)

Yes, that core dump would be nice to have for analysis.

> Sep 2 13:47:29 afsfs03 abrtd: Directory 'ccpp-2013-09-02-13:47:28-8534' creation detected
> Sep 2 13:47:30 afsfs03 abrtd: Package 'openafs-server' isn't signed with proper key
> Sep 2 13:47:30 afsfs03 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-09-02-13:47:28-8534' exited with 1
> Sep 2 13:47:30 afsfs03 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-09-02-13:47:28-8534, deleting

Seems like abrtd deleted it. Looks to me like this abrtd process is not doing you a favour. Can you have a look if it's really gone? Do you have DeleteUploaded=yes in your abrt.conf? My Scientific Linux 6.0 does not run an abrtd, so I don't know that much about it. Is that automatically installed in 6.3, and with what conf?

> openafs-server-1.4.14.1-1.1.x86_64

I am running openafs-server-1.6.5-145.sl6 on SL 6.0, from the sl-security repo. Is there any reason to stick with 1.4.14? Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] file server crashed
> I checked the dump file, however, it was deleted by abrtd. :-( But I didn't get the key line DeleteUploaded=yes. Do you mean to add this line manually?

I don't know.

>>> openafs-server-1.4.14.1-1.1.x86_64
>> I am running openafs-server-1.6.5-145.sl6 on SL 6.0. From the sl-security repo. Is there any reason to stick with 1.4.14?
> I just follow the specification of CERN and many tests on it passed.

Then the guys from CERN should answer instead ;-) Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] vos shadow to backup user homes
> However the main reason I'm replying is your comment about RAID. IMO, anytime you're configuring a mission-critical system without RAID you're probably asking for future headaches.

My experience with RAID, especially HW-RAID, is mixed. Last week I got a DL360 G4 with some built-in HW-RAID(5) that returned read errors from the RAID to the OS without failing any drive(s). On an email server. Looks to me like a serious bug in the RAID firmware. I have even found high-end RAID (Rio) whose memory did not have any ECC. One device wrote stripes of zeroes into every block that passed through it. Then there are HW-RAIDs which can detect silent bit-rot on your HDs, and some that can't.

> All of my database and fileservers currently use hardware raid (3ware or LSI/PERC). But one of my idle time projects -- a bit of an inside joke since I have no idle time -- is to play around with ZFS on linux to see if I feel it's ready for prime time yet or not.

I currently trust Linux SW-RAID (MD) and ZFS more than any HW-RAID. So almost all our file servers have been migrated from HW-RAID to SW-RAID or ZFS. The plan is to complement that with shadow volumes for some volumes which hold data that needs a way of instant restore. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Heimdal KDC bug mentioned in rekeying document
The versions where I have seen the problem were:

* 1.5.2 master on Solaris and slave on amd64 FreeBSD
* 1.3.3 master and slave on i386 OpenBSD

The patch which changes the abort() to a warning is at

file:///afs/pdc.kth.se/public/ftp/outgoing/heimdal-1.3.3-kadmlog.patch
ftp://ftp.pdc.kth.se/outgoing/heimdal-1.3.3-kadmlog.patch

Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Heimdal KDC bug mentioned in rekeying document
> Because I'm doing lots of updates to 1.5.2 patched with the patch I posted, using kadmin from 1.6~git20120403+dfsg1-3, and having no trouble.

That's good. I will have to double check the versions of everything. Maybe I'm confused; maybe there is another patch at another place in there that prevents the failure from happening.

> What type of update?

What I understand from the reports I got, some version of kadmin sets something called "policy" after setting attributes. The policy is set to "default", whatever that means.

kadmin> mod haba
Attributes [requires-pre-auth, disallow-postdated]: <ENTER>
Policy [default]: <ENTER>

On Ubuntu 13.04: this is kadmin 1.5.99 (as it calls itself :-() or 1.6~git20120403+dfsg1-2, as the package version is called.

If you have the bug: this policy change to "default" for the principal is then propagated through iprop from the master to the slave. The receiving end then calls abort() on the unknown content in the iprop modify. It does not fail if you use hprop.

So the test for the bug is to set up a system with master and slave and then issue a mod like the one above, containing the policy change to "default". If your ipropd-slave then aborts, you have the bug. If not, it has been fixed somewhere in the chain kadmin -> kadmind -> ipropd-master -> ipropd-slave. Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Heimdal KDC bug mentioned in rekeying document
> You should package the tip of heimdal-1-5-branch.

Agree. But you might want to know: your slaves will abort() if you update a principal with the Heimdal kadmin shipped with modern Debian/Ubuntu. That one was cut from some snapshot. To fix that you will need another patch. We have one, but it only fixes the abort in the slave code instead of the cause in the kadmind. I think I wrote something about that on the Heimdal list, and maybe I should dig out that thread again... Harald. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info