Re: [OpenAFS] Changing host- and domainname

2024-01-22 Thread Harald Barth


> Yes, that helped, see also my previous message. Our problem was, that
> we didn't look for the NetInfo file in /var/lib/openafs/local/, but
> only in /etc/openafs.

Hint from the trenches: If you want different services to bind to
different interfaces (for example the fileserver to one IP and the
database servers to another interface), that can be done by having the
binaries look for differently named NetInfo files containing different
IP addresses and using -rxbind. For example NetI.db and NetI.fs. These
file name examples happen to have the same length as NetInfo. Btw, it
is bad practice to patch binaries without documenting it, and I would
be happier if there were another way to do it, but I never got around
to writing a patch for that. Just saying ;-)
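
For reference, the stock binaries read only the file named NetInfo; an
unpatched single-address setup looks something like this (paths and the
address are just examples, adjust to your install):

$ cat /var/lib/openafs/local/NetInfo
192.0.2.10
$ grep rxbind /etc/openafs/BosConfig
parm /usr/lib/openafs/dafileserver -rxbind [... other options ...]
parm /usr/lib/openafs/vlserver -rxbind

With -rxbind each service binds to that one address instead of listening
on all interfaces; the patched NetI.* trick above is only needed when
different services must bind to different addresses.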

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Changing host- and domainname

2024-01-22 Thread Harald Barth


Does it help to add the IP address you want to the NetInfo file
(creating one at the right place if it does not exist)?
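
Something like this, assuming a Debian-style server layout and a
made-up address:

# echo 192.0.2.10 > /var/lib/openafs/local/NetInfo
# bos restart <server> -all -localauth

The addresses in NetInfo are picked up and registered the next time the
server processes start.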

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: openafs versus systemd

2023-06-09 Thread Harald Barth


I think a step-by-step guide on how to run an Ubuntu 22.04 LTS or 23.04
desktop along with OpenAFS would be very much appreciated, because I
hear that folks are struggling with this, and because it "is not
possible" they use that as an argument for "then we can not run AFS -
period".

Harald.


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] aklog: unknown RPC error (-1765328370) while getting AFS tickets

2022-09-14 Thread Harald Barth


> Good to know, in my case I am setting up new kerberos realm and new
> OpenAFS cells just for testing.  This ambiguos afs principal is good
> for me, but maybe not enough for other people.

Use the afs/cell-name principal. It has worked for me for years in
different setups. It's better. Listen to Jeff (if not to me ;-)
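
A rough sketch with MIT Kerberos (cell and realm names are
placeholders; the key-import step depends on your OpenAFS version):

kadmin: addprinc -randkey afs/example.org@EXAMPLE.ORG
kadmin: ktadd -k /tmp/afs.keytab afs/example.org@EXAMPLE.ORG
# then import /tmp/afs.keytab on each server with asetkey or akeyconvert
# (check the man page of your version for the exact arguments)

Having the cell name in the principal is what lets one realm serve
several cells without ambiguity.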

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Limiting mount point to known cells

2022-08-29 Thread Harald Barth


I would look for the AFSDB RR DNS lookup in the code and somehow
prevent names without a dot in the middle from being looked up - just
fail them.

But there are folks who are much more familiar with the code than me.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Limiting mount point to known cells

2022-08-29 Thread Harald Barth
> I seem to remember seeing many paths of the form /afs/cs/ or /afs/ece/
> where the full cell names were cs.cmu.edu or ece.cmu.edu.

But probably "ece" was entered into CellServDB and not into DNS.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Limiting mount point to known cells

2022-08-27 Thread Harald Barth
> In the same thread, a blacklist (or whitelist) of cell names was
> suggested to prevent afsdb queries for troublesome domains but it
> seems it never got implemented.

If the blacklist specification is visible and not hidden
in some new magic file, I think that would be good.

My suggestion would be to add the possibility to specify
this in CellServDB.

>git BLACKLIST

or something like that. Because then anyone who wants a cell named
"git" (you never know the users' wishes) would see this when looking
through CellServDB to determine why it does not work as expected.

I am normally not for blacklists, but what can you do?

But wait a moment... Can't we assume that all cell names that we
look up in DNS contain at least one dot "." in the middle? I doubt
that there are AFS cells named without a dot that we need to
resolve with DNS. What do you think about that?

Harald.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Networking AFS related problem

2022-02-04 Thread Harald Barth
> I actually solved the problem in a very dumb way. Turns out I had never
> rebooted my phone or cycle my mobile connection since the problem
> appeared. I just rebooted and the problem was gone.

Well, such things happen. I had a "home router" provided by my ISP at
some point in time which never got any firmware updates as the ISP
relied on router reboots for that to happen. And mine was on UPS and
did never reboot ever, so I did see errors which already were fixed
but did not have made it into my box. Took a while to find that one.

> Anyway, thank you everyone for all your help! It turned out to be kind
> of a waste of time, sorry for that.

Well, I hope we all learned a little more about AFS UDP transport in
the process (OK, Jeff, you already know everything ;-) ;-) ). I don't
feel that you wasted my time; it's a give and take.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Networking AFS related problem

2022-02-02 Thread Harald Barth


Hi Jeff!

> It is unlikely that an ISP is blocking UDP traffic.

For some value of "ISP". I have been to Karolinska Institutet who did
supply Internet through the same "eduroam" cooperation as my home
university. However, the "AFS experience" was totally different
as in "non existent" on that "eduroam" as they had implemented ...

> The most likely
> causes are a poorly implemented firewall

...firewall rules which blocked most of the UDP ports.

> It has been reported[1] that more than half of the web browser
> connections from Chrome browers to Google's servers are performed
> using the UDP based QUIC protocol.

But I bet chrome will work around the situation "no UDP" to a much
greater extent than AFS.

Greetings,
Harald.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Networking AFS related problem

2022-02-02 Thread Harald Barth


Yes, the first packets are quite small until you start to actually
transfer chunks of files.

If you don't see any traffic coming back, then something is blocking
it. On the server (the VLDB server(s) would be the first ones to be
contacted, as you see) you can check whether outgoing or incoming
traffic is blocked.

If NAT is involved, it can be broken NAT as well.

I guess your IP provider lives in the IT world of 2022 where "Internet
service" consists of mostly TCP/HTTPS and definitely not UDP ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Networking AFS related problem

2022-02-02 Thread Harald Barth


> Since AFS is working perfectly as soon as I change my WiFi connection
> to other connections, and the cell is perfectly reachable via a remote
> machine, I think the problem is with my ISP, but I don't have enough
> knowledge about AFS or networking to pinpoint the problem precisely.
> (Of course, "normal" web browsing is perfectly okay even with this
> problematic connection.)

When this occurs, I am always reminded that AFS uses UDP on
its own ports and not TCP with HTTP or HTTPS.

> I would be very grateful if anyone can help me gather some more
> debugging info about this problem so I can give it to my ISP for them
> to fix it.

I would start with Wireshark and filter on UDP ports 7000 and 7001,
then check whether the packets are going out, whether you get answers
back from the file servers and, if you do, whether they are complete.
Sometimes bad networks drop or truncate UDP packets (either in or out),
and if they only truncate them, it can help to make your MTU size
smaller, so that not all of the default 1500 bytes are used. I test
with values like 1400, 1200, ... Restart afsd with the additional
parameter -rxmaxmtu SIZE.
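
Something like this, with the interface name and MTU value just
examples:

# tcpdump -n -i wlan0 udp port 7000 or udp port 7001
# afsd -rxmaxmtu 1400 [... your usual afsd options ...]

Port 7000 is the fileserver, 7001 the client's callback port; a capture
on both ends tells you in which direction the packets disappear.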

Good luck,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Fwd: CRITICAL: RHEL7/CentOS7/SL7 client systems - AuriStorFS v2021.05-10 released > OpenAFS versions?

2021-11-12 Thread Harald Barth


> Any version of OpenAFS cache manager configured with a disk cache ...

But mem cache OK?

Regards,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Redux: Linux: systemctl --user vs. AFS

2021-08-06 Thread Harald Barth


> We have been using Kerberos for a LONG time; over 20 years.

Hi Ken! Nice to hear from you :-)

> A long time ago we ran into issues with widespread Kerberos ticket theft
> from attackers, due to the quite-common usage at that time of Kerberos
> tickets being stored in files.

So why is storage in files so much more dangerous than storage in
memory? If one happens to get a process which can read the files in
local /tmp, why could that process not access /proc/<pid>/mem
on the same computer to get at the ticket cache anyway?

OK, one benefit of memory is that it is automatically destroyed when
no process accesses it any more. But other than that?

Harald.

PS: Currently I'm dealing again with the "uid is security enough"
people which are showing every time one buys a product together
with the software (vendor does not offer feature "kerberos" bla
bla bla ...)
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] 14 Jan 2021 08:25:36 GMT Breakage in RX Connection ID calculation

2021-01-18 Thread Harald Barth
>  now : 3x 1.6.22.1
>  intermediate: 2x 1.6.22.1 + 1x 1.8.7
>  intermediate: 1x 1.6.22.1 + 2x 1.8.7
>  final   : 3x 1.8.7

I have upgraded db servers in this manner for many years, so I expect
that it works this time as well. Actually we are currently running a
mix of 1.8 and 1.6 without problems on our DBs. I don't think I ever
had problems which were related to different versions. I try to do it
in an order which here would make the 1.8.7 the new syncsite in step
2. Then test if everything works as expected, because from that point
you can still back out and get quorum again with 2x 1.6.22. When there
have been problems with db server upgrades, it has been because of
errors between chair and keyboard, like typos in config files.
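
To see which box is the sync site at each step, something like
(hostname made up):

$ udebug db2.example.org 7003
$ udebug db2.example.org 7002

Port 7003 is the vlserver, 7002 the ptserver; look for the "I am sync
site" line on the server you expect, and check that the db version
numbers agree on the others.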

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


vos/bos and the kernel (Re: [OpenAFS] 2020 AFS Technologies Workshop Cancelled.. kafs update)

2020-04-06 Thread Harald Barth


(I did cut down the cc list a little.)

> The OpenAFS bos, vos and backup commands can be run from the client too, I
> think, since they don't require any interaction with the afs kernel module.

They use the token. I think they _could_ make their own token from the
ticket, but they don't.

Another thing vos is unaware of is the status of the servers over
time. Even if the kernel knows that certain servers are down, vos does
not, and will still try to contact them and time out. And so does the
next invocation of vos, _again_.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos release not as "smart" as I thought - or is this a not implemented feature?

2019-11-28 Thread Harald Barth
> https://lists.openafs.org/pipermail/openafs-info/2019-September/042865.html

Hi Ben!

I read it previously but probably did not understand. Does this mean
that it always pulls everything from "last update" minus 15 minutes,
even if the last pull was long _after_ the creation date of the
readonly? Does that make sense?

Here I have:

RO dates
Creation    Mon Nov 25 14:11:46 2019
Copy        Mon Nov 25 10:51:37 2019
Backup  Mon Nov 25 01:17:02 2019
Last Access Mon Nov 25 10:57:40 2019
Last Update Mon Nov 25 10:57:40 2019
 
RW last update:
Last Update Mon Nov 25 10:57:40 2019

But the logic does not seem to look at the RO creation
time? 


$ vos rele H.haba.android  -verbose -c stacken.kth.se

H.haba.android 
RWrite: 536916074 ROnly: 536916075 Backup: 536916076 
number of sites -> 3
   server bananshake.stacken.kth.se partition /vicepb RW Site 
   server bananshake.stacken.kth.se partition /vicepb RO Site 
   server vaniljshake.stacken.kth.se partition /vicepb RO Site 
This is a complete release of volume 536916074
Re-cloning permanent RO volume 536916075 ... done
Getting status of parent volume 536916074... done
Starting transaction on RO clone volume 536916075... done
Setting volume flags for volume 536916075... done
Ending transaction on volume 536916075... done
Replacing VLDB entry for H.haba.android... done
Starting transaction on cloned volume 536916075... done
Updating existing ro volume 536916075 on vaniljshake.stacken.kth.se ...
Starting ForwardMulti from 536916075 to 536916075 on vaniljshake.stacken.kth.se 
(as of Mon Nov 25 10:57:38 2019).
updating VLDB ... done
Released volume H.haba.android successfully

And afterwards we got a new creation date yet further in the future
(compared to the Last Update date):

H.haba.android.readonly   536916075 RO3062302 K  On-line
vaniljshake.stacken.kth.se /vicepb 
RWrite  536916074 ROnly  0 Backup  0 
MaxQuota   5000 K 
Creation    Thu Nov 28 09:56:40 2019
Copy        Mon Nov 25 10:51:37 2019
Backup  Thu Nov 28 01:17:02 2019
Last Access Mon Nov 25 10:57:40 2019
Last Update Mon Nov 25 10:57:40 2019

Does that mean I should look at the horrific source code of vos
again?

Cheers,
Harald.

PS: 

Jeff> The behavior you are expecting is implemented
Jeff> by AuriStorFS vos

And for various "reasons" (Beer often helps to understand these kind
of reasons) PDC/KTH could not be convinced to shop that (yet?). But
that would not have helped for the Stacken cell anyway.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Question regarding vos release and volume

2019-08-11 Thread Harald Barth
> root.afs
> RWrite: 536870915 ROnly: 536870916
> number of sites -> 2
>server server1.mydomain.dmz partition /vicepa RW Site
>server server2.mydomain.dmz partition /vicepa RO Site

If you want added redundancy, and not all RO accesses going only to
server2, you need to add an RO site on server1, like this:

> root.afs
> RWrite: 536870915 ROnly: 536870916
> number of sites -> 2
>server server1.mydomain.dmz partition /vicepa RW Site
>server server1.mydomain.dmz partition /vicepa RO Site
>server server2.mydomain.dmz partition /vicepa RO Site

This will not use much extra space as the RO and RW on server1 will share data
as long as it is not modified.
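
A sketch with the names from the listing above:

$ vos addsite server1.mydomain.dmz a root.afs
$ vos release root.afs -verbose

vos addsite only changes the VLDB; the RO copy is actually populated at
the next vos release.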

Then to your problem... 

What does vos listvol say about vicepf on both servers?

Do you have the /vicepf/V0.vol files? 

Can you vos dump the volumes from the specific server and partition?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] auristor client with AFS servers, timeout at aklog

2019-06-07 Thread Harald Barth


Hi Jeff, hi Måns!

> Create separate IPv4 A records to refer to your hosts and list those in
> the DNS SRV records instead of the hostname that includes both A and
> AAAA records.

That's of course how it should be.
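
For example, in the cell's zone file (all names and addresses made up;
_afs3-vlserver/_udp on port 7003 is what the clients look up):

afsdb1-v4.example.org.            IN A    192.0.2.11
_afs3-vlserver._udp.example.org.  IN SRV  0 0 7003 afsdb1-v4.example.org.

With an A-only name as the SRV target, a v4-only client never even sees
the AAAA address of the dual-stacked host.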

> The AuriStorFS rx stack will terminate calls within one second if an
> ICMP6 port unreachable response is received.  I wonder if the 10 second
> delay is due to ICMP6 packets being firewalled.

I know that Måns can operate tcpdump to determine that ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)

2019-03-09 Thread Harald Barth


> However is it still "safe" and "advised" (outside of these
> disadvantages) to run the old `fileserver` component?

I would recommend everyone to migrate to "da" and not recommend to
start with anything old. For obvious reasons, all the big
installations will migrate to "da" and you don't want to run another
codebase, don't you?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] kafs client bugs

2019-03-07 Thread Harald Barth


> From just the filenames, I don't see what some of the tests are meant to do -
> take "discon-create" for example.  This seems to be using some feature of Arla
> that isn't in OpenAFS.

You found the test of the Arla disconnected-mode AFS feature (which
was not 100% finished, but you could tell Arla that it was running in
disconnected mode, and then, if you were lucky enough that your files
were in the cache, you could continue to read them).

> Would it be possible to import some of these tests into my kafs-client
> package?  I'm not clear what the licence on them is.

I think some of the tests are so trivial (write1) that everyone would
have done it like that. Others which are more elaborate, like
write-6G-file.c, have a BSD-style license like most of Arla, except for
the Linux kernel module, which is GPL/BSD dual-licensed. So you probably
should not distribute the BSD-licensed stuff under the GPL, but nothing
forbids separate distribution and usage of the tests for other file
systems. I have even found local file system problems with these tests.
And as usual: I am not a lawyer, so this is how I interpret what's in
the source ;-)

I would just run all of the tests to start with, and for the ones
that fail decide one of:
 * fix afs
 * fix test
 * delete test

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] kafs client bugs

2019-03-06 Thread Harald Barth


Hi David, remember Arla? There are a lot of tests, of which many may be
useful to run on kafs as well. Some tests for the command line tools may
be different, though.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Upgrade of server by moving drives

2019-02-26 Thread Harald Barth


Hi Mike and Michael!

You have the chance to confuse everyone if you write your emails
From: Mike and sign below with Michael and the other way around.
:-) ;-)

MM> FYI, The 'vos remaddrs' command has been available since OpenAFS 1.6.16
MM> and can be used to remove unreferenced server entries in the VLDB.

You're right. Time flies, and it made it into 1.6.16, which by now
should be widespread enough. Good.

However, I admit, I still have systems where the client is based on
older AFS releases, where it's hidden in changeaddr -remove.

MK> It works as i expected (...)
MK> Nice piece of Software.

You're welcome.

Harald.


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Upgrade of server by moving drives

2019-02-25 Thread Harald Barth


> The "vos syncserv" and/or "vos syncvldb" commands can help with this
> (but I always have to read the docs to remember which to use).

If the volumes on the server are the real world and the VLDB is the
map to find them, then

syncvldb adjusts the map to match the world
syncserv adjusts the real world to match the map

I mostly use syncvldb and delentry to adjust the VLDB and never found
a good use for syncserv. If there are old duplicates of volumes not in
the VLDB which I really want to remove, that can be done with vos zap.

And I forgot to mention how step 7, "Clean up the temp IP/sysid in the
VLDB", is done:

That command is not named vos removeaddr but is hidden in
vos changeaddr -oldaddr A.B.C.D -remove

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: Upgrade of server by moving drives,[OpenAFS] Upgrade of server by moving drives

2019-02-25 Thread Harald Barth


> We would prefer to move the array containing our vicep* partitions from
> the old to the new server and bring that one up.

Yes, that is possible. Nothing on the /vicep* says what server it was
on; that info is only in the VLDB(s) on the DB servers. The identity
of the file server is the uuid (vos listaddr -printuuid). A
fileserver then has one or more IPs. The uuid is stored on the
fileserver in the sysid file.

If you have the DB and file servers on separate computers, it makes the
procedure shown below easier because you can do one thing at a time.
As long as you have another DB server up and running, it should work as
well with a DB server on the same computer, but I have not tested this
as we always keep fileservers and DB servers separate.

This is how I upgrade a fileserver without a move (backups taken
and verified before you start, yada-yada-yada):

1. Make new fileserver with temporary IP and test
   with temporary /vicepa

2. Shutdown fileserver process (but keep computer up) and remove test
   /vicepa

3. Copy (overwrite) the sysid file from the old fileserver (most
   likely /var/lib/openafs/local/sysid) to the new one

4. Shut down the old file server process and remove the sysid file and IP
   numbers so I don't start a duplicate by accident. Then shut down the OS.

5. Connect storage to new file server and make it visible as exactly
   the same /vicep* as on the old one. This may need a reboot depending
   on the storage. Move IP numbers to new server.

6. Start new fileserver process with the same sysid and IP of the old
   one. The new fileserver now looks like the old one to the VLDB.

7. Clean up the temp IP/sysid in the VLDB.

8. Enjoy 
   (you had downtime from 4 to 6 but that might be done quite fast)

If you screw up with the sysid, or want to connect different /vicep*
to different future servers (that can not be done with the sysid
trick), it can be repaired with the vos syncvldb command.
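
A sketch of the sysid copy (step 3) and the cleanup (step 7), with
paths and the temporary address made up:

old# scp /var/lib/openafs/local/sysid new:/var/lib/openafs/local/sysid
new# vos listaddr -printuuid
any# vos changeaddr -oldaddr 192.0.2.99 -remove -localauth

On a Transarc-style install the file is /usr/afs/local/sysid instead.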

DB servers you upgrade by having several and then upgrade the one that
is not the current syncsite. When starting empty, that non-syncsite
will then be recloned automatically. The udebug command is your friend
there to monitor what's going on.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] server crash / moving volumes without vos move

2018-12-17 Thread Harald Barth


> So my idea is to mount the vicepa (from external SAN) from the crashed
> file server to the working fileserver as /vicepb so i could get the
> volumes and register the volumes to the working afs fileserver.

Yes, that will need a file server restart to make the working file
server look for a new partition.

Then you can see all volumes with vos listvol. 

Then read the vos syncvldb man page and continue from there; you can
restrict the syncvldb to partition b.
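
Something like this, with the hostname made up:

# vos listvol fs2.example.org b
# vos syncvldb -server fs2.example.org -partition b -verbose -localauth

After that the VLDB entries for the volumes on /vicepb point at the
working fileserver.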

Viel Glück,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8,Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8

2018-12-08 Thread Harald Barth


> While getting tokens at login work with this setup, things start to fail
> once the users $HOME is set to be in /afs. While simple scenarios like
> pure shell/console logins work, graphical desktop environments have lots
> of problems. XFCE4 doesn't even start, Plasma works to some degree after
> presenting lots of error dialogs to the user.

Is this a problem due to AFS or due to the startup of the graphical
environment which nowadays may involve systemd --user services instead
of running all processes in the same session?

> Seems there's still some work to do until this becomes an alternative
> for the standard OpenAFS client.

It may make a "beta", so yes.

> So I wonder why RH customers would want that?

RH decided that their customers wanted systemd ;->

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8

2018-12-07 Thread Harald Barth


Hi Jeff, hi David!

Has it been 17 years? Well, we are all getting - mature ;-)

Obviously a file system is ready for use if it's old enough to buy
liquor (which differs a little between countries).

> When opening a support case please specify:
> 
>  Product:  Red Hat Enterprise Linux
>  Version:  8.0 Beta
>  Case Type:Feature / Enhancement Request
>  Hostname: hostname of the system on which RHEL8 beta was installed

We have a chicken-and-egg problem here: Why would I install 8.0b
if it does not have kafs? Install a Fedora
test, sure, but RHEL 8.0b?

> If you are eligible, please attempt to open a support request by
> December 11th.

3 workdays. Optimistic.

> As part of the Linux kernel source tree, kafs is indirectly supported by
> the entire Linux kernel development community.  All of the automated
> testing performed against the mainline kernel is also performed against
> kafs.

But the automated testing probably does not (yet) fetch a single file
from an AFS server. (Compare to how the gssapi-key-exchange feature
in ssh is never tested in the ssh/scp shipped with distributions, as
the testing never fetched a single file with that feature - with known
results). Testing that requires infrastructure is a lot of work to
automate.

Sorry, I may sound much more pessimistic here than I actually am.
This _might_ fly. I wish :-)

Season's greetings,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)

2018-12-07 Thread Harald Barth
> Am I safe to assume that on Linux only the "new" partition layout is
> used?  (Is there a way to check which layout am I using?)

Yes (under the old layout, you don't even see the files as they are not
hooked into directories).
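
A quick check on a Linux (namei) server, with made-up volume IDs: under
the new layout the data lives in an AFSIDat tree and the V*.vol headers
are visible as plain files:

# ls /vicepa
AFSIDat  V0536870963.vol  V0536870964.vol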

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)

2018-12-05 Thread Harald Barth


> Can I safely `rsync` the files from the old partition to the new one?

For Linux (The "new" server partition layout):

If the file tree really is exactly copied (including all permissions
and chmod-bits) then you have everything you need. This was not true
for the old file system layout for example in SunOS UFS.

I would copy to a not-yet-used partition, mount it as /vicepY
(where Y is a new, unused letter) and then, as the first thing when
starting the server, run a salvage with the options
-orphans attach -salvagedirs -force
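
A sketch under those assumptions (device names and paths made up; on a
DAFS server the stand-alone salvager is called dasalvager):

# rsync -aHX --numeric-ids /vicepa/ /mnt/newdisk/
# mount /dev/newdisk /vicepy
# /usr/lib/openafs/salvager -partition /vicepy -orphans attach -salvagedirs -force

The rsync flags matter: -a for owners, modes and times, -H for hard
links and -X for extended attributes, so that the tree really is an
exact copy.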

Harald.

PS: I like zfs
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] problems with ubuntu 18.04 client

2018-10-04 Thread Harald Barth


>> Are you in the same or different PAG?
> 
> Hmm, i think that's not the reason,
> if i login into the same Computer, the tree /afs/desy.de/user is also
> missing for me ...

Yes, but a new login probably gives you a new PAG and a new security
context. The question is whether the old PAG and security context still
continue to work and, if the answer is yes, why that case differs
from a new PAG.

Any complaints in the file server log about the client in question?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] problems with ubuntu 18.04 client

2018-10-04 Thread Harald Barth
> In an old terminal (where afs was running well) everyhing seems to be
> ok, create files,folder, pwd... etc)

Are you in the same or different PAG?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] 1.8.0: Cache inconsistent after vos release

2018-05-26 Thread Harald Barth

> I was able to reproduce this today and have submitted a patch for review:
> https://gerrit.openafs.org/#/c/13090/
> This patch works in my testing.

Mark, that's just great.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] 1.8.0: Cache inconsistent after vos release

2018-05-23 Thread Harald Barth

We see problems when doing a vos release that the 1.8.0 clients "miss"
that the ro volume has changed. We think this is repeatable.

Our setup is:

* server version OpenAFS 1.6.22 2017-12-07 
  Centos
* client version OpenAFS 1.8.0  2018-05-08
  Both on Centos and Ubuntu 

As it works with older AFS _clients_ and the same server, we suspect
that this is a 1.8.0 issue. An fs flushv fixes it.
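
For reference, fs flushv is the abbreviation of fs flushvolume; with a
made-up path:

$ fs flushvolume /afs/example.org/some/path

This discards the cached data and status for the volume the path lives
in, so the client refetches it from the fileserver.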

I am not aware of any firewalls or NAT between server and client.

Before digging deeper: Has anyone of you seen this as well? If not,
you might want to test this if you think about moving to 1.8 or
already have.

Thanks,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] question about authentication with kerberos and Default principal

2018-03-03 Thread Harald Barth
> Does  heimdal-klist use /etc/krb5.conf or does it use some other
> configuration file? I'm worried I did not set up a config file.

It should use /etc/krb5.conf as well unless KRB5_CONFIG is set.

You should have something like:

[libdefaults]
default_realm = YOURDOMAIN

in there.

> [gsgatlin@localhost ~]$ /usr/bin/heimdal-kinit gsgatlin

or use 

/usr/bin/heimdal-kinit gsgatlin@YOURDOMAIN

> Also, going back to the krb5 kinit, how can you specify a FILE: ticket
> cache type ?

Both MIT kinit and heimdal kinit honor the KRB5CCNAME environment
variable which has the form TYPE:location thus a typical way to set
your FILE cache is:

export KRB5CCNAME=FILE:/tmp/krb5cc_`id -u`

Btw: As FILE: is the oldest ticket cache type and the default, any
file name will do. For example:

export KRB5CCNAME=/tmp/whatever

will set it to /tmp/whatever

Greetings,
Harald.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] question about authentication with kerberos and Default principal

2018-03-03 Thread Harald Barth

Hm. If I remember correctly, at least parts of the Kerberos ticket in
the ticket cache are endian dependent. As the principal name seems to
be broken to start with, maybe the error is there. Do you have the
same problems if you use the FILE: ticket cache type, or the kinit and
afslog from Heimdal to handle tickets and tokens?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] permission issue when trying to switch kerberos realms.

2018-01-17 Thread Harald Barth

I wrote 

>>I actually don't know how high a kvno can be but up to 32767 (2^15-1)
>>"feels" safe.

That was probably WRONG as Sergio pointed out to me.

Sergio wrote:
> It doesn't feel all that safe to me. True, RFC 4120 specifies the kvno as
> UInt32, but https://k5wiki.kerberos.org/wiki/Projects/Larger_key_versions
> makes interesting reading. Version 1.14 isn't all that old; Debian 8 only
> has version 1.12.
> 
> Maybe if one requires rxkad-k5 it's OK to have kvno>255, but back in
> Kerberos 4 days it definitely wasn't. The OpenAFS code base still contains
> things like
> if (kvno > 255)
> return KAANSWERTOOLONG;
> (in src/kauth/krb_udp.c) and
> @t(kvno)@\is a @b(one byte) key identifier associated with the key.  It
> will be included in any ticket created by the AuthServer encrypted with
> this key.
> (in src/kauth/AuthServer.mss).

One byte. Ouch.

So until rxkad-k5 (around the corner - just kidding) we are probably
stuck with that. So if you want to divide your KVNO space into two
parts, around 100 for each is what you get :-(

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] permission issue when trying to switch kerberos realms.

2018-01-15 Thread Harald Barth

Do the two service tickets have different kvnos (otherwise AFS can not
tell them apart), and are both keys installed on your AFS servers?

I would start with a kvno of 1000 or so for the new cell which
hopefully leaves enough headroom for keying the old cell if
necessary.

I actually don't know how high a kvno can be but up to 32767 (2^15-1)
"feels" safe.

Harald.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Security Advisory 2016-003 and 'bos salvage' questions

2017-04-04 Thread Harald Barth

Is there any reason why -salvagedirs requires -all?
We run dafs.

To minimize downtime I'd like to use this per volume or if that is not
possible at least per partition so I don't need to shut down the
complete fileserver for this. Ok, I can move one volume to a dedicated
salvage fileserver at a time and then out again, but that is tedious.

# bos salvage -server sterlet -partition a -volume M.probe.sterlet.a -forceDAFS -salvagedirs -orphans attach -localauth
-salvagedirs only possible with -all.

This is our fileserver config:

# cat /usr/afs/local/BosConfig 
restrictmode 0
restarttime 16 0 0 0 0
checkbintime 16 0 0 0 0
bnode dafs dafs 1
parm /usr/afs/bin/dafileserver -udpsize 131071 -sendsize 131071 -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 100 -b 480 -vc 2400
parm /usr/afs/bin/davolserver
parm /usr/afs/bin/salvageserver -datelogs -parallel all8 -orphans attach
parm /usr/afs/bin/dasalvager -datelogs -parallel all8 -orphans attach
end

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos dumps to S3 via S3 Storage Gateway?

2017-03-03 Thread Harald Barth

adsmpipe replacement:

/afs/hpc2n.umu.se/lap/tsmpipe/x.x/src/

Used with some scripts to put vos dumps into a TSM archive. This is the
current backup solution for at least 3 AFS cells I know about.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Check free space on AFS share before login

2017-02-01 Thread Harald Barth


I think the problem is well known, and what one would need to do is to
make the OS aware (at every traversal of an AFS mount point) that the
AFS volume in question is a separate "device", and then make the
statfs syscall on that path return the quota info from AFS. This has
of course to happen dynamically as you make your way through the AFS
space.

This would make every volume look like a separate file system. There
are pros and cons in that approach.
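
The information is there on the AFS side, it just never reaches
statfs/df; for example (path made up):

$ fs listquota /afs/example.org/home/user

shows quota and usage for the volume that the path lives in.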

I think no one has written the code (for Unix/Linux) yet. The
Windows client might do this, but I'm by no means someone who knows
something about AFS on Windows ;-)

At our site, so far, it has been cheaper to multiply all quotas by 2
whenever the problem arose again.

Und Tschüß,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Additonal question about the OpenAFS Security Advisory 2016-003

2016-12-07 Thread Harald Barth

> AFS file servers store directory information in a flat file that
> consists of a header, a hash table and a fixed number of directory entry
> blocks.

That I was aware of.

> When a volume dump is constructed for a volume move, a volume
> release, a volume backup, etc. the contents of the directory files
> are copied into the dump stream exactly as they are stored on disk
> by the file server.

That was the part I was unsure about. Thanks!

> I hope this information is helpful.

Yes, indeed.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Additonal question about the OpenAFS Security Advisory 2016-003

2016-12-07 Thread Harald Barth

The security advisory says:

> We further recommend that adminstrators salvage all volumes with the
> -salvagedirs option, in order to remove existing leaks.

Is moving the volume to another server enough to fix this as well or
does the leak move with the volume?

Thanks for the help,
Harald.


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] [MacOsX] Issues with saving/removing files

2016-04-11 Thread Harald Barth

It would not be the first time that applications on MacOSX try to
determine for themselves whether they have file system permission (by
issuing stat(2) and looking at the owner and unix bits) instead of using
access(2). So fix your directory owner and/or permissions so that it
_looks_ correct UNIX-wise. Test chmod 777 (it will of course not make
any real change security-wise, as the AFS permissions are in effect,
but it may fool the Finder).

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Request for Assistance with OpenAFS

2016-04-07 Thread Harald Barth

...but if the cell is still on kaserver (without even v4 tickets), then
only the old AFS tools will do the trick. That was really a long time ago.


Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Request for Assistance with OpenAFS

2016-04-07 Thread Harald Barth

> ... I have tried various
> options in the /etc/krb5.conf file with no luck yet. Any help is much
> appreciated.

you might want to try 

[libdefaults]
 allow_weak_crypto = yes

in krb5.conf, but if the realm is v4 _only_, then you may need a
krb.conf and old v4 tools. In that case, I would try good old Heimdal
0.7.2 from source.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Cross-platform DFS

2016-01-12 Thread Harald Barth

Do you have any special reason that your storage is spread out over
several OSes? It looks like quite a small installation to be so
spread out.

> we have SSD disks spread across both windows and linux machines that we
> wanted to put together into a pool of storage, and ideally, mount it as NFS
> in all servers, so our software can access the NFS mount point and
> write/read data.

No, not as NFS. AFS mounts as AFS with its own kernel module (in
Windows called a "driver", I think).

> Ideally, we wanted to build a pool of storage with 42TB (combining all
> windows and linux servers), but without changing the windows servers to
> linux

What are you especially fond of in these that makes you want to keep them?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Volume corruption or salvage error?

2015-10-12 Thread Harald Barth

I have a very odd thing going on with at least one volume. The volume is ancient:

# vos exa home.levitte.Mail
home.levitte.Mail 536904039 RW 530871 K  On-line
beef.stacken.kth.se /vicepa 
RWrite  536904039 ROnly  0 Backup  536905115 
MaxQuota 55 K 
Creation    Sun Mar  6 05:07:47 2011
Copy        Sat Jul 18 16:22:24 2015
Backup  Mon Oct 12 01:25:24 2015
Last Access Thu Oct  8 10:10:38 2015
Last Update Wed Apr 17 14:27:05 2002
0 accesses in the past day (i.e., vnode references)

RWrite: 536904039 Backup: 536905115 
number of sites -> 1
   server beef.stacken.kth.se partition /vicepa RW Site 

There does not seem to be any change going on from $USER. The volume
was salvaged (routine salvage after a vos move) where some errors
were corrected in August:

SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:27 dispatching child to salvage 
volume 536904039...
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:29 SALVAGING VOLUME 536904039.
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:29 home.levitte.Mail (536904039) 
updated 04/17/2002 14:27
SalsrvLog.2015-08-17.10:39:09:07/18/2015 16:24:30 Salvaged home.levitte.Mail 
(536904039): 100258 files, 530871 blocks

Since then, the volume has been cloned to a backup volume very often without 
error:

VolserLog:Mon Aug 24 01:24:39 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
VolserLog:Tue Aug 25 01:23:11 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
VolserLog:Wed Aug 26 01:24:53 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
VolserLog:Thu Aug 27 01:22:00 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
VolserLog:Fri Aug 28 01:24:22 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
...
VolserLog:Sat Oct 10 01:25:23 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
VolserLog:Sun Oct 11 01:25:28 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115
VolserLog:Mon Oct 12 01:25:24 2015 1 Volser: Clone: Recloning volume 536904039 
to volume 536905115

However, when the backup volume should be dumped for backup:

Wed Sep  9 01:25:05 2015 DumpVnode: volume 536905115 vnode 1 has inconsistent 
length (index 6144 disk 26624); aborting dump
Sun Oct 11 01:25:28 2015 DumpVnode: volume 536905115 vnode 1 has inconsistent 
length (index 6144 disk 26624); aborting dump

The rw clone can not be dumped either:

Mon Oct 12 11:20:31 2015 DumpVnode: volume 536904039 vnode 1 has inconsistent 
length (index 6144 disk 26624); aborting dump

So something is wrong in spite that salvage was "successful" and that
the volume header says that the volume was not changed since the
salvage. Has the salvage "forgotten" to update the size of vnode 1
where it created entries for orphaned files (according to the
salvagelog)?

I have two other volumes that are similar :-(

The data is on zfs which has not reported any problems with data loss.

Version is 1.6.9-2+deb8u2-debian (ok, a bit old) but I have not seen
anything in the release notes that this is known or fixed.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Apache2 and OpenAFS

2015-10-07 Thread Harald Barth

> Another (unsecure) way is to use IP-users in OpenAFS. That Ip gains
> access to the path and the whole system under that IP can access the data.

Hi Lars, long time no see.

And all systems faking that IP will get access as well. For that
answer you should be stuck behind double NAT for at least a week.
;-) ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Apache2 and OpenAFS

2015-10-07 Thread Harald Barth
We run our web server authenticated from a keytab. The keytab contains

# /usr/heimdal/sbin/ktutil --keytab=/etc/krb5.keytab.web-daemon list
Vno  Type Principal
  0  des3-cbc-sha1web-daemon/scat.pdc.kth...@nada.kth.se
  0  aes128-cts-hmac-sha1-96  web-daemon/scat.pdc.kth...@nada.kth.se
  0  arcfour-hmac-md5 web-daemon/scat.pdc.kth...@nada.kth.se

Then the webserver is started with heimdal kinit (which does all the
pagsh and renew magic) with that keytab:

# ps auxgwww | grep kinit
root 31751  0.0  0.0  39880  2100 ?SJul04   0:04 
/usr/heimdal/bin/kinit --no-forward --no-renew 
--keytab=/etc/krb5.keytab.web-daemon --afslog 
web-daemon/scat.pdc.kth...@nada.kth.se /usr/sbin/httpd -DNO_DETACH -D 
DEFAULT_VHOST -D SSL_DEFAULT_VHOST -D INFO -D LANGUAGE -D SSL -D CACHE -D 
MEM_CACHE -D DAV -D STATUS -D AUTH_DIGEST -D PROXY -D USERDIR -D REWRITE -k 
start

The web-daemon/scat.pdc.kth...@nada.kth.se principal maps to this PTS
identity (due to historical reasons the "/" is replaced with a "." in
the OpenAFS pts-to-principal name mapping; there are folks on this
list who happen to know exactly why):

$ pts exa web-daemon.scat.pdc.kth.se -c pdc.kth.se
Name: web-daemon.scat.pdc.kth.se, id: 65531, owner: system:administrators, 
creator: haba.admin,
  membership: 4, flags: S, group quota: 20.

Then all web-daemons.x.y.z are member in this group:

$ pts mem web-daemons  -c pdc.kth.se
Members of web-daemons (id: -32225) are:
  web-daemon.wrasse.pdc.kth.se
  web-daemon.schelly.pdc.kth.se
  web-daemon.scat.pdc.kth.se

Then you give web-daemons the appropriate permissions in the file system.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] backup - order of entries in volume set

2015-08-28 Thread Harald Barth
> Volume set userbackup:
>  Entry 1: server .*, partition .*, volumes: user\..*\.backup
>  Entry 2: server .*, partition .*, volumes: user\.backup

It's not clear to me if the specification should include the .backup
and/or .readonly suffixes, or if the backup program adds the
.backup automatically if you specify, for example, user.

> I think the reason for this is a strncmp in src/bucoord/commands.c
> line 311:
>
> for (tavols = ps->dupvdlist; add && tavols;
>      tavols = tavols->next) {
>     if (strncmp(tavols->name, entries[e].name, l) == 0) {
>         if ((strcmp(&entries[e].name[l], ".backup") == 0)
>             || (strcmp(&entries[e].name[l], ".readonly")
>                 == 0)
>             || (strcmp(&entries[e].name[l], "") == 0))
>             add = 0;
>     }
> }

Yes, this looks like a bug. There is no point in even comparing the
strings if they are of different lengths.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Error in volume - anything that can be salvaged?

2015-08-11 Thread Harald Barth

Something is really fishy with this volume.

This volume was salvaged recently, after some power failures and a
scripted vos move. So during the vos move it was good enough
to be moved. After that it was salvaged (forceDAFS); now it's
broken beyond vos dump. See below.


07/18/2015 11:43:44 dispatching child to salvage volume 536889218...
07/18/2015 11:43:44 1 nVolumesInInodeFile 32 
07/18/2015 11:43:44 SALVAGING VOLUME 536889218.
07/18/2015 11:43:44 ftp.free.doc (536889218) updated 02/28/2011 15:00
07/18/2015 11:43:44 totalInodes 3596
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25540.13495
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25542.13496
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25544.13497
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25546.13498
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25548.13499
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25550.13500
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25552.13501
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25554.13502
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25556.13503
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25558.13504
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25560.13505
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25562.13506
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25564.13507
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25566.13508
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25568.13509
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25570.13510
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25572.13511
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25574.13512
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25576.13513
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25578.13514
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25580.13515
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25582.13516
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25584.13517
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25586.13518
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25588.13519
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25590.13520
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25592.13521
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25594.13522
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25596.13523
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25598.13524
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25600.13525
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25602.13526
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25604.13527
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25606.13528
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25608.13529
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25610.13530
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25612.13531
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25614.13532
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25616.13533
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25618.13534
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25620.13535
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25622.13536
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25624.13537
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25626.13538
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25628.13539
07/18/2015 11:43:44 Attaching orphaned file to volume's root dir as 
__ORPHANFILE__.25630.13540
07/18/2015 11:43:44 Vnode 25540: link 

Re: [OpenAFS] Dafileserver does not manage to slavage volume

2015-07-15 Thread Harald Barth

Last night there was a power outage again and the file server and the
salvager did not like each other again.

Unfortunately the FileLog is not dated like the SalvageLog. The reboot
was at 2:20. Then I restarted around 2:35. Then I went home at around
04:05. Then I got a fileserver core at 6:24 and issued a restart at
9:41 which resulted in a core from salvageserver. Then I get alarming
messages like

Wed Jul 15 10:10:25 2015 Scheduling salvage for volume 536903735 on part 
/vicepa over SALVSYNC
Wed Jul 15 10:10:40 2015 nUsers == 0, but header not on LRU

but that does not seem to hinder salvage

07/15/2015 10:10:25 dispatching child to salvage volume 536903735...
07/15/2015 10:10:26 SALVAGING VOLUME 536903735.
07/15/2015 10:10:26 home.eb (536903735) updated 02/15/2015 13:01
07/15/2015 10:10:26 totalInodes 9
07/15/2015 10:10:26 Salvaged home.eb (536903735): 2 files, 3 blocks

All this is a little disturbing. And now I have to run off and do other
things :( Well, I will see if I have new core files when I return.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Dafileserver does not manage to slavage volume

2015-07-15 Thread Harald Barth

It looks to me like if the dafileserver cores while a volume salvage is
in progress, the bosserver does not manage to figure out the correct
actions, so the dafileserver / salvageserver / dasalvager chain and the
SALVSYNC mechanism break.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Dafileserver does not manage to slavage volume

2015-07-08 Thread Harald Barth

# dpkg -l | grep  openafs-f
ii  openafs-fileserver 1.6.1-3+deb7u1amd64  
  AFS distributed filesystem file server

After a power outage, the salvager has not salvaged everything:

In SalsrvLog:

07/07/2015 00:57:12 Salvaged stackware (536870963): 3401 files, 71676 blocks

and later

07/07/2015 06:08:17 dispatching child to salvage volume 536870963...

Eh, why again?

In Volserlog:

Zillions of

Wed Jul  8 13:22:06 2015 Scheduling salvage for volume 536870963 on part 
/vicepa over FSSYNC

and one

Wed Jul  8 13:22:06 2015 1 Volser: GetVolInfo: Could not attach volume 
536870963 (/vicepa:V0536870963.vol) error=101

# file /vicepa/V0536870963.vol 
/vicepa/V0536870963.vol: data

FileLog says:

Wed Jul  8 02:16:08 2015 denying offline request for volume 536870963; volume is
 in salvaging state

vos listvol gives

bogus.536872474   536872474 RW  0 K Off-line
 Could not attach volume 536870963 
Total volumes onLine 3217 ; Total volumes offLine 2 ; Total busy 0

The bogus.536872474 is, I think, a remnant of another volume that did
not salvage and attach, which I forcibly took back from backup as I
needed the data. Something like (vos dump vol.backup | vos restore vol).

So, how should I force a salvage of 536870963? Is -forceDAFS the right
next step?

# bos salvage mount-kilimanjaro.stacken.kth.se a 536870963 -showlog
This is a demand attach fileserver.  Are you sure you want to proceed with a 
manual salvage?
must specify -forceDAFS flag in order to proceed.

Harald.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] OpenAFS still in development?

2015-06-21 Thread Harald Barth

> I do not believe that the OpenAFS mailing lists are an appropriate forum
> to discuss AuriStor.  My response to Michael provided details on
> AuriStor because I felt it was necessary in order to properly answer the
> implied questions.

From what I've learned so far about AuriStor, it looks like it could be
a replacement for OpenAFS on the platforms where it's available. And it
can do more, as Jeff tells us. Whether that strategy is good advertising
depends on cultural background.

> The question of supported platforms is an interesting one because it
> is very unclear what it means for OpenAFS to support a platform.  What
> are the criteria?  Is it sufficient to say that if you can build OpenAFS
> on the OS and hardware architecture that it is supported?

Sorry, "supported" was probably a bad choice of word. But I don't know
if "available" or "runnable" or "it builds, it ships" would be better.

> I am quite sure there are other criteria that could be added to the mix.

I know that you take "supported" very seriously. I would be happy if
other software vendors (which are not into file systems) would do that
as well.

> * Linux
>   . Red Hat Enterprise Linux
>     (YFSI is a Red Hat Technology Partner)
>   . Fedora
>   . Debian
>   . Ubuntu
> * Microsoft Windows
> * Apple OSX and iOS
> * Oracle Solaris
> * IBM AIX
> * Android
>
> Servers are supported everywhere but on Windows, iOS and Android but the
> performance varies significantly based upon the OS release, processor
> architecture, and underlying hardware so there are combinations that we
> recommend and those we do not.
>
> The failure to list an OS family or Linux distribution does not imply
> that YFSI will not support AuriStor on that platform.  It only implies
> that there has been insufficient customer interest to this point for
> YFSI to expend the necessary resources on development, testing and
> certification (where applicable.)

Thanks for the list. I guess this is for the main HW, which is amd64,
for most of the OSes above. Both at work and privately I run OpenAFS on
platforms that are not on the list and that even in the future will not
attract much customer interest.

> In the end software development has to be a partnership between those
> that build and those that deploy.  If those that deploy do not fund
> those that build there will not be sufficient development hours and
> talent to build the solutions those that deploy require.

I see that this partnership has stopped working in many places. It
makes me sad.

> P.S. My apologies for the long reply.

You don't need to apologise.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] OpenAFS still in development?

2015-06-21 Thread Harald Barth

> 4. Is AuriStor a replacement for OpenAFS?
>
> AuriStor is designed to be a general purpose, platform independent,
> secure, distributed file system that can be successfully deployed
> internally, across the Internet, and within public cloud services.
> AuriStor is an IBM AFS(R) and OpenAFS compatible file system that
> (...)
>  . and more

On 

https://www.your-file-system.com/openafs/auristor-comparison

I have not found a comparison of supported platforms (client and
server) for OpenAFS and AuriStor. Because if it does not run on the
platform, it's not a replacement.

I have not found a price list of the products and what they contain.

However, there is a lot of "we are better" advertisement. That's not
something that works well in all cultures.

My conclusion of the current landscape is that if you have chosen a
closed source operating system that you pay money for, in the future
you'll have to pay money for a decent file system as well, either
directly to the OS vendor or to a third party. That will in the future
be necessary for any other function enhancement as well. That trend
will continue and sooner or later include every app(lication) that you
want to run on that platform.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Max number of groups owned by user

2015-03-24 Thread Harald Barth
 Is there a setting somewhere for how many groups a user can create?

see others' explanations

From what template is that value taken when the user is created?

Btw: Groups have quotas for the max number of members as well.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Metrics on the cell

2015-03-16 Thread Harald Barth

I think that you are able to turn on some logging (which was probably
intended for debugging) to get out _some_ data. Others might know which
exact knobs to turn.

 I have been asked by senior leadership to gather data on the usage
 of our cell.

One very important fact that you have to give to the ones asking is
the following, measured on the server:

$ cat /nfs/server/file > /dev/null
$ cat /nfs/server/file > /dev/null
$ cat /nfs/server/file > /dev/null

gives a usage of approx 3*bytes(file) in x seconds, but

$ cat /afs/cell/file > /dev/null
$ cat /afs/cell/file > /dev/null
$ cat /afs/cell/file > /dev/null

gives a usage of approx 1*bytes(file) in x/3 seconds.

So how much the service is worth for all the users, you can only be
seen if you have control over all the clients. That's often not
easy to get for leadership.
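
If server-side counters are enough for a first answer, rxdebug and vos
can give some coarse numbers without extra logging (the server name is
a placeholder, 7000 is the fileserver port):

$ rxdebug fileserver.example.org 7000 -rxstats   # rx packet/call counters
$ vos partinfo fileserver.example.org            # used/free space per /vicep partition

But these only count what actually reaches the server, so they miss
exactly the client cache hits illustrated above.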

Good luck,
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] single OpenAFS cell and multiple/different kerberos realms

2015-01-27 Thread Harald Barth
 In order for user@B to obtain afs/cellname@A there must be a cross-realm
 relationship between A and B.
 
 The other way to obtain a token for cellname is to add a service
 principal afs/cellname@B to realm B and then export the key and add it
 in addition to the key from afs/cellname@A to the AFS cell.

That summarizes it quite well. I think you must at least put the
krbtgt/A@B into B, which means that A trusts B, or the afs/a@B into
B, which means that the AFS servers in cell a trust B.

If you can only get user (and not service) principals into B, you
lose.
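
A rough sketch of the second approach (cell name, paths and the exact
asetkey argument order are assumptions; check asetkey(8) for your
OpenAFS version and mind the kvno/enctype rules from the rekeying
document):

# in realm B, with MIT kadmin (Heimdal has equivalent commands):
kadmin -r B -q "addprinc -randkey afs/cellname@B"
kadmin -r B -q "ktadd -k /tmp/afs-b.keytab afs/cellname@B"

# on each AFS server, add that key next to the existing afs/cellname@A key:
# asetkey add <kvno> /tmp/afs-b.keytab afs/cellname@B   (argument order varies by version)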

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] backup strategy

2014-11-13 Thread Harald Barth

Last time I heard about tsmafs (the file level thing from Luleå) there
were just some small adjustments necessary before publishing the
code. That was a while ago. Hint hint ;-)

Harald.

PS: Places from where code for AFS-TSM backup integration
has been published end in å?
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] backup strategy

2014-11-12 Thread Harald Barth

Do you have an existing backup system (for example for other stuff
than AFS) and are you allowed to put data into that? In that case,
what kind of system do you have?

We do in effect (hiding all the gory stuff):

... Hilarious script ...
 ... Hilarious script ...
  ... Hilarious script ...
vos dump VOLUMENAME | tsmpipe /some-prefix/VOLUMENAME.DUMPLEVEL.DATE

which makes files in TSM which in effect are volume dumps.
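
In case it helps someone, the full/incremental split comes from
vos dump -time; a sketch with the same placeholder names (the tsmpipe
arguments stay elided as above):

vos dump -id VOLUMENAME -time 0 -localauth | tsmpipe ...                   # full dump
vos dump -id VOLUMENAME -time "11/05/2014 00:00" -localauth | tsmpipe ...  # changes since that date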

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] backup strategy

2014-11-12 Thread Harald Barth

 and with NetWorker, it was (it probably still is) impossible to save
 from a pipe, requiring the use of temporary disk for the purpose.

TSM as shipped by IBM can't either, but there is a TSM API. Then
tsmpipe was written in Umeå. Thanks again!

# fs newcell hpc2n.umu.se chosan.hpc2n.umu.se grande.hpc2n.umu.se mamba.hpc2n.umu.se
$ ls /afs/hpc2n.umu.se/lap/tsmpipe/x.x/src/dist/

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Minor question on moving AFS db servers

2014-10-30 Thread Harald Barth

 I just mention this because I don't think there's any way to avoid this
 one. Other userspace clients will not notice because they are
 short-lived processes, but anything that's long running, we don't have a
 way to notify of CellServDB changes.

The userspace clients could leave hints (for example in a file) that
a cell has a bad DB server, and then future userspace clients could do
something intelligent with the hints.

Another, more aggressive, way would be to contact more than one db server
at a time, for example pick two at random and use the info from the one
that answers first.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Red Hat RPM packaging

2014-09-16 Thread Harald Barth
 Cool, but would it be easy for an OpenAFS user to migrate to these,
 and is the in-kernel AFS nearly as complete as OpenAFS? My immediate
 guess would be no on both counts, but I'd love to hear otherwise :)

Probably Arla (in spite of being unmaintained for some years) is still
the second most complete AFS kernel module. And it's GPL. There are
several reasons why Arla is unmaintained nowadays, but one is that the
amount of work to track the Linux kernel changes, compared to other
more constructive work, was very tiresome. Then one chooses to do other
things; Arla was hacked on a great deal in spare time. Yet another AFS
kernel module from David Howell back in the day did not help either.

But maybe this was just a wing flap from the past ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help: Volume inaccessible after incident

2014-09-01 Thread Harald Barth

 $ vos examine html.milicchio
 Could not fetch the information about volume 536871913 from the server
 : No such device
 Volume does not exist on server aux.dia.uniroma3.it as indicated by the VLDB

If you do a vos listvol against aux.dia.uniroma3.it you see that you

a) only have html.milicchio.backup but not html.milicchio any more.

b) have a lot of volumes in the offline state on that server

Did the server run any salvage and is there anything in the log about the 
outcome?

If you have any backup volumes that are online, you can vos dump them,
vos restore them to another server (preferably first under another name)
and check the contents out. Especially if you do not have any backup of
the contents on another medium.
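
A sketch of that dump/restore pipeline, with a made-up target server
and volume name (vos restore reads the dump from stdin when no -file is
given):

vos dump -id html.milicchio.backup -time 0 | vos restore otherserver.dia.uniroma3.it a html.milicchio.recovered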

vos listvol servername | grep volid

will give you what the server thinks is there

vos listvldb -name volid 

will give you what the DB thinks is there

vos exa volid

gives you some combined of the two above (if available).

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] System resources requirements and performance tuning for AFS file servers

2014-08-15 Thread Harald Barth

 Finally, when I upgrade my DB servers, I know that the “right” way
 is to shut everything down, copy over the databases, (...)

Nah.

 ..., do you think I’d be safe to take down one database server at a
 time, bring up a new RHEL VM with the same IP address, start the AFS
 processes, and wait for the database to propagate to the new box?

That's how I have done most upgrades. Not with VMs but with real boxes.
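
One sanity check between swaps is to verify with udebug that the
replacement has rejoined the quorum before you touch the next box; a
minimal sketch (the server name is a placeholder, 7002 is the ptserver,
7003 the vlserver):

$ udebug dbserver.example.org 7002
$ udebug dbserver.example.org 7003

You want to see the same db version everywhere and a sync site that
gets yes votes from all servers.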

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Apache2 and openafs

2014-07-08 Thread Harald Barth

Another program which can start subprocesses in a PAG and authenticate
from a keytab is Heimdals kinit.

Harald.

From the manpage:

SYNOPSIS
 kinit [--afslog] [-c cachename | --cache=cachename]
   [-f | --no-forwardable] [-t keytabname | --keytab=keytabname] [-l
   time | --lifetime=time] [-p | --proxiable] [-R | --renew]
   [--renewable] [-r time | --renewable-life=time] [-S principal |
   --server=principal] [-s time | --start-time=time]
   [-k | --use-keytab] [-v | --validate] [-e enctypes |
   --enctypes=enctypes] [-a addresses | --extra-addresses=addresses]
   [--password-file=filename] [--fcache-version=version-number]
   [-A | --no-addresses] [--anonymous] [--enterprise] [--version]
   [--help] [principal [command]]
^^^
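
A usage sketch (keytab path and principal are made up; the command is
started with its own credential cache/PAG and, thanks to --afslog, an
AFS token):

kinit --afslog --keytab=/etc/apache2/krb5.keytab www/host.example.org /usr/sbin/apache2ctl start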

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Two realms and one cell

2014-07-03 Thread Harald Barth

 A little question. We have one AFS cell myrealm.fr and a Kerberos
 realm myrealm.fr. We must use our AFS cell with a another realm named
 otherrealm.fr. There is no trusted relations between myrealm.fr and
 otherrealm.fr. Is it possible ?

If you don't trust otherrealm.fr enough to establish cross-realm, you
probably don't trust otherrealm.fr enough to give them a set of AFS
service keys for your servers.

Then users from otherrealm.fr must have ident...@myrealm.fr or any
other realm you trust.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Cannot create directory with Nautilus

2014-06-26 Thread Harald Barth

 At work, I can create a directory in my $HOME with Nautilus (Debian
 Wheezy). My problem is with my personal computer (Ubuntu 12.04). I
 can't create a directory in my $HOME on the same OpenAFS server. Menus
 are disabled. I must create directory with mkdir. And you, how do you
 do ?

I think Nautilus is to blame: it thinks it can not do it because of
UID and Unix permissions and therefore does not even try.

What is your UID at work? (guess: not 1000)
What is your UID on your home computer? (guess: 1000)

I would change the UID on the home computer to match the UID on the
work computer to work around the problem.
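
A sketch of that UID change, assuming the work UID is 12345 (usermod
adjusts the home directory; files elsewhere need a separate chown):

# usermod -u 12345 username
# find / -xdev -uid 1000 -exec chown -h 12345 {} +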

I think we had this discussion before, a long time ago. Did we get any
feedback from the Nautilus folks?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump

2014-06-19 Thread Harald Barth
On master, this looks like

opr_Verify(afs_dir_Delete(dh, "..") == 0);
opr_Verify(afs_dir_Create(dh, "..", pa) == 0);

I guess that opr_Verify is some new fancy name for Assert?
Same patch for master as for 1.6.9 needed?

Harald.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Salvageserver 1.6.1-3+deb7u1 core dump

2014-06-19 Thread Harald Barth

 All directories are supposed to have a .. entry by this point

I disagree. Not if something has removed "..". For example a buggy
salvager like 1.6.1. Or bitrot. Or whatever. As this is a salvage
program that should not core dump even if unexpected things happen, I
think it is very very very hard to justify such an assert in production
code for the salvager. Especially because the next layer that calls
the salvager does not handle that the salvager asserts.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump

2014-06-17 Thread Harald Barth
More from the core:

(gdb) bt
#0  0x7fbe95040475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7fbe950436f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x0042f162 in osi_Panic (
msg=msg@entry=0x4635f0 assertion failed: %s, file: %s, line: %d\n)
at ./../rx/rx_user.c:251
#3  0x0042f17d in osi_AssertFailU (
expr=expr@entry=0x4586bb Delete(dh, \..\) == 0, 
file=file@entry=0x458337 ../vol/vol-salvage.c, line=line@entry=3997)
at ./../rx/rx_user.c:261
#4  0x00408078 in SalvageVolume (
salvinfo=salvinfo@entry=0x7fff12474f80, rwIsp=rwIsp@entry=0x1bd68b0, 
alinkH=0x1bd44a0) at ../vol/vol-salvage.c:3997
#5  0x0040af8d in DoSalvageVolumeGroup (salvinfo=salvinfo@entry=0x0, 
isp=0x1bd68b0, nVols=nVols@entry=2) at ../vol/vol-salvage.c:2092
#6  0x0040c391 in SalvageFileSys1 (partP=partP@entry=0x1bca880, 
singleVolumeNumber=536904480) at ../vol/vol-salvage.c:937
#7  0x004041a9 in DoSalvageVolume (slot=optimized out, 
node=0x1bd4030) at ../vol/salvaged.c:640
#8  SalvageServer (argv=optimized out, argc=optimized out)
at ../vol/salvaged.c:574
#9  handleit (as=optimized out, arock=optimized out)
at ../vol/salvaged.c:299
#10 0x004572a4 in cmd_Dispatch (argc=7, argc@entry=6, argv=0x1bc1c20, 
argv@entry=0x7fff12475768) at cmd.c:905
#11 0x00404c67 in main (argc=6, argv=0x7fff12475768)
at ../vol/salvaged.c:418

So this is bailing out at
vol-salvage.c: opr_Verify(afs_dir_Delete(dh, "..") == 0)
which looks a lot like

http://git.openafs.org/?p=openafs.git;a=commitdiff;h=e8faeae6dcae0e566de2b21d53d3f78f3cc44e3f

 Improve JudgeEntry() detection of orphaned directories to
 prevent unintentional deletion of their '.' and '..' entries.
 This in turn prevents a later assert (opr_Verify) when we try to
 delete and re-add '..' in order to attach the orphan.
 ...

So well, now I only need to find something that contains that patch
(1.6.9 I suppose) for wheezy, correct?

Harald.


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump

2014-06-17 Thread Harald Barth
 This change went into 1.6.6, so 1.6.7 would do as well.

Thanks. Built myself a 1.6.9 from the source deb from unstable.

But unfortunately, the volume in question breaks the 1.6.9 salvageserver as 
well :(

Gdb tells me:

(gdb) up
#1  0x7f82c1a436f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) up
#2  0x7f82c2424bb8 in osi_Panic (
msg=msg@entry=0x7f82c245c1d0 assertion failed: %s, file: %s, line: %d\n)
at ./../rx/rx_user.c:251
251 afs_abort();
(gdb) up
#3  0x7f82c2424bd5 in osi_AssertFailU (
expr=expr@entry=0x7f82c2450c13 Delete(dh, \..\) == 0, 
file=file@entry=0x7f82c245088f ../vol/vol-salvage.c, 
line=line@entry=4127) at ./../rx/rx_user.c:261
261 osi_Panic(assertion failed: %s, file: %s, line: %d\n, expr,
(gdb) up
#4  0x7f82c23f97b7 in SalvageVolume (
salvinfo=salvinfo@entry=0x7fffcf109530, rwIsp=rwIsp@entry=0x7f82c2dba290, 
alinkH=0x7f82c2db7e80) at ../vol/vol-salvage.c:4127
4127osi_Assert(Delete(dh, ..) == 0);
(gdb) list
4122SetSalvageDirHandle(dh, vid, 
salvinfo-fileSysDevice,
4123
salvinfo-vnodeInfo[class].inodes[v],
4124salvinfo-VolumeChanged);
4125pa.Vnode = LFVnode;
4126pa.Unique = LFUnique;
4127osi_Assert(Delete(dh, ..) == 0);
4128osi_Assert(Create(dh, .., pa) == 0);
4129
4130/* The original parent's link count was decremented 
above.
4131 * Here we increment the new parent's link count.
(gdb) p dh
$1 = {dirh_volume = 536904480, dirh_device = 0, dirh_inode = 173619825082415, 
  dirh_handle = 0x7f82c2db8980, dirh_cacheCheck = 44, 
  volumeChanged = 0x7fffcf109570}
(gdb) p dh
$2 = (DirHandle *) 0x7fffcf108c00
(gdb) p pa 
$3 = {Volume = 4294967295, Vnode = 1, Unique = 1}


Should we not just make a .. in this situation?

And now lunch.
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump

2014-06-17 Thread Harald Barth


Well, I did add a patch like:


Index: openafs-1.6.9/src/vol/vol-salvage.c
===================================================================
--- openafs-1.6.9.orig/src/vol/vol-salvage.c    2014-06-12 08:30:48.000000000 +0000
+++ openafs-1.6.9/src/vol/vol-salvage.c 2014-06-17 10:34:23.857444175 +0000
@@ -4124,7 +4124,8 @@
                                     salvinfo->VolumeChanged);
                 pa.Vnode = LFVnode;
                 pa.Unique = LFUnique;
-               osi_Assert(Delete(dh, "..") == 0);
+               if (Delete(dh, "..") != 0)
+                   Log("Delete of .. failed, but will try to recreate it anyway\n");
                 osi_Assert(Create(dh, "..", pa) == 0);
 
                /* The original parent's link count was decremented above.


Which created two empty __ORPHANDIR__* in the volume.

Then I have the following logs from my backup script which tried a vos backup 
home.katy

Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Could not start a transaction on 
the volume 536904474
Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Volume needs to be salvaged
Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Error in vos backup command.
Tue Jun 17 01:39:36 2014 beef.stacken.kth.se : Volume needs to be salvaged

However, my salvage log says, that the volume was salvaged OK:

06/16/2014 23:39:37 Salvaged home.katy (536904474): 23897 files, 732753 blocks

and that the salvage ended 06/16/2014 23:57:23, which is several hours earlier.

When I did a vos backup home.katy recently, everything went fine. What's going
on here?

Followup question: Should I now run a salvage over all volumes? How do
I do that with as little impact as possible manually?
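
For the manual whole-server case I believe the command form is roughly
the sketch below, but it is anything but low-impact: -all takes the
fileserver instance down while it runs, and a demand-attach fileserver
will again insist on -forceDAFS:

# bos salvage -server beef.stacken.kth.se -all -showlog -forceDAFS -localauth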

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Salvageserver 1.6.1-3+deb7u1 core dump

2014-06-16 Thread Harald Barth

Is this a known bug of openafs-fileserver 1.6.1-3+deb7u1 (Debian)?

06/16/2014 23:42:11 dispatching child to salvage volume 536904480...
06/16/2014 23:42:23 SYNC_getCom:  error receiving command
06/16/2014 23:42:23 SALVSYNC_com:  read failed; dropping connection (cnt=190)
06/16/2014 23:42:23 salvageserver core dumped!
06/16/2014 23:42:23 salvageserver (pid=3622) terminated abnormally!
06/16/2014 23:42:12 2 nVolumesInInodeFile 64 
06/16/2014 23:42:12 CHECKING CLONED VOLUME 536904573.
06/16/2014 23:42:12 home.maxz.backup (536904573) updated 11/20/2013 09:20
06/16/2014 23:42:12 SALVAGING VOLUME 536904480.
06/16/2014 23:42:12 home.maxz (536904480) updated 11/20/2013 09:20
06/16/2014 23:42:12 totalInodes 1238
06/16/2014 23:42:13 dir vnode 47: invalid entry deleted: ??/.. (vnode 249, 
unique 38300)
06/16/2014 23:42:13 dir vnode 51: invalid entry deleted: ??/.. (vnode 249, 
unique 38300)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] OpenAFS and windows/unix versioning

2014-05-07 Thread Harald Barth

"In sync" would have been nice, but as "in sync" has been problematic
in the past and I don't expect that to change, I suggest going with
the last suggestion. I would call them "marketing numbers" and these
should have another range so that they have clearly differing version
numberings (like the 5.x example from Andrew). Or call the Windows
versions 14.x this year and 15.x next year. Then we will never reach
the feared OfW-13 version for sure ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] OpenAFS and windows/unix versioning

2014-05-07 Thread Harald Barth
 But overall not a major issue for us.

Unfortunately, we have to _guess_ a lot about this, because many of
the issues are probably not issues for the folks here on openafs-info.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos move operation stalls

2014-03-25 Thread Harald Barth

 I have installed a new fileserver, running Gentoo Linux, kernel 3.8.13,
 OpenAFS 1.6.5 (also tried 1.6.6). The server has two /vicep partitions,
 each with an XFS filesystem:
 * ~800GB /vicepa on the system disk - mdadm mirror with two SATA disks
 * 24TB /vicepb on an external RAID system, connected via iSCSI
 
 Moves to the /vicepa partitions work as expected. Moves to the /vicepb
 start, but after some time (usually 10-20sec) there is no progress anymore:
 - no disk i/o
 - package counters (vos status/rxdebug) stay constant forever
 - transactions do not time out within 3-4 days

That's very strange, as the AFS server processes do access the file
system as any other process would. So there is nothing special about
them. We have used XFS/Linux for AFS server /vicep* for a long time
(however not on iSCSI) and did not have any problems with it. We have
moved now to ZFS on local HD.

Can you strace the hanging process and see what syscall it's hanging on?
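
For example something along these lines (the pid lookup is just a
sketch):

strace -f -tt -p $(pidof volserver) -o /tmp/volserver.strace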

 Other fileserver/volserver operations on other volumes seem to be
 unaffected. 

At least something.

 Also, access to the iSCSI partition from the OS is still
 possible, there are no disk/iSCSI problems reported.

Does a find /vicepb/ run through completely?
Can you write files to /vicepa/TEST/ or something like that?

 Has anyone seen similar problems before? Does anyone have suggestions
 what I could try to debug the problem?

Have you tried something else than XFS?

Have you tried to put the log part of XFS somewhere else? For performance,
you might want it on a mirrored local HD.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: OpenAFS client cache overrun?

2014-03-10 Thread Harald Barth

 Thanks also for the mention of AFS cache bypass, I think that may be a
 BIG help with this problem.
 
 'Cache bypass' I don't believe is considered the most stable of
 features. It could indeed maybe help here, but I'd be looking out for
 kernel panics.

I have not investigated further, but I suspect the cache bypass
feature to be at least partly responsible for a panic. This was however
not on a normal system (Cray's version of SuSE ES 11 SP1). In
addition, that system is running memcache.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Harald Barth
 The problem is that you the client to scan quickly to find a server
 that is up, but because networks are not perfectly reliable and drop
 packets all the time, it cannot know that a server is not up until that
 server has failed to respond to multiple retransmissions of the request.
 Those retransmissions cannot be sent quickly; in fact, they _must_ be
 sent with exponentially-increasing backoff times.  Otherwise, when your
 network becomes congested, the retransmission of dropped packets will
 act as a runaway positive feedback loop, making the congestion worse and
 saturating the network.


You are completely right if one must talk to that particular server.
But I think that AFS/RX sometimes hangs too long waiting for one server
instead of trying the next one. For example for questions that could
be answered by any VLDB server. I'm thinking of operations like group
membership and volume location.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Harald Barth

 I have long thought that we should be using multi for vldb lookups, 
 specifically to avoid the problems with down database servers.

The situation is a little bit different for cache managers, which can
remember which servers are down, and command line tools, which normally
discover how the world looks on each startup.

If the 'we ask everyone' strategy is not used all the time but only on
startup, it will not happen that frequently. Probably not frequently
enough to cause problems for the scalability folks.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Extract files from /vicepa

2014-01-17 Thread Harald Barth

 If I understood correctly, the easiest way to restore the files is to setup
 another afs server and just overwrite the /vicepa folder with the one I
 have. Is this correct?

Yes, I think that's still correct. The easiest way to set up an AFS
server is probably to take a Linux distro which has pre-packaged
binaries for AFS client and server. Debian for example.
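
On Debian/Ubuntu that boils down to roughly (package names from memory,
so double-check against the archive):

apt-get install openafs-client openafs-krb5 openafs-fileserver openafs-dbserver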

 I don't understand the part about the salvager deleting the data.

I think the ownership and mode bits contain information about whether
the file in question is active. The salvager may delete inactive data.
But prior to copying your old data into your new /vicepa/ you can remove
the salvager from BosConfig and then run the salvager by hand with
-nowrite, which will tell you what the salvager would have done.
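
A sketch of such a dry run, assuming the Debian binary path used
elsewhere in this thread:

/usr/lib/openafs/salvager -partition /vicepa -nowrite -showlog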

 I have
 the recovered /vicepa folder on a ntfs partition. I'm trying to recover
 again the folder but to an ext4 partition trying to preserve ownership and
 modes ...

Good if you can do that. Zip and Tar archives can be told to preserve
ownership as well.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Ubik trouble

2014-01-15 Thread Harald Barth

 - On a client-mode connection, the source address is always ignored.
 This actually should have the effect of making small requests like votes
 always work.  But for some reason it doesn't.

That's my observation as well.

 The practical effect of this is that it is possible
 for voting to work fine, because that's a single-round-trip operation,
 while larger calls such as transferring a database update fare not so
 well (or consistently).

I don't know exactly what went wrong, but when you udebug the
problem, the "last vote rcvd X secs ago" counter goes up and up and
up...

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Ubik trouble

2014-01-13 Thread Harald Barth

 This turned out to be a subtle network problem.

This reminds me of another Ubik problem that I had which was because
of (1) a config error and (2) something about Ubik that I still do not
understand.

(1) I had an old NetInfo file with a wrong IP addr lying around. This
did _not_ prevent the server from starting nor prevent sync completely.
The protection server synced fine and the volume location server
refused.

(2) I have a machine where the database server is known as X.Y.Z.43
but the machine's primary IP is X.Y.Z.46. This seems to work well
until something somewhere checks the source address of the traffic
when sync is tried. Result: The protection server synced fine and the
volume location server refused. 


After I corrected the NetInfo on machine (1) and added -rxbind to the
vlserver on machine (2), everything synced fine. I am still puzzled why,
when -rxbind is _not_ given, ubik tells me it first worked fine
for 13 hours (866491 secs ago, see below) but not after that.
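
For the record, changing the vlserver flags under bosserver means
deleting and recreating the bnode; a sketch with the Debian path
(adjust to your installation):

bos stop 130.237.234.43 vlserver -localauth
bos delete 130.237.234.43 vlserver -localauth
bos create 130.237.234.43 vlserver simple "/usr/lib/openafs/vlserver -rxbind" -localauth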

$ udebug  130.237.234.3 7002
Host's addresses are: 130.237.234.3 
Host's 130.237.234.3 time is Mon Jan 13 14:39:58 2014
Local time is Mon Jan 13 14:39:58 2014 (time differential 0 secs)
Last yes vote for 130.237.234.3 was 8 secs ago (sync site); 
Last vote started 8 secs ago (at Mon Jan 13 14:39:50 2014)
Local db version is 1369329952.4
I am sync site until 52 secs from now (at Mon Jan 13 14:40:50 2014) (3 servers)
Recovery state 1f
The last trans I handled was 0.360296
Sync site's db version is 1369329952.4
0 locked pages, 0 of them for write

Server (130.237.234.101): (db 1369329952.4)
last vote rcvd 8 secs ago (at Mon Jan 13 14:39:50 2014),
last beacon sent 8 secs ago (at Mon Jan 13 14:39:50 2014), last vote was yes
dbcurrent=1, up=1 beaconSince=1

Server (130.237.234.43): (db 1369329952.4)
last vote rcvd 866491 secs ago (at Fri Jan  3 13:58:27 2014),
last beacon sent 866474 secs ago (at Fri Jan  3 13:58:44 2014), last vote 
was yes
dbcurrent=1, up=0 beaconSince=0


On 130.237.234.43:

# ps augxww | grep ptserv
root 22308  0.0  0.3   9216  7264 ?SJan03   0:23 
/usr/lib/openafs/ptserver

Log looks fine:

# cat /var/log/openafs/PtLog
Fri Jan  3 13:58:28 2014 ubik: primary address 130.237.234.46 does not exist
Fri Jan  3 13:58:28 2014 Using 130.237.234.43 as my primary address
Fri Jan  3 13:58:28 2014 Starting AFS ptserver 1.1 (/usr/lib/openafs/ptserver)

# cat  /var/lib/openafs/local/NetInfo
130.237.234.43

So how should I know that this would cease to work after 13 hours? Or some 
other odd number of hours?

Now restarting everything on  130.237.234.43

After that: This seems not to be self-healing either. On the sync site:

Fri Jan  3 13:58:44 2014 assuming distant vote time 19270408 from 
130.237.234.43 is an error; marking host down
Mon Jan 13 14:48:42 2014 ubik: A Remote Server has addresses:

Looks like I have to restart the server on the syncsite as well (so it
forgets the bad vote time). And I'm not sure what 19270408 actually
means. 223 days ago?

Well, after that restart:

$ udebug  130.237.234.3 7002
Host's addresses are: 130.237.234.3 
Host's 130.237.234.3 time is Mon Jan 13 14:59:03 2014
Local time is Mon Jan 13 14:59:07 2014 (time differential 4 secs)
Last yes vote for 130.237.234.3 was 5 secs ago (sync site); 
Last vote started 5 secs ago (at Mon Jan 13 14:59:02 2014)
Local db version is 1369329952.4
I am sync site until 51 secs from now (at Mon Jan 13 14:59:58 2014) (3 servers)
Recovery state 1f
The last trans I handled was 0.0
Sync site's db version is 1369329952.4
0 locked pages, 0 of them for write

Server (130.237.234.101): (db 1369329952.4)
last vote rcvd 9 secs ago (at Mon Jan 13 14:58:58 2014),
last beacon sent 5 secs ago (at Mon Jan 13 14:59:02 2014), last vote was yes
dbcurrent=1, up=1 beaconSince=1

Server (130.237.234.43): (db 1369329952.4)
last vote rcvd 8 secs ago (at Mon Jan 13 14:58:59 2014),
last beacon sent 5 secs ago (at Mon Jan 13 14:59:02 2014), last vote was yes
dbcurrent=1, up=1 beaconSince=1


But it was a lot of hassle to get there...

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Ubik trouble

2014-01-13 Thread Harald Barth

 (...) This includes cases where one server has multiple addresses or
 interfaces on the same subnet.

Yes, that was the case here. The puzzling thing was that it worked at
first, which probably means that the address of B's outgoing reply
changed over time (urk). No, I don't have tcpdumps from
back when it worked. Well, from the kernel's point of view, any source
address is good enough.


 The sad truth is that in order to properly support multi-homed hosts, Rx
 needs to be fixed so that it identifies all available interfaces, binds
 a separate socket for each interface, and keeps track of to which
 interface an incoming connection belongs, so that it can send responses
 out the same interface.

I don't know if it has to be sent out that interface (normally it
probably will) but the responses need to have that source address if I
understand that right.

Currently I have no multihomed Ubik servers (besides the one that
should not have been, see above) and very few multihomed file servers.
So I can not say if the "rx breaks when routing is asymmetric"
behaviour has given us any trouble for fileservers. At least not as
notable as this Ubik problem where I have shot myself in the foot real
good ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?

2013-12-13 Thread Harald Barth


My test:

# cat /etc/hosts
127.0.0.1   localhost
127.0.1.1   bananshake.stacken.kth.se   bananshake

# bos restart bananshake -local -all

# cat FileLog
...
Fri Dec 13 14:38:18 2013 Getting FileServer address...
Fri Dec 13 14:38:18 2013 FileServer bananshake.stacken.kth.se has address 
127.0.1.1 (0x101007f or 0x7f000101 in host byte order)
Fri Dec 13 14:38:18 2013 File Server started Fri Dec 13 14:38:18 2013
...

So the server thinks somewhere that it is 127.0.1.1, but this message is
somewhat bogus, as it would be more interesting to know which addresses
the file server actually registered in the db.

When I check, it has not registered it in the address list, bananshake
is still with only one IP under UUID 007cadf8-b425-124e-91-2e-e8eded82aa77:

# vos listaddr -local -printuuid -nores -noauth -c stacken.kth.se
UUID: 000a40c4-cfeb-1228-b1-12-0101007faa77
130.237.234.220

UUID: 00230816-42a1-1361-ae-c3-2ceaed82aa77
130.237.234.101

UUID: 00438a50-e3e5-115d-8f-56-d8eaed82aa77
130.237.234.216

UUID: 0047a130-223f-1244-9b-0b-0101007faa77
130.237.234.151

UUID: 003f1f1a-0189-106e-b6-45-0101007faa77
130.237.234.150

UUID: 007cadf8-b425-124e-91-2e-e8eded82aa77
130.237.237.232

Nevertheless you can create volumes:

# vos create bananshake a -name broken.volume -local -verbose 
Volume broken.volume 536911797 created and brought online
Created the VLDB entry for the volume broken.volume 536911797
Volume 536911797 created on partition /vicepa of bananshake

# vos listvldb -server bananshake.stacken.kth.se -nores
vsu_ClientInit: Could not get afs tokens, running unauthenticated.
VLDB entries for server bananshake.stacken.kth.se 

broken.volume 
RWrite: 536911797 
number of sites - 1
   server 127.0.1.1 partition /vicepa RW Site 

Total entries: 1

To clean up I did a vos remove -id 536911797 -local as I knew that it
was a throwaway volume which makes it easier than if you want only to
remove one replica.

Btw, this is
# rxdebug localhost -v
Trying 127.0.0.1 (port 7000):
AFS version:  OpenAFS 1.6.1-3+deb7u1-debian built  2013-07-25 

And I have not found where this filtering of 127/something actually
takes place. Pointers welcome.

Harald.


___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Question about how to use vos shadow

2013-12-13 Thread Harald Barth

Am am experimenting with vos shadow, and

# vos shadow H.haba.test.alanine -fromserver beef.stacken.kth.se -frompartition 
c -toserver bananshake.stacken.kth.se -topartition a -local -verbose

works as expected 

# vos listvol bananshake
Total number of volumes on server bananshake partition /vicepa: 1 
H.haba.test.alanine   536901865 RO   1544 K On-line

(and nothing in the VLDB about it). However, when I try do make shadow
readonly vols or shadow vols which are readonly, I'm not as successful:

# vos shadow H.haba.test.alanine -fromserver beef.stacken.kth.se -frompartition 
c -toserver bananshake.stacken.kth.se -topartition a -toname 
H.haba.test.alanine.readonly -readonly -local -verbose
vos: the name of the root volume H.haba.test.alanine.readonly exceeds the size 
limit of 22

(I would have liked this to give the same result as if I
would do an addsite, release, remsite, which would leave a stranded
unknown readonly copy with the .readonly suffix on the added server)

Then am I right that -toname in this kind of usage has to be used
together with -toid, because vos shadow can not make up an ID in
the VLDB for something that then should not exist in the VLDB?

# vos shadow H.haba.test.alanine -fromserver beef.stacken.kth.se -frompartition 
c -toserver bananshake.stacken.kth.se -topartition a -toname 
X.haba.test.alanine -readonly -local -verbose
VLDB: no such entry

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?

2013-12-13 Thread Harald Barth

 All of the places in the code tree that filter eventually test each
 address with rx_IsLoopbackAddr() which is defined in rx.h.

When I look at vos.c, I find GetServer, which uses
rx_IsLoopbackAddr. Now let's assume we feed it something that
resolves to loopback. That will be detected at #1, and as
a second route we look up the local hostname we are on. But
if that at #2 is STILL loopback, it goes through at #3...

GetServer(char *aname)
{
struct hostent *th;
afs_uint32 addr; /* in network byte order */
afs_int32 code;
char hostname[MAXHOSTCHARS];

if ((addr = GetServerNoresolve(aname)) == 0) {
th = gethostbyname(aname);
if (!th)
return 0;
memcpy(&addr, th->h_addr, sizeof(addr));
}

if (rx_IsLoopbackAddr(ntohl(addr))) {   /* local host */  #1
code = gethostname(hostname, MAXHOSTCHARS);
if (code)
return 0;
th = gethostbyname(hostname); #2
if (!th)
return 0;
memcpy(&addr, th->h_addr, sizeof(addr)); #3
}

return (addr);
}

I think there should be an "is this still $#%^* a loopback addr" test
just before return(addr).

Does that sound correct?
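
For concreteness, a sketch of what that extra test could look like
(not committed code, just the idea):

    if (rx_IsLoopbackAddr(ntohl(addr)))  /* still loopback after resolving local hostname */
        return 0;

    return (addr);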

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly?

2013-12-10 Thread Harald Barth

 $ more hosts
 127.0.0.1localhost
 127.0.1.1peter.cae.uwm.edu   peter

 I know various Linux distributions do
 this by default, ...

So at least the AFS server installed from the .debs for these distros
really should cope with that fact.

 but that's only because they don't have a way of
 knowing what the real IP is for the system, and they want the hostname
 to be able to resolve. For a real server system, you'd want the
 hostname to resolve to the actual public IP you use for it (either put a
 real IP in there, or rely on DNS). 

 Otherwise, various tools where you
 specify the name 'peter' will get resolved to 127/8, and that's not good
 if we're storing that in a distributed database like for AFS.

And not good at all if _all_ your servers think they are 127.0.1.1 and
have that volume...

If this should be caught by the "avoid 127/16" code path then there
must be a bug somewhere. My first guess would be host vs network byte
order before even looking at the code.

I'll see if I can find a test server that I can break...

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?

2013-12-09 Thread Harald Barth

 The reason it doesn't work is because 'vos' has logic to convert any
 localhost-y address to the address the local hostname resolves to, to
 try and avoid people from adding localhost addresses into the vldb.
 
 However, you shouldn't need to do this, and I'm a little confused as to
 how you got the vldb in this state. If you know what fileserver it is
 that registered the 127.0.1.1 address, and you NetRestrict it, when you
 bring the fileserver up, it should register its addresses properly in
 the vldb, and you wouldn't see that entry for 127.0.1.1 again.

Not in the listaddrs list but will it go away from the volume location list?

 But you also shouldn't need to NetRestrict that address, since the code
 for detecting the local addresses should ignore loopback-y addresses
 when the fileserver registers its addresses. Is there any more
 information you can provide on the server that did this, or how you got
 the vldb in this state?

I suspect that 127.0.1.1 is not loopback-y enough for the code, which
might only detect 127.0.0.x or 127.0.0.1.

Then there might be different tests for loopbackness in different parts of
OpenAFS.

This seems to bite everyone who installs the Debian or Ubuntu packages on
a non-modified server which has

127.0.0.1 localhost
127.0.1.1 myhostname

in /etc/hosts and then does a vos create myhostname or vos addsite
myhostname on the same server (which is a natural thing to do).
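
One way to avoid the bite is to fix /etc/hosts before creating volumes;
a sketch using the hostname and address from this thread:

127.0.0.1       localhost
129.89.38.124   peter.cae.uwm.edu   peter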

So if you do the remsite on the server who has the 127.0.1.1 in
/etc/hosts it should work.

I think the AFS code should reject everything that smells loopback
which means the whole 127.0.0.0/8.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: How to remove a bogus (127.0.1.1) server entry for readonly volume?

2013-12-09 Thread Harald Barth
 Not in the listaddrs list but will it go away from the volume location
 list?
 
 I don't understand this sentence.

Is the 127.0.1.1 entry deleted or replaced as a replication site location
if the server registers as 129.89.38.124 when it was previously known as
127.0.1.1?

 No, OpenAFS 1.6.5 detects 127/16; that is, 127.0.x.x. See src/rx/rx.h,
 rx_IsLoopbackAddr (in this specific case, called from GetServer in
 src/volser/vos.c). This is a kind of balance (or an argument
 compromise :) between trying to catch all loopback addresses an
 administrator is likely to accidentally specify, but allowing a range of
 127/8 addresses if you actually do want to use loopback for testing or
 something.

Then something in the server startup should probably check the same range.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Help migrating to ubuntu from Solaris

2013-11-08 Thread Harald Barth
 Always before, I copied the contents of /usr/afs/etc
 to the new machine 

My guess is that you have copied it to the wrong place. It's not the same.
The deb should contain info about the correct locations, probably
/etc/openafs/server/
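
A sketch of the copy, assuming the Transarc-style source location on
the Solaris box (the relevant files are ThisCell, CellServDB, KeyFile
and UserList):

cp /usr/afs/etc/ThisCell /usr/afs/etc/CellServDB /usr/afs/etc/KeyFile /usr/afs/etc/UserList /etc/openafs/server/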

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] ZFS-on-Linux on production fileservers?

2013-10-04 Thread Harald Barth

 * Are you using ZFS-on-Linux in production for file servers?

Yes.

 * If not, and you looked into it, what stopped you?

Long there was fear and doubt, but the (not) quality of HW-Raid
solutions and hassle of Linux SW-Raid convinced us that it could not
be worse with ZFS.

 * If you are, how is it working out for you?

It does.

The are-the-zpools-OK reporting could be more comfortable and
the zpool status output is not compatible with the old one.
But compared to the problems we had before, that's a no-brainer.
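
What we use for the is-everything-OK question is basically (standard
zfs tooling, nothing AFS specific):

zpool status -x    # prints "all pools are healthy" or the details of the broken pool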

Don't be surprised if raidz needs CPU power to calculate the
checksums, so you need to get the balance between I/O and CPU/cores
right for the raidz level you want.

 ext3/ext4 people: What is your fsck strategy?

Before that we used xfs on HW- and SW-Raid. We had no problems with
the xfs part of it. However we felt all the time that the possible max
log sizes were 1990-ish.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Questions about multihoming servers

2013-09-25 Thread Harald Barth

Thanks Jeffrey, I think that was a good summary.

  2. Resiliency to network interface or switch failure

If you want that for your DB servers, you'll have to do something like
this:

1. Assign a routed IP address (/32) to a loopback interface.

2. Make sure that this IP is reachable even if you have the failures
you want to cover with this setup.

3. Configure your server (NetInfo/NetRestrict) to only use that address.

That will give you redundancy at level 3 (a sketch follows below). I will
probably do that as well for some of my fileservers, as I trust OSPF more
to do the right thing than the fileserver and cache manager (sorry ;)
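
A sketch of points 1-3 on Linux, with a made-up service address and the
Debian NetInfo location used elsewhere in this thread:

ip addr add 192.0.2.10/32 dev lo
echo 192.0.2.10 > /var/lib/openafs/local/NetInfo
# and list the addresses of the physical interfaces in NetRestrict
# so that only the loopback-hosted address gets registered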

Then there are a lot of other ways to give you redundancy at level 2
(which I do not fancy ;)

Related: I think it would be nice if there were some caching so
that vos would not need to figure out at every invocation that it can
not reach a particular server. Has anyone already written such code?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] file server crashed

2013-09-02 Thread Harald Barth

 Sep  2 13:47:28 afsfs03 kernel: fileserver[8563]: segfault at 106c46f8 ip 
 0043f448 sp 7fd0f75fde10 error 4 in fileserver[40+b3000]
 Sep  2 13:47:29 afsfs03 abrt[12333]: Saved core dump of pid 8534 
 (/usr/afs/bin/fileserver) to /var/spool/abrt/ccpp-2013-09-02-13:47:28-8534 
 (211034112 bytes)

Yes, that core dump would be nice to have for analysis.

 Sep  2 13:47:29 afsfs03 abrtd: Directory 'ccpp-2013-09-02-13:47:28-8534' 
 creation detected
 Sep  2 13:47:30 afsfs03 abrtd: Package 'openafs-server' isn't signed with 
 proper key
 Sep  2 13:47:30 afsfs03 abrtd: 'post-create' on 
 '/var/spool/abrt/ccpp-2013-09-02-13:47:28-8534' exited with 1
 Sep  2 13:47:30 afsfs03 abrtd: Corrupted or bad directory 
 /var/spool/abrt/ccpp-2013-09-02-13:47:28-8534, deleting

Seems like abrtd deleted it. Looks to me like this abrtd process is not doing
you a favour.
Can you have a look if it's really gone?
Do you have DeleteUploaded=yes in your abrt.conf? 

My Scientific Linux 6.0 does not run an abrtd, so I don't know that much about
it.
Is that automatically installed in 6.3 and with what conf?

 openafs-server-1.4.14.1-1.1.x86_64

I am running openafs-server-1.6.5-145.sl6 on SL 6.0. From the
sl-security repo. Is there any reason to stick with 1.4.14?

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] file server crashed

2013-09-02 Thread Harald Barth

 I checked the dump file, however, it was deleted by abrtd.

:-(

 But I didn't got the key lines DeleteUploaded=yes
 
 Do you mean to add this line manually?

I don't know.

  openafs-server-1.4.14.1-1.1.x86_64
 
 I am running openafs-server-1.6.5-145.sl6 on SL 6.0. From the
 sl-security repo. Is there any reason to stick with 1.4.14?
 
 I just follow the specification of CERN and many tests on it passed.

Then the guys from CERN should answer instead ;-)

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] vos shadow to backup user homes

2013-08-26 Thread Harald Barth

 However the main reason I'm replying is your comment about RAID. IMO,
 anytime you're configuring a mission-critical system without RAID
 you're probably asking for future headaches.

My experiences with RAID, especially HW-RAID, are mixed. Last week I
got a DL360 G4 with some built-in HW-RAID(5) that returned read
errors from the RAID to the OS without failing any drive(s). On an
email server. Looks to me like a serious bug in the RAID firmware. I
have even found high-end RAID (Rio) whose memory did not deploy any
ECC. One device put stripes of zeroes into every block that got
through it. Then there are HW-RAIDs which can detect silent bit-rot
on your HDs and some that can't.

 All of my database and fileservers currently use hardware raid (3ware
 or LSI/PERC). But one of my idle time projects -- a bit of an inside
 joke since I have no idle time -- is to play around with ZFS on linux
 to see if I feel it's ready for prime time yet or not.

I currently trust linux SW RAID (MD) and ZFS more than any HW RAID. So
almost all our file servers have been migrated from HW-RAID to SW-RAID
or ZFS.

The plan is to complement that with shadow volumes for some volumes which
hold data that needs a way of instant restore.
Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Heimdal KDC bug mentioned in rekeying document

2013-08-10 Thread Harald Barth

The versions where I have seen the problem were:
* 1.5.2 master on Solaris and slave on amd64 FreeBSD
* 1.3.3 master and slave on i386 OpenBSD
The patch which changes the abort() to a warning is at
file:///afs/pdc.kth.se/public/ftp/outgoing/heimdal-1.3.3-kadmlog.patch
ftp://ftp.pdc.kth.se/outgoing/heimdal-1.3.3-kadmlog.patch

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Heimdal KDC bug mentioned in rekeying document

2013-08-08 Thread Harald Barth

 Because I'm doing lots of updates to 1.5.2 patched
 with the patch I posted, using kadmin from 1.6~git20120403+dfsg1-3, and
 having no trouble.

That's good. I will have to double check versions of everything. Maybe
I'm confused, maybe there is another patch at another place in there
that prevents the failure from happening.

 What type of update? 

What I understand from the reports I got is that some version of kadmin
sets something called "policy" after setting attributes. The
policy is set to "default", whatever that means.
kadmin mod haba

Attributes [requires-pre-auth, disallow-postdated]: ENTER
Policy [default]: ENTER

On Ubuntu 13.04:
This is kadmin 1.5.99 (as it calls itself :-() or
1.6~git20120403+dfsg1-2 as the package version is called.

If you have the bug:

This policy change to "default" for the principal is then propagated
through iprop from the master to the slave. The receiving end then
calls abort() on the unknown content in the iprop modify. It does
not fail if you use hprop.

So the test for the bug is to set up a system with master and slave
and then issue a mod like the one above, containing the policy change to
"default". If your ipropd-slave then aborts, you have the bug. If not,
it has been fixed somewhere in the chain
kadmin -> kadmind -> ipropd-master -> ipropd-slave.

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: Heimdal KDC bug mentioned in rekeying document

2013-08-07 Thread Harald Barth
 You should package the tip of heimdal-1-5-branch.

Agree. But you might want to know:

Your slaves will abort() if you update a principal with the Heimdal
kadmin shipped with modern Debian/Ubuntu. That one was cut from some
snapshot. To fix that you will need another patch. We have one, but
it only fixes the abort in the slave code instead of the cause in
the kadmind. I think I wrote something about that on the Heimdal
list and maybe I should dig out that thread again...

Harald.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

