Re: [OpenAFS] Changing host- and domainname

2024-01-20 Thread Jeffrey Altman
What files are present in /etc/openafs/server and what are the contents of 
CellServDB in that directory?

> On Jan 20, 2024, at 4:01 PM, Sebix  wrote:
> 
> On 1/20/24 21:58, Jeffrey E Altman wrote:
>>> On 1/20/2024 3:49 PM, Sebix wrote:
>>> Hi,
>>> 
 On 1/20/24 21:46, Jeffrey E Altman wrote:
>>> 
 On 1/20/2024 3:32 PM, Sebix wrote:
> We already replaced the IP address in /etc/openafs/CellServDB and 
> restarted the server.
> 
 Did you update /etc/openafs/server/CellServDB as well?
>>> 
>>> yes, the two files are identical.
>>> 
>> Do you have NetInfo and/or NetRestrict files in /etc/openafs/server/?
> 
> No:
> 
> # grep -cr Net /etc/openafs/server/ :(
> /etc/openafs/server/UserList:0
> /etc/openafs/server/KeyFile:0
> /etc/openafs/server/CellServDB.old:0
> 
>> Does the output of "ip addr" or "ifconfig -a" list the address 192.168.1.43?
> 
> Yes:
> 
> 2: eth0:  mtu 1500 qdisc pfifo_fast state UP 
> qlen 1000
> link/ether 9e:41:32:2b:59:b0 brd ff:ff:ff:ff:ff:ff
> inet 192.168.1.43/24 brd 192.168.1.255 scope global eth0
> inet6 fe80::9c41:32ff:fe2b:59b0/64 scope link
>valid_lft forever preferred_lft forever
> 
>> The error is being generated from verifyInterfaceAddress() in 
>> src/ubik/beacon.c.
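
The check that verifyInterfaceAddress() performs can be approximated from user space. A hedged sketch (the helper names and sample data are illustrative, not OpenAFS code) that compares the addresses listed in a server CellServDB against a set of locally configured addresses:

```python
import ipaddress

def cellservdb_addresses(text: str) -> set:
    """Extract server IP addresses from CellServDB-style content.
    Server lines begin with an IP address; cell lines begin with '>'."""
    addrs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blanks and '>cellname' header lines
        token = line.split()[0]
        try:
            addrs.add(str(ipaddress.ip_address(token)))
        except ValueError:
            pass  # not an address; ignore
    return addrs

def missing_locally(db_addrs, local_addrs):
    """Addresses the database expects that no local interface carries."""
    return sorted(set(db_addrs) - set(local_addrs))

# Illustrative data only:
sample = ">example.org  # Example cell\n192.168.1.43  # afsdb1\n"
print(missing_locally(cellservdb_addresses(sample), {"192.168.1.44"}))
# → ['192.168.1.43']
```

In the thread above, the same comparison done against the output of "ip addr" would show whether 192.168.1.43 is actually configured on an interface.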

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS-devel] Re: [OpenAFS] 2020 AFS Technologies Workshop Cancelled.. kafs update

2020-04-06 Thread Jeffrey Altman
On 4/6/2020 8:59 AM, David Howells wrote:
> Giovanni Bracco  wrote:
>> My feeling is that to  put it really in production the main missing points
>> are:
>>
>> 1) pam module
> 
> Yep.  But the systemd folks are doing their best to make this tricky, I
> believe...

When "systemd --user" services are in use, it is not safe to use session
keyrings.  Network credentials must be stored in user keyrings so that
the user services have access to the credentials.

>> 2) user commands, essentially "fs" first of all and also "pts"
> 
> And there's another issue with implementing the fs tools - and that's that I'm
> not allowed to implement pioctl(2) or afs(2), so I have to find other ways of
> doing things:
> 
>   https://www.infradead.org/~dhowells/kafs/user_interface.html
> 
> But the main issue is that, for the most part, I'm the only one working on
> them - and that's in addition to my normal job.

The "fs" command suite does not talk to the fileserver directly.  For
kafs the "fs" command should be implemented as a front-end to the
interfaces that are described by the above URL.  OpenAFS should consider
implementing those interfaces as well; at least on Linux.

The vos and pts commands from OpenAFS currently have a dependency on the
existence of a cache manager.  If that dependency was removed there
would be no need for kafs to provide its own implementation of these
tools that are somewhat specific to the administration of OpenAFS cells.

Which functionality from "pts" do your users require?

>> 3) inotify
> 
> Implementing inotify/dnotify/fanotify is hard because I can't tell from a
> callback what changed - only that something has.  By examining the data
> version I can tell whether the contents of the object changed or whether it
> was an attribute/ACL change, but then I have to compare the attributes or, if
> a directory, the contents, to see which event to generate.

One of the benefits of the Extended Callback protocol is to provide this
level of detail:

  https://tools.ietf.org/html/draft-benjamin-extendedcallbackinfo-02

>> The bos, vos and backup command can be run on server nodes, which can be
>> standard OpenAFS systems, am I right?
> 
> The OpenAFS bos, vos and backup commands can be run from the client too, I
> think, since they don't require any interaction with the afs kernel module.

The OpenAFS version of these tools requires an OpenAFS cache manager.
The AuriStorFS version does not.





smime.p7s
Description: S/MIME Cryptographic Signature


Re: [OpenAFS] Borderline offtopic: OpenAFS as ~ for Samba AD?

2020-02-15 Thread Jeffrey Altman
On 2/15/2020 5:09 PM, Måns Nilsson wrote:
>> You only would create a system:authuser@samb4.realm group and then
>> create @samb4.realm entries if you were treating the two sets of
>> identities as unique.
> 
> My first impression is that this is something one does only if there is
> no other way.  Keeping accounts similar across the board seems a bit
> easier, if doable.  Here it is so, so we'll stick to that.

Originally there was no cross-realm (aka cross-cell in Transarc AFS
lingo).  The fact that Kerberos v4 principals contained a realm didn't
matter because the realm would be stripped when looking up the identity
in the protection service.

Later on cross-realm was created for Kerberos v4.  To ensure that names
from EXAMPLE.COM weren't mistaken for identities from EXAMPLE.NET the
realm is only stripped for the local authentication realm.

It is safe to treat more than one realm as a local authentication realm
provided that there is a guarantee that all principals from both realms
always represent the same entity.  If that is not true, then using
cross-realm identities is required.

An alternative approach would be to add support for entity aliasing to
the protection service.  The protocol extensions to do so were
standardized nine years ago but no implementation was ever developed for
OpenAFS.

I believe in your scenario, treating both realms as local is sufficient.
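
With OpenAFS, treating an additional realm as local is typically configured via the server krb.conf file. A minimal sketch, with an assumed Debian-style path and illustrative realm names:

```
# /etc/openafs/server/krb.conf
# Realms listed here (space separated, on one line) are all treated as
# local authentication realms: the realm suffix is stripped when mapping
# a Kerberos principal to a protection service identity.
EXAMPLE.COM SAMB4.REALM
```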

Jeffrey Altman








Re: [OpenAFS] Limit on number of servers?

2019-04-09 Thread Jeffrey Altman
On 4/9/2019 12:05 PM, Susan Litzinger wrote:
> Does anyone know if there is a limit or preference on how many database
> servers should be in an AFS installation? 
> 

For OpenAFS the hard coded server limit for UBIK services is 20 and the
maximum number of vlservers that can be specified in a CellServDB is 8.

The maximum number of servers with which any particular cell can
successfully operate might be as low as 3, depending on how busy the
servers are, how frequently write transactions occur, and the network
path characteristics.

A minimum of 3 non-clone servers are required to permit write
transactions to succeed when one of the servers is shutdown or becomes
unreachable.
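
The arithmetic behind this is simple majority voting. A hedged illustration (not OpenAFS source; UBIK's actual election also gives the lowest-address server a small extra weight):

```python
def quorum(n_servers: int) -> int:
    """Smallest strict majority of voting (non-clone) servers."""
    return n_servers // 2 + 1

def tolerated_failures(n_servers: int) -> int:
    """How many servers may be down while writes can still commit."""
    return n_servers - quorum(n_servers)

# With 3 servers a quorum is 2, so one server may be shut down or
# unreachable and write transactions can still succeed; with 2 servers
# no failure is tolerated.
for n in (1, 2, 3, 5):
    print(f"{n} servers: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```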

For AuriStorFS the maximum UBIK server limit and maximum number of
vlservers that can be specified in a cell configuration is 80.
There are no performance restrictions that limit their use.

Jeffrey Altman




[OpenAFS] Re: [OpenAFS-devel] July? Re: Proposal for AFS Conference - June 2019

2019-04-02 Thread Jeffrey Altman
On 4/1/2019 3:31 PM, Dave Botsch wrote:
> Are folk better able to attend a July 10-12 conference?

Here is a partial list of 2019 conferences and events that might be
important to members of the community. The dates specified are the
approximate week of the event.

April 29th: Linux Storage, Filesystem and MM Summit
 and DockerCon

May 6th: Microsoft Build
 and Red Hat Summit
 and Google I/O

June 3rd: 12th ACM International Systems and Storage Conference
 and Apple WWDC

June 10th: SINET Innovation Summit in New York.

July 8th: 11th USENIX Workshop on Hot Topics in Storage and File Systems

July 10th: 2019 USENIX Annual Technical Conference
  and Red Hat at AWS conference

July 20th: IETF 105
and Debian Conference
and SNIA Symposium

August 11th: SOUPS - Symposium on Usable Privacy and Security
  and USENIX Security 2019

August 19th: Linux Open Source Summit NA

August 24th: VMWorld

September 9th: Linux Plumbers Conference
   and Linux Kernel Maintainer Summit

September 16th: Oracle Open World

September 23rd: SNIA Storage Developers Conference

October 14th - HEPIX Fall 2019

October 28th: USENIX LISA
   and Linux Open Source Summit Europe

November 4th: Microsoft Ignite
   and SINET Showcase

November 16th: IETF 106
   and KubeCon and CloudNativeCon NA

December 9th: Linux Open FinTech Forum








Re: [OpenAFS] Re: Starting an server (both DB and FS) without `BOS` (e.g. on Linux with systemd)

2019-03-09 Thread Jeffrey Altman
On 3/9/2019 5:05 AM, Ciprian Dorin Craciun wrote:
> [I'm adding to the previous question also the issue of salvaging.  I'm
> quoting what I've asked on a previous thread.]
> 
> BTW, on the topic of volume salvaging, when I define my DAFS / FS node
> I start a node of `salvager` (for FS) and `dasalvager` and
> `salvageserver`.  However looking at the running processes the
> `salvager` and `dasalvager` don't seem to be running after the initial
> startup.  Thus I wonder how the salvage process actually happens?
> 
> Does the `fileserver` / `dafileserver` actually start the salvage
> process, or do they communicate this to the `bos` to restart only that
> service?

The BOS Overseer Service plays a number of roles:

1. It is a cross-platform remote management interface for the
   creating, deleting, starting, and stopping b-node services
   There are four types of bnodes:

   a. simple - any single process service for example ptserver,
  vlserver, buserver, one of the MIT/Heimdal kerberos services,
  etc.  Simple services do not have any special interaction
  with bosserver.

   b. cron - like a simple bnode but which can be executed once
  at start (now), daily, or weekly

   c. fs - a bnode which defines the process group for
  the legacy fileserver.  The bosserver has special knowledge
  related to process restart in case of failure and integration
  with the "bos salvage" command.

   d. dafs - a bnode which defines the process group for the
  demand attach fileserver.  The bosserver has special knowledge
  related to process restart in case of failure and integration
  with the "bos salvage" command.

2. The bosserver is responsible for managing the content of many
   configuration files including BosConfig, UserList, and
   the server version of the CellServDB file.  The KeyFile can
   also be updated via bosserver.  The files other than BosConfig
   are shared with the AFS services.

3. The bosserver is used to request manual salvages of individual
   volumes or whole partitions.  When the "fs" bnode is in use,
   the bnode will be stopped and started while the salvage takes
   place.  With the "dafs" bnode, single volume salvages do not
   require the "dafs" bnode to be halted but full partition
   salvages do.

4. Remote fetching of log files.

5. Remote execution of arbitrary commands.
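
The bnode types described above are declared in BosConfig. A hypothetical fragment (binary paths assumed for a Debian-style install) showing a simple bnode and a dafs process group:

```
restarttime 16 0 0 0 0
checkbintime 3 0 5 0 0
bnode simple ptserver 1
parm /usr/lib/openafs/ptserver
end
bnode dafs dafs 1
parm /usr/lib/openafs/dafileserver
parm /usr/lib/openafs/davolserver
parm /usr/lib/openafs/salvageserver
parm /usr/lib/openafs/dasalvager
end
```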

Most but not all of these functions could be performed with other tools.
Managing the special inter-dependencies of the "fs" and "dafs" bnode
processes and salvaging are the two exceptions.

Jeffrey Altman



Re: [OpenAFS] new RW root.cell

2019-03-07 Thread Jeffrey Altman
On 3/7/2019 11:59 AM, Susan Litzinger wrote:
> Hmm.. I removed the incorrect RO and created a new RO on velma,
> then tried to 'release' the new one prior to moving it to a different
> server and that doesn't work.  I'm hesitant to go ahead and move it if
> it's not in a good state.  

"vos remsite" only modifies the location database.  It does not remove
volumes from vice partitions.  You needed to execute "vos remove" not
"vos remsite".  You are still receiving the EXDEV error from velma
because there are still two vice partitions attached to velma each of
which have a volume from the same volume group.

The fact that you were able to get into this situation is due to bugs in
OpenAFS which were fixed long ago in AuriStorFS.  To cleanup:

  vos remove -server velma.psc.edu -partition vicepcb -id 537176385

and then

  vos release -id root.cell

If you are still seeing errors, examine the VolserLog on velma.psc.edu
and use

  vos listvol -server velma.psc.edu -fast | grep 537176385

to see if there are stranded readonly volumes left somewhere.

Jeffrey Altman



Re: [OpenAFS] AFS Performance / ZFS

2019-03-07 Thread Jeffrey Altman
On 3/7/2019 9:43 AM, Andreas Ladanyi wrote:
> Hi,
> 
> I am testing a box with FreeNAS (BSD) and ZFS. On this box I use a
> virtualized bhyve guest as an AFS server.
> [...]
> Any ideas why AFS speed is only about 25 MByte/s? Maybe I have to
> adjust another AFS server parameter?

There are performance bottlenecks in the bhyve network virtualization
that severely impact RX throughput.  The weaknesses in the OpenAFS RX
implementation related to flow control, congestion avoidance, and pacing
exacerbate the throughput limitations.

AuriStorFS customers use TrueNAS to back vice partitions but do so by
exporting the ZFS storage via iSCSI to RHEL7 systems connected to the
TrueNAS server with dedicated bonded 10-gbit NICs.  This combination is
reliable and is capable of filling the iSCSI path.

Jeffrey Altman
AuriStor, Inc.



Re: [OpenAFS] kafs client bugs

2019-03-07 Thread Jeffrey Altman
On 3/7/2019 6:12 AM, David Howells wrote:
> From just the filenames, I don't see what some of the tests are meant to do -
> take "discon-create" for example.  This seems to be using some feature of Arla
> that isn't in OpenAFS.

disconnected mode was added to OpenAFS.  However, there is no mechanism
for ensuring that whole files are present in the cache or for pinning
files to the cache.  See "fs discon".




Re: [OpenAFS] new RW root.cell

2019-03-06 Thread Jeffrey Altman
On 3/6/2019 3:56 PM, Susan Litzinger wrote:
> We are updating our very old AFS servers and chose to create new,
> updated systems then migrate all the volumes over and eventually just
> turn the older ones off. 
> 
> We've moved everything but the root.cell volumes.  We have one RW
> root.cell on an older server and would like to have a RW root.cell on a
> new server before turning the older one off.  My co-worker gave this a
> try and it allowed her to run 'vos addsite' but did not allow her to
> 'vos release' the new RW root.cell.  She got this error: 
> 
> root@afs-vmc 2019-March]# vos addsite afs-vmc.psc.edu
> <http://afs-vmc.psc.edu> /vicepga 
> root.cell -localauth
> Added replication site afs-vmc.psc.edu <http://afs-vmc.psc.edu> /vicepga
> for volume root.cell
> [root@afs-vmc 2019-March]# vos release root.cell -localauth
> Failed to clone the volume 537176384
> : Invalid cross-device link
> Error in vos release command.
> Clone volume is not in the same partition as the read-write volume.


Susan,

The problem with root.cell is on velma.  The RW and RO must be on the
same partition.

Jeffrey Altman



Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)

2019-03-02 Thread Jeffrey Altman
On 3/2/2019 3:42 PM, Ciprian Dorin Craciun wrote:
> On Wed, Dec 5, 2018 at 3:29 PM Harald Barth  wrote:
>>> Can I safely `rsync` the files from the old partition to the new one?
>>
>> For Linux (The "new" server partition layout):
>>
>> If the file tree really is exactly copied (including all permissions
>> and chmod-bits) then you have everything you need.
> 
> 
> 
> I would like to follow-up on this with some additional questions,
> which I'll try to keep as succinct as possible.  (Nothing critical,
> however I would like to have a little bit more insight into this.)
> 
> (A)  When you state `exactly copied` you mean only the following
> (based on the `struct stat` members):
> * `st_uid` / `st_gid`;
> * `st_mode`;  (i.e. permissions;)
> * `st_atim`, `st_mtim` and `st_ctim`?  (i.e. timestamps)
> * no ACL's, no `xattr` (or `user_xattr`);
> * anything else?

The vice partition directory hierarchy is used to create a private
object store.   The reason that Harald said "exact copy" is because
OpenAFS leverages the fact that the "dafs" or "fs" bnode services
execute as "root" to encode information in the inode's metadata that is
not guaranteed to be a valid state from the perspective of normal file
tooling.

The current Linux OpenAFS vice partition format is endian sensitive.

For many years there was discussion of creating a plug-in interface for
the vice partition object storage.  This would permit separate formats
depending on the underlying file system capabilities and use of non-file
system object stores.

> (B)  Also (based on what I gathered by "peeking" into the `/vicepX`
> partition) there are only plain folders and plain files, without any
> symlinks or hard-links.

This is true for the current OpenAFS vice partition format.  It is not
true for all vice partition formats used by non-OpenAFS implementations.

> (C)  Moreover based on the same observations, I guess that the
> metadata (i.e. uid/gid/permissions/timestamps) for the actual folders
> inside of `/vicepX` don't matter much.  (Only the metadata for the
> actual files do.)

This is true for the existing OpenAFS vice partition format on Linux.

> (D)  (Not really related to migration)  Am I to assume that some of
> the files inside `AFSIDat` are identical in contents to the actual
> files on the `/afs` structure?  (Disregarding all meta-data, including
> filenames.)  Moreover am I to assume that all the files accessible
> from `/afs` are found somewhere inside `AFSIDat` with identical
> contents?

There is a mapping from AFS3 File ID to vice partition path.  AFS3
directories are stored as files as are AFS3 files, symlinks and mount
points. OpenAFS stores each AFS3 File ID data stream in a single file in
the current format.

> I.e. formalizing the last one:  if one would take any file accessible
> under `/afs` and would compute its SHA1, then by looking into all
> `/vicepX` partitions belonging to that cell, one would definitively
> find a matching file with that SHA1.

This is true for the current format.


> My curiosity into all this is because I want to prepare some `rsync`
> and `cpio` snippets that perhaps could help others in a similar
> endeavor.  Moreover (although I know there are at least two other
> "official" ways to achieve this) it can serve as an alternative backup
> mechanism.

The vice partition format should be considered to be private to the
fileserver processes.  It is not portable and should not be used as a
backup or transfer mechanism.

> BTW, is there a document that outlines the actual layout of the
> `/vicepX` structure?  I've searched a bit but found nothing useful.

The source code comments provide the best documentation.

Jeffrey Altman






[OpenAFS] BoF at Vault '19: Justifying the inclusion of Linux kernel AFS in Enterprise Distributions

2019-02-19 Thread Jeffrey Altman
If you will be in Boston next Monday, February 25, an attendee of one of:

* 2019 Linux Storage and File Systems Conference (VAULT'19)
* 17th USENIX Conference on File and Storage Technologies (FAST'19)
* 16th USENIX Symposium on Networked Systems Design and Implementation
  (NSDI'19)

or for any other reason, please consider attending the AuriStor, Inc
sponsored Birds of a Feather meeting from 6:30pm to 8:30pm.  The subject
of discussion is

  "Justifying the inclusion of Linux kernel AFS in Enterprise
   Distributions"

Linux kernel AFS is included in some Linux distributions such as Fedora
Core 29 and Debian Linux "Buster", but it is not included in any
Enterprise Class distributions from Red Hat, SuSE, Ubuntu or other
vendors.

At the BoF the attendees will discuss, and perhaps demonstrate,
non-legacy deployments and workflows for AFS family file systems,
including AuriStorFS and OpenAFS.  The goal is to build the case, for
Enterprise Linux distributions, that inclusion of Linux kernel AFS
supports future growth.

The BoF will be held on the 3rd Floor of the Sheraton Boston in the
Hampton Room.

  Sheraton Boston
  3rd Floor - Hampton Room
  39 Dalton St
  Boston, MA 02199 USA

Pizza and non-alcoholic beverages will be provided.

I hope to see you there.

  https://www.auristor.com/events/kafsvault19

Jeffrey Altman


Re: [OpenAFS] Vhosts as AFS servers?

2019-01-15 Thread Jeffrey Altman
Hi Steve,

Virtualization is fine and in many ways is preferred for OpenAFS file
servers, because the fileserver process is incapable of leveraging more
than 1.7 processor threads due to lock and other resource contention.
Compared to bare metal, a virtualized system might reduce overall
throughput by 6%.  On the flip side, it is nearly impossible to purchase
a two-processor-thread system these days.  To maximize the capabilities
of the hardware, host multiple VMs, each with two processor threads.
For vice partitions, ext4-formatted iSCSI block storage can be used.

When purchasing hardware for OpenAFS, minimize the number of processor
threads and obtain the fastest clock speeds available. The number of
packets per second that can be sent and received is directly
proportional to the processor clock speed.

On the other hand, if the hardware is going to be purchased with the
intention of eventually deploying AuriStorFS servers on it, maximize the
number of processor threads and use more power efficient processors with
lower clock speeds.  AuriStorFS servers unlike OpenAFS can make use of
as much CPU and I/O bandwidth as is available.

Jeffrey Altman




On 1/15/2019 1:56 PM, Steve Simmons wrote:
> The AFS support group (of whom I'm a former member) is considering
> moving to virtual hosts under vmware for their next generation of
> servers. So far they're looking at attempting to have the vhosts attach
> the vice partitions directly from our SAN rather than having storage
> mediated through vmware. If you're using or used virtualized servers,
> we'd love to hear how their working out for you.
> 
> At the moment, the servers are running OpenAFS 1.6.17. I don't know if
> they're considering upgrading to 1.8.X as part of this or not.
> 
> Advance thanks,
> 
> Steve Simmons
> ITS Unix Support/SCS Admins


Re: [OpenAFS] Update time loses 67 seconds on new volume

2019-01-03 Thread Jeffrey Altman
On 1/3/2019 2:54 PM, Kristen Webb wrote:
> This is an old issue that is cropping up in a different way, so I wanted to
> share this before I attempt to perform any sort of repair that might
> lose state.
> 
> A client moved a volume and currently the .backup volume copy date does not
> match the rw volume copy date.  As far as I can tell, the version of
> openafs was
> 1.8.2 (I am confirming this now).
> 
> Here are some current time stamps:
> 
> # vos ex test.volume
>     Copy        Thu Nov 29 15:10:47 2018
> 
> # vos ex test.volume.backup
>     Copy        Fri Nov 30 02:38:44 2018
> 
> When I try to manually update the .backup volume the Copy time
> is unchanged and still different from the rw volume.
> 

Kris,

I'm not sure what if any data corruption is taking place during volume
moves with OpenAFS 1.8.x but I believe looking at the volume Copy date
is a red herring.  OpenAFS destroys the ".backup" as part of a RW volume
move.  Therefore, the "Copy" date is not expected to match the "Copy"
date of the RW when the BACK is re-created.

The Copy field displays the date and time this copy of this volume was
created. This is the time when the volume was created on the File Server
and partition from which statistics are obtained.

For read/write volumes, it is the date and time of initial creation if
the volume has never been moved or restored, or the date and time of the
most recent move or restore operation.

For read-only volumes, it is the date and time of the initial vos
release command that copied the volume data to the File Server and
partition.

For backup volumes, it is the date and time of the initial vos backup or
vos backupsys command on the specified File Server and partition. It
will match the Creation field.

The copy date is not stored in volume dumps and cannot be restored or
migrated to another File Server or partition.

Jeffrey Altman



Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8

2018-12-08 Thread Jeffrey Altman
On 12/8/2018 5:21 AM, Dirk Heinrichs wrote:
> Dirk Heinrichs:
> 
>> Did a quick test (on Debian, btw., which already ships kafs) and it
>> works fine.
> 
> While getting tokens at login work with this setup, things start to fail
> once the users $HOME is set to be in /afs. While simple scenarios like
> pure shell/console logins work, graphical desktop environments have lots
> of problems. XFCE4 doesn't even start, Plasma works to some degree after
> presenting lots of error dialogs to the user.

As Harald indicated, "systemd --user" services are a problem not just
for kafs but for openafs as well.  There has been discussions on this
mailing list of the issues dating back more than a year.  In summary,
"systemd --user" services are incompatible with "session keyrings" which
are used to represent AFS Process Authentication Groups.

You have not indicated which kernel version you are using, nor am I
aware of the options used to build AF_RXRPC and kafs on Debian.  The
recommended Linux kernel versions are 4.19 with a couple of back-ported
patches from the forthcoming 4.20, or the 4.20 release candidate series.

Regardless, it would be useful for you to file bug reports with the
Linux distribution describing the issues you are experiencing.

Debian: https://wiki.debian.org/reportbug

Fedora: https://fedoraproject.org/wiki/Bugs_and_feature_requests

> Seems there's still some work to do until this becomes an alternative
> for the standard OpenAFS client.

All software including OpenAFS has work to do.  The kafs to-do list of
known work items is here:

 https://www.infradead.org/~dhowells/kafs/todo.html

> So I wonder why RH customers would want that?

Obviously, no one wants bugs, but at the same time this community does want:

 1. A solution to "systemd --user" service compatibility with AFS.
The required changes are going to require Linux distribution
intervention because systemd is integrated differently in each
distribution.  At the moment there is no interest among the systemd
developers in fixing a behavior they consider to be a bug in OpenAFS,
an out-of-tree file system.

 2. The RHEL AFS user community needs an end to the repeated breakage
of /afs access following each RHEL dot release.  How many times
has getcwd() broken because RHEL kernels updates preserve the API
between releases but do not preserve the ABI.  While this permits
third party kernel modules to load it does not ensure that they
will do the right thing.  If the community is lucky the symptoms
are visible.  If unlucky, the symptoms are hidden until someone
reports silent data corruption.

 3. Day zero support for new kernel releases.  It took OpenAFS a month
to support 7.4 and two months to fix the breakage from 7.5.
There were also extensive delays in fixing OpenAFS to work with
5.10 and 6.5.

The need for an in-tree Linux AFS client extends to all Linux
distributions not just Red Hat.  Any OpenAFS Linux developer can attest
to the extensive effort that must be expended to maintain compatibility
with the mainline Linux kernel.  Then multiply that effort by all of the
Linux distributions that ship modified kernels such as RHEL, SuSE,
Ubuntu, Oracle, 

RHEL 8 is in beta.  The next opportunity to argue for inclusion of the
in-tree AFS client will be RHEL 9.  The clock is ticking 

Jeffrey Altman



Re: [OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8

2018-12-07 Thread Jeffrey Altman
On 12/7/2018 4:00 AM, Harald Barth wrote:
> 
> Hi Jeff, hi David!
> 
> Has it been 17 years? Well, we are all getting - mature ;-) 
> 
> Obviously a file system is ready for use if it's old enough to buy
> liquor (which difffers a little between countries).
> 
>> When opening a support case please specify:
>>
>>  Product:  Red Hat Enterprise Linux
>>  Version:  8.0 Beta
>>  Case Type:Feature / Enhancement Request
>>  Hostname: hostname of the system on which RHEL8 beta was installed
> 
> We have a hen and egg problem here: Why would I install 8.0b on
>  if it does not have kafs? Install a Fedora
> test, sure, but RHEL 8.0b?

You need to register for and install the RHEL 8.0 Beta to be eligible to
submit a feature / enhancement request.

RHEL 8.0 Beta does not include kafs.  Hence the need to request its
inclusion as a feature request.

>> If you are eligible, please attempt to open a support request by
>> December 11th.
> 
> 3 workdays. Optimistic.

If you personally are unable to do so, then you won't.

However, others have already submitted feature requests in response to
this thread.  To them the community owes thanks.  The more requests that
are received the better.

>> As part of the Linux kernel source tree, kafs is indirectly supported by
>> the entire Linux kernel development community.  All of the automated
>> testing performed against the mainline kernel is also performed against
>> kafs.
> 
> But the automated testing does probably not (yet) fetch a single file
> from an AFS server. 

The automated testing performed by Red Hat doesn't as yet perform such
testing but AuriStor's testing does.

> Testing that requires infrastructure is a lot of work to
> automate.

Not true.  At the 2015 AFSBPW Marc Dionne presented on AuriStor's docker
based infrastructure for automated testing of multi-server cells
including clients.

  http://workshop.openafs.org/afsbpw15/talks/friday/dionne-docker.pdf

Since that time AuriStor has added live testing of kernel modules.

Each run currently performs approximately 7000 individual tests.

> Sorry, I may sound much more pessimistic here than I actually am.
> This _might_ fly. I wish :-)

There is ample reason to be pessimistic.  It is really easy for Fedora
Core and Debian to ship kernels built with kafs and AF_RXRPC enabled
because there are no guarantees.  I believe that if Red Hat enables kafs
in Enterprise Linux there are substantial costs and commitments
associated with that decision:

 * Documentation of AFS and AuriStorFS capabilities and limitations
   not only as a filesystem but with regards to interactions with
   other Red Hat supported components.

 * Training for system engineers.

 * Integration of AFS and AuriStorFS support into other Red Hat
   supported technologies

 * Development of Certification and Training programs for partners.

 * A ten year commitment to develop, maintain and fix kafs and AF_RXRPC

 * Development and maintenance of test infrastructure.

The truth is that the costs associated with code development are a small
component of the total costs.  As such Red Hat must be convinced that
inclusion of kafs and AF_RXRPC will provide functional capabilities to
end users that cannot be achieved via other Red Hat supported storage
technologies; and that supporting kafs and AF_RXRPC will provide a
long-term benefit to their bottom line.

That said, I believe a case can be made by the members of this
community.  In 2001 Red Hat couldn't support AFS because of GPL vs IPL10
conflicts.  Now that kafs is available, it becomes possible for Red Hat
to do so.  It's up to all OpenAFS and AuriStorFS end user organizations
to make the case.

Good luck to all.

Jeffrey Altman



[OpenAFS] Red Hat EL Support Customers - Please open a support case for kafs in RHEL8

2018-12-06 Thread Jeffrey Altman
 will significantly simplify the lives of end users not to
mention system administrators and help desk support staff.  When kafs is
distributed as part of the Linux kernel, there can never be a conflict
between the kernel version and the AFS kernel module since they are one
and the same.  There can never be a delay between the availability of a
new kernel version and the matching OpenAFS or AuriStorFS kernel module.
 AuriStor promises a new kernel module release within 48 hours of the
release of a kernel for a supported Linux distribution.  With OpenAFS
there have been many circumstances when the delay has been measured in
weeks or months.

Organizations that have support contracts with Linux vendors are often
told that support does not apply when the Linux kernel has been tainted
by a third-party kernel module.  When using kafs, the Linux kernel is
never tainted.

As part of the Linux kernel source tree, kafs is indirectly supported by
the entire Linux kernel development community.  All of the automated
testing performed against the mainline kernel is also performed against
kafs.  All kernel interface changes impacting kafs or af_rxrpc must be
implemented in kafs and af_rxrpc by the developer(s) promoting the
change.  All-in-all kafs and af_rxrpc will receive reviews by a much
larger community of developers.

Finally, as an out-of-the-box solution, /afs becomes a first class file
system namespace.  As a result, AFS adoption will increase and /afs will
become accessible on systems that are managed by third parties, such as
those in the cloud.

9. Is there anything I shouldn't say to Red Hat?

Red Hat is going to make a business decision based upon its evaluation
of customer needs and their impact on growth of RHEL licensing.  If I
were in their shoes I would not find a request to add support for kafs
compelling if it were combined with a statement that the requesting
organization intends to discontinue use of /afs within the next three to
five years.  RHEL8 will have a support lifetime of at least a decade and
there is little justification to commit new engineering resources to a
technology that customers believe has no future.

10. Will AuriStor stop developing its own Linux client?

No.  AuriStor will always be able to ship new functionality in its own
clients first.  AuriStor believes that kafs will be the AFS client for
99.9% of end users with Linux desktops and servers.   The AuriStorFS
client for Linux will be used by organizations that have special needs
and highly managed environments.


Thanks for your assistance on behalf of the entire AFS/AuriStorFS community.

Jeffrey Altman


[1] https://www.infradead.org/~dhowells/kafs/
[2] https://copr.fedorainfracloud.org/coprs/jsbillings/kafs/
[3] https://lists.openafs.org/pipermail/openafs-info/2018-July/042481.html
[4] https://developers.redhat.com/rhel8/getrhel8/
[5]
https://access.redhat.com/support/cases/#/case/new?intcmp=hp%7Ca%7Ca3%7Ccase






Re: [OpenAFS] Tracing VLDB queries

2018-12-03 Thread Jeffrey Altman
On 12/3/2018 6:56 AM, Robert Milkowski wrote:
> Hi,
> 
>  
> 
> Just a teaser:
> 
>  
> 
> # dtrace -q -n vlserver*:::*GetEntry-done* \
> 
>    '{printf("%Y client: %s volname: %s rc: %s\n", walltimestamp, \
> 
>  args[1]->ci_remote, args[3]->volname, afs_errorstr[args[0]]);}'
> 
>  
> 
> 2018 Nov 19 15:55:15 client: 10.170.57.130 volname: zz.vldb.13 rc: OK
> 
> 2018 Nov 19 15:55:28 client: 10.170.57.130 volname: ms.dist rc: OK
> 
> 2018 Nov 19 15:55:57 client: 10.170.57.130 volname: ms.distXX rc: VL_NOENT
> 
> ...
> 
>  
> 
> This is a probe provided by vlserver directly. Obviously one can achieve
> the same by using PID provider, but it gets more complicated and does
> require understanding of the code and is more involved.
> 
> This works on Solaris and should work on FreeBSD as well. It shouldn’t
> be hard to get it working with SystemTAP on Linux either (although looks
> like Linux will be going with ebpf in the future).

Hi Robert,

Once trace points are added, tracing can be used to answer all sorts of
questions.  The specific output from your example is very similar to the
data that is collected by the baked-in audit infrastructure.  The
following is output from AuriStorFS vlserver:

Mon Dec 03 06:25:30 2018 [71] EVENT AFS_VL_GetEntByN CODE 363524 NAME
--UnAuth-- HOST [204.29.154.74]:7001 STR symbols

Mon Dec 03 06:35:08 2018 [71] EVENT AFS_VL_GetEntByN CODE 0 NAME
--UnAuth-- HOST [204.29.154.72]:7001 STR 536872388

Mon Dec 03 06:35:10 2018 [71] EVENT AFS_VL_GetEntByN CODE 0 NAME
--UnAuth-- HOST [204.29.154.72]:7001 STR 536872388

Mon Dec 03 06:36:05 2018 [71] EVENT AFS_VL_GetEntByN CODE 0 NAME
--UnAuth-- HOST [2604:2000:1741:a019:6d77:8346:dab0:49c0]:7001 STR root.cell

Mon Dec 03 06:36:05 2018 [71] EVENT AFS_VL_GetEntByN CODE 0 NAME
--UnAuth-- HOST [2604:2000:1741:a019:6d77:8346:dab0:49c0]:7001 STR
root.public

In OpenAFS the audit infrastructure can be enabled per-service, and its
output can be directed to files, named pipes, syslog, or Linux message
queues; on AIX it is integrated with the OS auditing system.

Jeffrey Altman



Re: [OpenAFS] Current "balance" practice?

2018-11-27 Thread Jeffrey Altman
On 11/27/2018 2:21 PM, Chaskiel Grundman wrote:
> There is another problem beyond 64-bit safety. It appears that some of
> the openafs devs didn't learn from the project's own experience with the
> linux developers, as
>
> extern afs_int32 vsu_ClientInit(int noAuthFlag, const char *confDir,
> char *cellName, afs_int32 sauth,
> struct ubik_client **uclientp,
> int (*secproc)(struct rx_securityClass *,
> afs_int32));
> 
> 
> in <= 1.6 has become
> 
> extern afs_int32 vsu_ClientInit(const char *confDir, char *cellName,
> int secFlags,
> int (*secproc)(struct rx_securityClass *,
>afs_int32),
> struct ubik_client **uclientp);
> 
> 
> in 1.8. and I can't even use #ifdef VS2SC_NEVER to detect the change --
> it's an enum.

That would be AuriStor's fault.  The change in question was

  commit 3720f6b646857cca523659519f6fd4441e41dc7a
  Author: Simon Wilkinson 
  Date:   Sun Oct 23 16:21:52 2011 +0100

  Rework the ugen_* interface

The vsu_ClientInit() signature change was a side-effect of the
refactoring of ugen_ClientInit().  No one remembered the possible
out-of-tree usage of vsu_ClientInit().  vsu_ClientInit() is not an
exported function.  As such its status as public is murky at best.

I suggest using the existence of one of these CPP macros as a test.
They were added shortly after the vsu_ClientInit() signature change was
merged.

  /* Values for the UV_ReleaseVolume flags parameters */
  #define REL_COMPLETE0x01  /* force a complete release */
  #define REL_FULLDUMPS   0x02  /* force full dumps */
  #define REL_STAYUP  0x04  /* dump to clones to avoid offline
time */

The introduction of enum vol_s2s_crypt came much later.

If you would prefer AuriStor can submit a change to restore the prior
signature.

Jeffrey Altman






Re: [OpenAFS] in tree kernel module kafs fedora 29

2018-11-20 Thread Jeffrey Altman
On 11/20/2018 2:29 PM, Gary Gatling wrote:
> Cool. Thanks for the information about that. I am surprised openafs
> doesn't work with the kernel module. But it is what it is. :)

kafs and af_rxrpc are clean room implementations.  They are not derived
from any IBM Public License 1.0 source code.  That is why they can be
part of the Linux kernel as in-tree networking stack and file system
components.

OpenAFS does not work with the kafs kernel module because the kafs file
system is an alternative client compatible with IBM AFS 3.6, OpenAFS and
AuriStorFS services.

Jeffrey Altman



Re: [OpenAFS] Unexpected no space left on device error

2018-11-13 Thread Jeffrey Altman
I'm placing a beer on the directory being full.  For extra credit I will guess 
that the directory is full as a result of abandoned silly rename files.  You 
should try salvaging the volume with the rebuild directories option.

Jeffrey Altman

> On Nov 14, 2018, at 4:36 AM, Benjamin Kaduk  wrote:
> 
>> On Tue, Nov 13, 2018 at 08:46:28PM -0500, Theo Ouzhinski wrote:
>> Hi all,
>> 
>> Sorry for my previous incorrectly formatted email.
>> Recently, I've seen an uptick in "no space left on device" errors for
>> some of the home directories I administer.
>> 
>> For example,
>> 
>> matsumoto  # touch a
>> touch: cannot touch 'a': No space left on device
>> 
>> We are not even close to filling up the cache (located at
>> /var/cache/openafs) on this client machine.
>> 
>> matsumoto ~ # fs getcacheparms
>> AFS using 10314 of the cache's available 1000 1K byte blocks.
>> matsumoto ~ # df -h
>> Filesystem   Size  Used Avail Use% Mounted on
>> 
>> /dev/mapper/vgwrkstn-root456G   17G  417G   4% /
>> 
>> AFS  2.0T 0  2.0T   0% /afs
>> 
>> 
>> Nor is this home directory or any other problematic home directory close
>> to their quota.
>> 
>> matsumoto  # fs lq
>> Volume NameQuota   Used %Used   Partition
>>   4194304 1944035% 37%
>> 
>> According to previous posts on this list, many issues can be attributed
>> to high inode usage.  However, this is not the case on our machines.
>> 
>> Here is sample output from one of our OpenAFS servers, which is similar
>> to all of the four other ones.
>> 
>> openafs1 ~ # df -i
>> Filesystem Inodes   IUsed  IFree IUse% Mounted on
>> udev  1903816 41319034031% /dev
>> tmpfs 1911210 55119106591% /run
>> /dev/vda1 1905008  15482117501879% /
>> tmpfs 1911210   119112091% /dev/shm
>> tmpfs 1911210   519112051% /run/lock
>> tmpfs 1911210  1719111931% /sys/fs/cgroup
>> /dev/vdb 19660800 3461203   16199597   18% /vicepa
>> /dev/vdc 19660800 1505958   181548428% /vicepb
>> tmpfs 1911210   419112061% /run/user/0
>> AFS2147483647   0 21474836470% /afs
>> 
>> 
>> We are running the latest HWE kernel (4.15.0-38-generic) for Ubuntu
>> 16.04 (which is the OS for both server and client machines). We are
>> running on the clients, the following versions:
>> 
>> openafs-client/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
>> openafs-krb5/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
>> openafs-modules-dkms/xenial,xenial,now 1.8.2-0ppa2~ubuntu16.04.1 all
>> [installed]
>> 
>> and on the servers, the following versions:
>> 
>> openafs-client/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
>> openafs-dbserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
>> openafs-fileserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
>> openafs-krb5/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
>> openafs-modules-dkms/xenial,xenial,now 1.6.15-1ubuntu1 all [installed]
> 
> (Off-topic, but that looks to be missing some security fixes.)
> 
>> What could be the problem? Is there something I missed?
> 
> It's not really ringing a bell off the top of my head, no.
> 
> That said, there's a number of potential ways to get ENOSPC, so it would be
> good to get more data, like an strace of the failing touch, and maybe a
> packet capture (port 7000) during the touch, both from a clean cache and
> potentially a second attempt.
> 
> -Ben
> ___
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info




Re: [OpenAFS] automatic replication of ro volumes

2018-11-09 Thread Jeffrey Altman
On 11/9/2018 9:48 AM, Andreas Ladanyi wrote:
> Hi,
> 
> it is common an openafs admin has to sync an ro volume after something
> is added to rw volume. This is done by the vos release command. I think
> its the only way. Are there automatic sync functions in the vol / fs server.

The risk of automated volume releases is that the automated system does
not know when the volume contents are in a consistent and quiescent state.

Sites often use remctl to grant end users the ability to release their
own volumes.

Automated releases of RO volumes are a poor substitute for replicated RW
volumes.  RW replication is a feature which was never completed for OpenAFS.

Jeffrey Altman



[OpenAFS] accessing /afs processes go into device wait

2018-11-08 Thread Jeffrey Altman
On 11/8/2018 12:22 PM, John Sopko wrote:
> I am running afsd with:
>
> /usr/vice/etc/afsd -dynroot -fakestat-all -afsdb

-dynroot

Do not mount a root.afs volume; instead, populate the /afs directory
with the results of cell lookups.

-afsdb

If the requested name does not match a cell found in the CellServDB
file, query DNS first for SRV records and, if there is no match, then
for AFSDB records.

Note that default RHEL6 configuration for the DNS resolver does not
cache negative DNS results.

An attempt to open /afs/.htaccess therefore results in DNS queries for
"htaccess" plus whatever domains are in the search list. If the search
list is cs.unc.edu and unc.edu, then for each access there will be the
following DNS queries:

SRV _afs3-vlserver._udp.htaccess.cs.unc.edu
SRV _afs3-vlserver._udp.unc.edu
AFSDB htaccess.cs.unc.edu
AFSDB htaccess.unc.edu
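The lookup fan-out described above can be sketched as follows (a rough illustration of the described behavior, not OpenAFS source; the exact query order may differ):

```python
def afsdb_query_names(name, search_domains):
    """Build the DNS names an -afsdb client might try for a name that is
    not in CellServDB: SRV records first, then AFSDB records, once for
    each domain on the resolver search list (illustrative sketch)."""
    queries = []
    for domain in search_domains:
        queries.append(("SRV", "_afs3-vlserver._udp.%s.%s" % (name, domain)))
    for domain in search_domains:
        queries.append(("AFSDB", "%s.%s" % (name, domain)))
    return queries

# For name "htaccess" and search list ["cs.unc.edu", "unc.edu"] this
# yields four queries per access, as in the list above.
```

Without negative caching in the resolver, every one of these lookups is repeated on every access to the non-cell name.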

You can add a dummy htaccess.cs.unc.edu entry to CellServDB. You can
add a blacklist for that name. You can stop using -afsdb or you can
stop using -dynroot and rely upon a locally managed root.afs volume.
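For the first remedy, a sketch of what the dummy CellServDB entry might look like (the address is a placeholder; any unused entry matching the name suppresses the DNS lookups):

```
>htaccess.cs.unc.edu    #dummy entry to suppress DNS lookups
127.0.0.1               #localhost
```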

Jeffrey Altman






Re: [OpenAFS] Current "balance" practice?

2018-10-19 Thread Jeffrey Altman
On 10/19/2018 11:04 AM, Jonathan Proulx wrote:
> Hi All,
> 
> We're currently running a slightly hacked-up version of balance-1.1a (I
> believe originally from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/ ) on
> an old 32-bit Linux server (well, 32-bit kernel and userland; the
> hardware under it is 64-bit) with OpenAFS 1.6.1
> 
> Shockingly this seems to still work, but I'm trying to kill off the
> oldest straggling servers, and neither copying the binary (and using
> 32-bit libs) nor recompiling it seems to work on a new 64-bit OS with
> OpenAFS 1.8.2

The "balance" sources date back to IBM 3.4 or earlier and rely upon
ubik_Call() instead of

  ubik_VL_SetLock()
  ubik_VL_ReleaseLock()

ubik_Call() is no longer used internally by OpenAFS but it is still
exported.  ubik_Call() relies upon varargs that are unlikely to
interpret parameter types properly on systems with 64-bit pointers
and/or size_t.

Unless there are other 64-bit safety issues within balance itself,
switching to ubik_VL_SetLock() and ubik_VL_ReleaseLock() might be
sufficient.

> I've also noticed that the Wiki points to an abandoned perl
> implementation from 2007
> ( https://www.eyrie.org/~eagle/software/afs-balance/ )

Although similarly named, Russ Allbery's balance, which was developed at
Stanford, is unrelated to Dan Lovinger's balance.  Russ' balance can
make decisions based upon volume count and volume size, whereas Dan's
can also make decisions based upon weekly volume usage.


Jeffrey Altman




Re: [OpenAFS] rxmaxmtu for volserver

2018-09-29 Thread Jeffrey Altman
On 9/29/2018 6:43 PM, Ximeng Guan wrote:
>
> Although setting the file server’s NIC mtu (e.g., in the file
> ifcfg-eth0) to the true path max MTU value seems to be an ad-hoc
> solution, we notice that the volserver has a new option of -rxmaxmtu in
> 1.8. The auristor version also has the same option in its documentation.
> But the 1.6.x version does not.
>
> Is there a plan to introduce this option into 1.6.x? Can someone
> provide any background or history on this option for the volume
> server?

I contributed the -rxmaxmtu option for bosserver, ptserver, fileserver,
volserver and vlserver in 2006.

The option is present in all 1.6 releases.

Jeffrey Altman






Re: [OpenAFS] OpenAFS Security Releases 1.8.2, 1.6.23 available --> butc & backup security update question --> why only root?

2018-09-27 Thread Jeffrey Altman
On 9/27/2018 9:11 AM, Giovanni Bracco wrote:
> I have made some tests - ok it works - but I wonder why the key
> authentication method is allowed only to the root user
> 
>> -localauth
>> All butc RPCs require superuser authentication.
>> This option must be run as root, and server key material must be present.
> 
> Our backup scripts, which have been running on a dedicated server for
> many years, run under a dedicated user with administrative powers.
> 
> Why is the availability of an admin token not sufficient to run butc
> in a secure way?
> 
> Giovanni

A user token can be used to authenticate outgoing connections such as
those from butc to the buserver or the volserver.  It cannot be used to
authenticate incoming connections to butc from the backup coordinator
command ("backup" or "afsbackup" depending upon the packaging).

The privilege escalation attack is possible because of butc accepting
unauthenticated "anonymous" requests that would then result in RPCs
being issued as a privileged identity to the buserver and the volserver.
 To close the security hole butc must authenticate all incoming RPCs.
To do so butc must have knowledge of the cell-wide key because without
knowledge of that key it cannot decrypt the AFS token presented by the
RPC issuer.

Jeffrey Altman




[OpenAFS] OpenAFS Security Releases 1.8.2, 1.6.23 available --> butc & backup security update question

2018-09-13 Thread Jeffrey Altman
It is unfortunate that the announcement e-mail included neither a URL to
the https://www.openafs.org/security/ page nor a link to the individual
security advisory text files:

  https://www.openafs.org/pages/security/OPENAFS-SA-2018-001.txt
  https://www.openafs.org/pages/security/OPENAFS-SA-2018-002.txt
  https://www.openafs.org/pages/security/OPENAFS-SA-2018-003.txt

In the case of OPENAFS-SA-2018-001.txt, both 'butc' and 'backup' (or
'afsbackup' as it is installed on some systems) must be at least:

 * AuriStorFS v0.175
 * OpenAFS 1.8.2
 * OpenAFS 1.6.23

The version of the vlserver, buserver and volserver does not matter.
Those services already supported authenticated and potentially encrypted
connections.

The underlying cause of the incompatibility is that the 'butc' service
would only accept unauthenticated (rxnull) connections and therefore the
'backup' command could only create unauthenticated (rxnull) connections
even if the 'backup' command was executed with -localauth.

As of the releases above, the 'butc' service (by default) will not only
accept authenticated connections but will require that the authenticated
identity be a super-user as reported by the butc host's "bos listusers"
command.

There is no incompatibility with vlserver, buserver and volserver
because those services already accepted authenticated connections and
required that authenticated identities be super-users in order to
create, read, modify, or delete sensitive information.

The privilege escalation is due to 'butc' accepting unauthenticated
requests and executing them using a super-user identity when contacting
the vlserver, buserver, and volserver.

I cannot stress enough how important it is for sites that are running
the AFS backup suite to immediately:

 . upgrade all instances of 'butc' and 'backup'.

 . firewall the 'butc' ports from all machines except those from
   which 'backup' is expected to be issued from.  The butc port is
   (7021 + butc port offset)/udp.  The default offset is 0.

Otherwise, an anonymous attacker can read, alter or destroy the content
of any volume in the cell, as well as any backups that can be accessed
without manual intervention by a system administrator.

AuriStor coordinated the release of these changes with the OpenAFS
Security officer(s) because this privilege escalation is not only
remotely exploitable but compromises the security and integrity of all
data stored within an AFS cell that operates a Backup Tape Controller
(butc) instance.

The AuriStorFS v0.175 release extends the AuriStorFS security model to
backup with the use of AES256-CTS-HMAC-SHA1-96 wire encryption for all
volume data communications and the use of volume security policies to
ensure that volumes cannot be restored to a fileserver with an
incompatible security policy.

Jeffrey Altman
AuriStor, Inc.


On 9/13/2018 3:12 AM, Giovanni Bracco wrote:
> Hello everybody!
> 
> I have read about the butc & backup security update.
> 
> We run daily the AFS backup and I would like to understand if I need
> just to update the backup server with the new butc/backup modules or I
> need also to update all our file servers in order to match the new
> security improvements connected to backup.
> 
> Giovanni
> 
> On 11/09/2018 21:04, Benjamin Kaduk wrote:
>>
>> OPENAFS-SA-2018-001 only affects deployments that run the 'butc' utility
>> as part of the in-tree backup system, but is of high severity for
>> those sites which are affected -- an anonymous attacker could replace
>> entire volumes with attacker-controlled contents.
>>
>> The changes to fix OPENAFS-SA-2018-001 require behavior change in both   
>>  
>> butc(8) and backup(8) to use authenticated connections; old and new
>> versions of these utilities will not interoperate absent specific
>> configuration of the new tool to use the old (insecure) behavior.
>> These changes also are expected to cause backup(8)'s interactive mode
>> to be limited to only butc connections requiring (or not requiring)
>> authentication within a given interactive session, based on the initial
>> arguments selected.
>>
>> Bug reports should be filed to openafs-b...@openafs.org.
>>
>> Benjamin Kaduk
>> for the OpenAFS Guardians
>>
> 




Re: [OpenAFS] Obtaining tokens at login on Ubuntu 18.04

2018-08-23 Thread Jeffrey Altman
On 8/18/2018 6:46 PM, Prasad K. Dharmasena wrote:
> pam_afs_session "nopag" should be used in conjunction with USM.
> 
> 
> If no PAG is set, the 'two advantages' described
> in http://docs.openafs.org/Reference/1/pagsh.html go away. 
> Specifically, this part "If the credential structure is identified by a
> UNIX UID rather than a PAG, then the local superuser root can assume a
> UNIX UID and use any tokens associated with that UID." is unacceptable
> for us. Traditionally, we've had departmental admins and lab managers
> who have root access to machines but no rights to users' AFS
> directories.  I believe, this is the point you made in the
> systemd/issues thread.
> 
> So, we must pick our poison?  A: live w/o '"systemctl --user" and all
> that stuff'  or B: pam_afs_session with 'nopag'

There are a couple of issues here.

First, the OpenAFS pagsh man page makes a claim that cannot be enforced.
The local "root" user is capable of:

 * installing kernel modules
 * debugging any process on the system and can therefore
   . execute code in any process context
   . read memory
   . write memory
   . start new processes in any process context

In the legacy PAG implementation that makes use of special group
identifiers, the "root" user can change the group assignment of any
process.

Even if PAGs were a strong security boundary, as long as "root" can load
kernel modules and debug arbitrary processes, the "root" user can simply
read the tokens accessible to a process via the GetTokens ioctl.

At best PAGs provide the capability for separate processes running as
the same UID to execute with a different set of AFS identities.

Creating separate PAGs for each session by default is incompatible with
"systemd --user".   PAGs can still be used to transition to a separate
AFS identity for administrative operations.

Please do not make assumptions that AFS PAGs can somehow protect end
users from trusted administrators who choose to violate that trust or
whose accounts have been compromised.

Jeffrey Altman





Re: [OpenAFS] Obtaining tokens at login on Ubuntu 18.04

2018-08-20 Thread Jeffrey Altman
On 8/17/2018 5:38 AM, Gaja Sophie Peters wrote:
> Am 17.08.2018 um 02:41 schrieb Prasad K. Dharmasena:
>> I've installed OpenAFS and pam-afs-session on Ubuntu 18.04 (bionic)
>> via (a)
>> vendor supplied packages, and (b) building from source (1.6.22.3).  On
>> both
>> machines, logging in via gdm doesn't get me a token. 
>> Has anyone else seen this on Ubuntu 18.04?  (I've had this working for a
>> while now on Ubuntu 16.04 -- building from 1.6.20+ source with
>> pam-afs-session 2.6.)
> 
> We had some success with an "aklog.service" as described in
> 
> https://www.mail-archive.com/openafs-info@openafs.org/msg40604.html
> 
> The main problem that we face at the moment is that there are TWO
> sessions opened, and (especially in "Ubuntu"-Session) depending on which
> program is started, it lives in the one or the other. Most notable
> "xterm" and "gnome-terminal" have to different sessions - which in the
> end means that an "aklog" needs to be performed in both... The above
> mentioned script tries to help with that, but it's not quite perfect yet.

Gaja,

The "aklog.service" approach introduces a significant amount of
complexity with zero security improvement over the pam_afs_session
"nopag" configuration.  The reason that aklog can be executed by
"aklog.service" is because the Kerberos credentials from which the AFS
tokens are derived are accessible to any process running as the UID.

Sincerely,

Jeffrey Altman



Re: [OpenAFS] Obtaining tokens at login on Ubuntu 18.04

2018-08-17 Thread Jeffrey Altman
On 8/17/2018 8:44 PM, Prasad K. Dharmasena wrote:
> Thanks for the pointer.  I did 'dpkg -r dbus-user-session' and
> rebooted.  Now 'pam-afs-session' does the right thing and obtains a token. 
> 
> However, @poettering points out in the systemd/issues/7261 thread,
> 
> Are there any downsides?
> 
> Yes, many. You turned off user service management entirely. Hence
> "systemctl --user" and all that stuff won't work anymore.


Prasad,

User service management (USM) is incompatible with the creation of a new
process authentication group (PAG) for each user login session. USM
relies upon the assumption that all processes running with the same UID
share the same security context including network authentication tokens.

pam_afs_session "nopag" should be used in conjunction with USM.
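For example (a sketch; the PAM stack, control flag, and module path vary by distribution), the relevant line in /etc/pam.d/common-session might read:

```
session  optional  pam_afs_session.so  nopag
```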

Jeffrey Altman



Re: [OpenAFS] Does vos release/volume replication always initiates data transfer from RW site?

2018-08-06 Thread Jeffrey Altman
On 8/6/2018 12:12 PM, Ximeng Guan wrote:
> Indeed if we have two RO replicas at Site B and RW at site A, subsequent 
> release will be twice as consuming, unless "vos release" can be made to be 
> even smarter, such that data is "relayed" among RO sites instead of being 
> broadcasted from the RW site.
> 
> I guess that touches some fundamental part of "vos release" which is not 
> trivial: How to optimize the data propagation path for different network 
> scenarios? How does that affect the integrity among different replicas?

In OpenAFS, all of the "smarts" are in the vos process.  This process
has no knowledge of the network topology, which is why OpenAFS has
difficulties in network environments that lack full end-to-end
connectivity between all peers.

In theory, someone could write a volserver topology map that could be
provided to "vos" as input.  It could be used in conjunction with the
volume site list from the location service to decide how to replicate
the volumes.

The OpenAFS location service doesn't provide clients (cache managers)
any locality information and the Unix cache manager does not rank volume
sites based upon performance characteristics.  How are you ensuring that
clients contact the local fileserver?

Jeffrey Altman



Re: [OpenAFS] Does vos release/volume replication always initiates data transfer from RW site?

2018-08-06 Thread Jeffrey Altman
On 8/5/2018 11:58 PM, Ximeng Guan wrote:
> Hello,
>
> We have one cell covering two sites. The WAN bandwidth between the two
> sites is relatively low, so we use volume replication to speed up the
> access. 
> 
> Those replicated volumes are often large in size. So replication to the
> remote site is an operation whose cost cannot be neglected.
> 
> Now with RW volumes at site A and their RO replication on servers at
> site B, we want to bring up a new file server at site B to balance the
> load. In other words we would like to “offload” a majority of the RO
> volumes from one server to a different server at Site B, without
> touching their RW masters at Site A.
> 
> [...]
> I wonder if there is a way to directly transfer those RO volumes btw
> servers at site B, without breaking the data integrity among the RO
> sites or affecting the atomicity of “vos release”.

AuriStorFS supports the desired functionality including the ability to
copy and move readonly sites between file servers or vice partitions
attached to the same file server.

  https://www.auristor.com/openafs/migrate-to-auristor/

OpenAFS does not contain explicit functionality but it is possible using

  vos dump
  vos restore -id -readonly
  vos addsite -valid

to achieve similar results.  From the source server use "vos dump" to
generate a dump stream of the readonly volume you wish to replicate.
Pipe the output to "vos restore" specifying the destination server,
partition, the readonly volume id and the -readonly flag to specify the
volume type.  Finally, use "vos addsite" with the -valid flag to update
the location service entry for the volume.  The -valid flag indicates
that the readonly volume data is known to be present and consistent with
other sites.  Note that the -valid switch will not mark a site as "new"
if a "vos release" failed to update one or more sites.

Be careful to use publicly visible addresses when executing these commands.
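As a sketch (the server names, partition, and volume ID below are all placeholders, and `shlex.join` requires Python 3.8+), the three steps could be composed into a single pipeline like this:

```python
import shlex

def replicate_ro_commands(volume, src, dst, part, ro_id):
    """Build the command pipeline for the procedure described above:
    dump the readonly volume from the source server, restore it on the
    destination with the same readonly volume id, then register the new
    site as valid with the location service."""
    dump = ["vos", "dump", "-id", volume + ".readonly",
            "-server", src, "-partition", part, "-time", "0"]
    restore = ["vos", "restore", "-server", dst, "-partition", part,
               "-name", volume, "-id", str(ro_id), "-readonly"]
    addsite = ["vos", "addsite", "-server", dst, "-partition", part,
               "-id", volume, "-valid"]
    # The dump stream is piped directly into the restore:
    #   vos dump ... | vos restore ...; vos addsite ...
    return (shlex.join(dump) + " | " + shlex.join(restore)
            + "; " + shlex.join(addsite))
```

This only constructs the command line; running it against a real cell requires appropriate tokens or -localauth, and the usual caveats above about publicly visible addresses apply.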

Jeffrey Altman



Re: [OpenAFS] check in c (linux) whether a directory entry is a mount point for an AFS volume

2018-08-04 Thread Jeffrey Altman
On 8/4/2018 12:40 AM, Ken Hornstein wrote:
>> is there an easy  way to check in C (under linux) whether a directory
>> entry is a mount point for an afs volume and maybe also obtain the name
>> of the volume mounted?
> 
> Assuming vanilla AFS ... the absolute easiest way to check to see if a
> directory entry is a mount point is stat() the directory.  If the inode
> number of the directory is odd, it's a "real" directory.  If the inode
> number is even, it's a mount point.
> 
> Determining the mount point NAME is more code from C; popen("fs lsm ")
> might be the easiest.  You won't have to do it that often once you figure
> out what is and isn't a mountpoint, though.
> 
> --Ken

I'm not sure that the application will have the ability to stat the
mount point object.  The OpenAFS cache manager will always provide the
details of the target volume root directory unless the target volume
cannot be located or accessed.

Jeffrey Altman






Re: [OpenAFS] fs newcell / clients CellServDB / adding new db server

2018-06-18 Thread Jeffrey Altman
On 6/18/2018 9:07 AM, Andreas Ladanyi wrote:
>>
>> The ubik clients do not rank servers based upon IP address.  What they
>> do is:
> ok. Then maybe i misunderstood the documentation
> (http://docs.openafs.org/QuickStartUnix/HDRWQ114.html) which tells me
> the machine with lowest ip is "usually"  elected as the ubik coordinator.

The algorithm used to elect the coordinator is specific to the ubik
servers that maintain a synchronized database.  The clients (vos, pts,
cache managers, backup, aklog, pam_afs_session, etc) do not speak ubik;
they speak the application specific protocols (VL, PR, BUDB, etc.).  The
clients do not have any visibility into which ubik instances are
electable, which instances have network connectivity to elicit
sufficient votes, nor what algorithm is used to rank (order) the ubik
instances for election purposes.

AuriStorFS ubik for example permits arbitrary ranking of servers based
upon configuration.  Just because a server has a smaller numeric IPv4
address doesn't mean that it is the best server to host the read/write
copy of the database.

> I followed the instruction on this paper to add a new db server machine
> with lowest ip.
>>
>> 1. compute the length of the ordered server list
>>
>>   A B C D
>>
>> 2. then generate a random number from 0..
>>
>> 3. use that number as an index into the list to decide which is first
>>
>> 4. and reorder the list as if it were a circular queue.  So if the
>> random number selected was 2, then the list would become
>>
>>   C D A B
>>
>> The only time the coordinator must be contacted is for a write
>> transaction.  All read transactions are processed by the first server
>> contacted.
> ok. thanks for explanation.
>>
>> My conclusion is that there is something about your cell configuration
>> that results in a write transaction for each token requested.  For example:
> I straced aklog for some tests and could see if aklog sometimes ask the
> new db server (which is offline) and then wait for a timeout (hangs
> about 15 sec) and if ask the old online db servers from CellServDB
> without timeout (hang).
> 
> This seems to cause the ssh login hanging symptom because pam debug
> shows me hanging about 15 sec when pam_afs calls aklog.
> 
> So on summary it seems to be better to first add the new db server to
> all db servers CellServDB / bos addhost and to bos restart the pt/vl
> instances for ubik corrdinator election on the servers and then to
> update the clients CellServDB.

That depends on whether or not the clients need to be able to find a
writable copy of the database or not.  If the clients must be able to
find the coordinator and the coordinator is a server that is not present
in the client's configuration, then the client won't simply experience a
random timeout but a failure.

> The documentation tells to first update clients CellServDB (when new db
> server with lowest ip) and then bring up new db server.
>>
>>  1. cell name:   example.com
> no, cellname a.b.c
>>
>>  2. One of the following is true:
>>
>> a. realm name:   AD.EXAMPLE.COM
> no AD
> 
> REALM = A.B.C, MIT Kerberos
>>
>> b. CellServDB's zeroth ubik server host domain:
>>
>>  subnet.example.com
> I dont understand this example.


If the cell name is

   foo.example.com

and the Kerberos realm is

   FOO.EXAMPLE.COM

and the host names of the ubik servers are

   afsdb1.bar.example.com
   afsdb2.bar.example.com
   afsdb3.bar.example.com

then the default host to realm mapping of afsdb1.bar.example.com will be
to realm BAR.EXAMPLE.COM not FOO.EXAMPLE.COM.  Since BAR.EXAMPLE.COM !=
FOO.EXAMPLE.COM a foreign cell registration will be attempted.  However,
that doesn't appear to be the source of the delay.  If it were, the
tracing would show aklog attempting to access every protection server
until the coordinator was discovered.
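The default host-to-realm guess described above can be sketched as follows; this is a deliberate simplification for illustration only (real Kerberos libraries also consult [domain_realm] mappings, referrals, and so on):

```python
def default_realm_guess(hostname: str) -> str:
    # Drop the leftmost (host) label and upper-case the remaining
    # domain -- the default mapping when no explicit configuration
    # applies.
    _, _, domain = hostname.partition(".")
    return domain.upper()

# afsdb1.bar.example.com maps to BAR.EXAMPLE.COM, which differs from
# the cell's realm FOO.EXAMPLE.COM, so a foreign-cell registration
# would be attempted.
```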

>>  3. auto-registration of foreign PTS IDs enabled:
>>
>> a. pam_afs_session configuration doesn't disable it
>>
>> b. aklog executed without -noprdb
> yes, pam_afs_session calls aklog without -noprdb





Re: [OpenAFS] fs newcell / clients CellServDB / adding new db server

2018-06-15 Thread Jeffrey Altman
On 6/15/2018 9:52 AM, Andreas Ladanyi wrote:
> ok. so the process of changing CellServDB on db servers and bos restart AND
> updating (copying) new CellServDB to clients has to be done in a very
> short time to minimize timeout symptoms for users, because db servers
> has to be in sync and ubik coordinator has to be elected and the afs
> clients with new CellServDB with the new db server (lowest ip) asks the
> new db server (ubik coordinator) first.

The ubik clients do not rank servers based upon IP address.  What they
do is:

1. compute the length of the ordered server list

  A B C D

2. then generate a random number from 0..

3. use that number as an index into the list to decide which is first

4. and reorder the list as if it were a circular queue.  So if the
random number selected was 2, then the list would become

  C D A B

The only time the coordinator must be contacted is for a write
transaction.  All read transactions are processed by the first server
contacted.
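The four steps above amount to a random rotation of the configured server list; a minimal sketch:

```python
import random

def order_servers(servers: list) -> list:
    # Step 1: length of the ordered list; step 2: random index;
    # steps 3-4: treat the list as a circular queue starting there.
    if not servers:
        return []
    start = random.randrange(len(servers))
    return servers[start:] + servers[:start]

# With ["A", "B", "C", "D"] and a random index of 2 the result is
# ["C", "D", "A", "B"]; read transactions go to the first server in
# the resulting order.
```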

My conclusion is that there is something about your cell configuration
that results in a write transaction for each token requested.  For example:

 1. cell name:  example.com

 2. One of the following is true:

a. realm name:  AD.EXAMPLE.COM

b. CellServDB's zeroth ubik server host domain:

subnet.example.com

 3. auto-registration of foreign PTS IDs enabled:

a. pam_afs_session configuration doesn't disable it

b. aklog executed without -noprdb

If the "realm of cell" guessing algorithm decides that the current login
is likely to be a foreign cell login, then an attempt to allocate a PTS
ID for the authentication name will be performed.  This request is a
write transaction and the ubik client will attempt to contact every ubik
server in order until the coordinator is determined.

Jeffrey Altman



Re: [OpenAFS] fs newcell / clients CellServDB / adding new db server

2018-06-13 Thread Jeffrey Altman
On 6/13/2018 11:35 AM, Dirk Heinrichs wrote:
> Am 13.06.2018 um 14:06 schrieb Andreas Ladanyi:
> 
>> i understand that a change in CellServDB on client does have no effect
>> until reboot.
> 
> Hmm, is this also true when using DNS SRV records instead of CellServDB?

For OpenAFS, any server lists provided by the client's CellServDB file
take precedence over DNS SRV records.

If DNS SRV records are in use, the client CellServDB file should not
list any servers.

One of the benefits of DNS SRV records is that the DNS SRV record TTL
value is used to determine the validity period for the server list by
the cache manager.

In this way, clients automatically update their server list information
and administrators can control how frequently the server lists are updated.
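For reference, the SRV lookup a cache manager performs for the VL service is defined by RFC 5864; constructing the query name is a one-liner (sketch):

```python
def afs_vlserver_srv_name(cell: str) -> str:
    # DNS SRV owner name for the AFS3 VLDB service (RFC 5864); the
    # TTL of the returned RRset bounds how long the cache manager
    # may keep using the server list.
    return "_afs3-vlserver._udp." + cell
```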

Jeffrey Altman


Re: [OpenAFS] Add new database server with lowest IP

2018-06-12 Thread Jeffrey Altman
On 6/11/2018 11:44 AM, Andreas Ladanyi wrote:
> Hi,
> 
> i read this page http://docs.openafs.org/QuickStartUnix/HDRWQ114.html
> 
> In my case the new database server has lowest ip and has no database
> content.
> 
> So how is the database synchronised from the old database (old ubik)
> servers which are currently running and with up to date database content ?
> 
> Do i have to backup / restore the DB0 files to the new ubik coordinator
> once? Or shouldn't i care because ubik will do all the magic for me?

When a coordinator (aka sync site) is elected it enters recovery mode.

The first step in recovery mode is "find the latest database".  In
OpenAFS, the coordinator queries all of the non-clone peers for their
database version.  It then decides whether it has the latest database or
another peer does.

At this point the coordinator is in recovery state "Found DB".

If another peer does, then it fetches the more recent database.

At this point the coordinator is in recovery state "Have DB".

The coordinator then ensures that all peers receive the current DB.

At this point the recovery state is "Sent DB".

The coordinator can now begin processing write requests.  After the
first write request the recovery state becomes "Modified DB".

Although there isn't any need to copy the database to the new ubik
server before it is started, it is critically important that the new
ubik server be added to the server CellServDB on all of the existing
ubik servers before the new ubik server is started.

Imagine that your existing ubik servers are B, C and D.  Since the
lowest ranked server receives an extra half vote it is extremely
important that there be agreement on which server is the lowest ranked
server.

In the current configuration all of the servers are in agreement that
the ranking is order from lowest to highest is:

  B < C < D

Therefore, B gets the extra half vote and there are a total of 3.5
votes.  To be elected coordinator requires a minimum of 2 votes.

But what happens if you add server A

  A < B < C < D

In this configuration A gets the extra half vote and there are a total
of 4.5 votes.  To be elected coordinator requires a minimum of 2.5 votes.
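The vote arithmetic above generalizes directly; a sketch (votes come in units of one half, and election requires strictly more than half the total):

```python
import math

def total_votes(n: int) -> float:
    # n servers with one vote each, plus the half-vote tiebreaker
    # held by the lowest-ranked server.
    return n + 0.5

def votes_to_win(n: int) -> float:
    # Smallest attainable count strictly greater than half the total.
    half = total_votes(n) / 2
    return (math.floor(half / 0.5) + 1) * 0.5

# 3 servers: 3.5 total votes, 2.0 to win.
# 4 servers: 4.5 total votes, 2.5 to win.
```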

When adding a new server without shutting down the cell there will be
some servers that are running with the old configuration and some with
the new one.  For example, if the new configuration is known by servers
A and D but not B and C then A will receive 2.5 votes and it will be
elected coordinator.  However B will receive 2 votes and believe that it
has been elected coordinator.  This results in two servers accepting
write transactions which will result in data loss.

The underlying problem is that the ubik protocol variant implemented by
OpenAFS does not have a method of verifying which servers share the same
configuration and only permit votes to be cast for and accepted from
servers that share the same configuration.

In order to avoid the risk of database forking I recommend the following
procedure:

1. Update the client CellServDB and DNS SRV/AFSDB records to add the
   new server

2. Update the server CellServDB on all of the fileservers to add the
   new server and restart them (only restart if not also ubik servers)

3. Update the server CellServDB on all of the ubik servers to add the
   new server but do not restart.

4. In order from highest rank to lowest rank (D, C, B):

 a. Stop server  (bos stop  -all)

 b. Wait three minutes (to ensure that all other servers notice this
server shutdown which is necessary to avoid bugs in most OpenAFS
versions that can lead to database corruption.)

 c. Start server (bos start  -all)

 d. Repeat for the next lower ranked server.

5. Start the new server

This order will ensure that there is never any confusion for clients or
ubik servers.


Jeffrey Altman



Re: [OpenAFS] About the upgrading from kaserver to Kerberos 5

2018-05-15 Thread Jeffrey Altman
On 5/15/2018 11:24 AM, huangql wrote:
> Hi Jeffrey,
> 
> 
> Thanks for your prompt and constructive reply.
> 
> 
> If the afs2k5db tool was compiled against OpenAFS 1.2 and MIT Kerberos
> 1.2, does it work for Openafs-1.4.14-1 version under 64bit ?

As I indicated, the kaserver database file format has not changed.
Therefore, it should not matter.

> Or is there other method to migrate the users to kdc 5?

Well, IHEP could just bring up a new Kerberos v5 realm and create all
necessary client and server principals from scratch.

At this point that might not be such a bad idea.   The kaserver (being
Kerberos v4 based) only supports DES-CBC-CRC 56-bit keys.  Those keys
can be brute forced in under 20 hours.  The krbtgt and afs keys are
particularly vulnerable.  Theft of them permits any identity to be
forged.  Copying these keys into the new Kerberos v5 realm is pointless
as they must be replaced immediately.

The client configurations will have to be updated in any case to deploy
Kerberos v5 libraries and configuration files.  My recommendation is to
start from scratch with Kerberos v5 and configure the AFS cell to accept
both kaserver and Kerberos v5 for authentication.  See the OpenAFS
krb.conf man page.

Again, good luck.

Jeffrey Altman


Re: [OpenAFS] aklog: unknown RPC error (-1765328377) while getting AFS tickets

2018-04-25 Thread Jeffrey Altman
On 4/25/2018 2:34 PM, Steven Schoch wrote:
>
> Kerberos error code returned by get_cred : -1765328352
> aklog: Couldn't get example.com  AFS tickets:
> aklog: unknown RPC error (-1765328352) while getting AFS tickets
> 
> Did I mess up file permissions somewhere? Running klist as xdemo shows I
> have tickets.

-1765328352 is krb5 error table offset 32 = Ticket expired
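These numbers are com_err codes; subtracting the krb5 error-table base recovers the Kerberos protocol error number. A sketch (-1765328384 is the well-known com_err base of the krb5 table):

```python
KRB5_ERROR_TABLE_BASE = -1765328384  # com_err base of the krb5 table

def krb5_error_offset(code: int) -> int:
    # Offset of a com_err code within the krb5 error table.
    return code - KRB5_ERROR_TABLE_BASE

# krb5_error_offset(-1765328352) == 32 -> KRB_AP_ERR_TKT_EXPIRED
#   ("Ticket expired", as decoded above)
# krb5_error_offset(-1765328377) == 7  -> KDC_ERR_S_PRINCIPAL_UNKNOWN
#   (the code in this thread's subject line)
```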




Re: [OpenAFS] Linux: systemctl --user vs. AFS

2018-03-08 Thread Jeffrey Altman
>  2. let AFS use the per-user keyring instead of the per-session one
> (suggested in the systemd bug discussion)
> 
> Does the second one sound reasonable?

Switching to the user keyring is unreasonable.  The impact of such a
change is that all user sessions on a system share the same tokens and
an effective uid change permits access to those same tokens.

Process Authentication Groups (PAGs) exist explicitly to establish a
security barrier to prevent such credential leakage.

Just my two cents ...

Jeffrey Altman





Re: [OpenAFS] Invalid AFSFetchStatus - inaccesible data

2018-03-01 Thread Jeffrey Altman
On 3/1/2018 9:41 AM, Michal Švamberg wrote:
> Hi,
> in volume are inaccesible files and client wrote message:
> [   56.306458] afs: FetchStatus ec 0 iv 1 ft 0 pv 947 pu 17570
> [   56.306461] afs: Invalid AFSFetchStatus from server 147.228.54.17
> [   56.306463] afs: This suggests the server may be sending bad data
> that can lead to availability issues or data corruption. The issue has
> been avoided for now, but it may not always be detectable. Please
> upgrade the server if possible.
> [   56.306469] afs: Waiting for busy volume 875764977 (user.wimmer) in
> cell zcu.cz

As recorded on disk, the file type for 875764977.947.17570 is 0
(invalid).  As such, it is neither a directory, a file, a symlink nor a
mount point and cannot be processed by the client.

> I try bos salvage, vos move, vos dump & restore, but nothing help to me.

The salvager doesn't know how to fix vnodes whose on-disk metadata
contains an invalid vnode type.

Moving, dumping and restoring the volume will simply move, dump and
restore the vnode with the invalid type value.

These are the available options:

1. restore the volume from a backup prior to the introduction of the
   on-disk damage.  If vnode 875764977.947.17570 is damaged then it
   is possible that other vnodes are as well.

2. edit the vnode metadata stored in the vice partition

3. delete the damaged vnode by removing its directory entry and
   restoring the file data from backup or other sources

4. delete the vnode in the vice partition and salvage to cleanup
   the directory

The warning message from the client is misleading in that the fileserver
is not generating bogus information but the data on-disk is already bogus.

Jeffrey Altman
AuriStor, Inc.


Re: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-05 Thread Jeffrey Altman
On 2/5/2018 12:31 PM, Stephan Wiesand wrote:
> the usual way to use DKMS is to either have it build a module for a newly
> installed kernel or install a prebuilt module for that kernel. It may be
> possible to abuse it for providing a module built for another kernel, but
> I think that won't happen accidentally.
> 
> You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> be safe to use within a RHEL minor release (and the SL packaging has been
> using them like this since EL6.4), but aren't across minor releases (and
> that's why the SL packaging modifies the kmod handling to require a build
> for the minor release in question.

On RHEL DKMS and KABI are tightly related because of the way in which
Red Hat engineers back port feature and functionality changes.  During
mainline kernel development a change is likely to break an existing
interface.  Doing so is encouraged so that compilation errors will
identify where code modifications are required.

On RHEL there is a strong desire to maintain KABI compatibility.
Whenever possible, backports are altered to preserve the existing binary
interfaces at the risk of changing the interface semantics.  As a
result, compilation failures do not occur but semantic differences can
result in breakage for third party kernel modules that have not been
modified at the source level to be aware of the change.

The breakage of OpenAFS by RHEL 7.4 and 7.5 (minor releases) were both
due to back porting functionality in this manner.  Such
incompatibilities can result in system panics or silent data corruption
depending upon the change.

Jeffrey Altman
AuriStor, Inc.


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jeffrey Altman
On 2/4/2018 7:54 AM, Dirk Heinrichs wrote:
> Am 04.02.2018 um 13:29 schrieb Jose M Calhariz:
> 
>> The core of my infra-structure are 4 afsdb
> 
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

The maximum number of ubik servers in an AFS3 cell is 20.  This is a
protocol constraint.  However, due to performance characteristics it is
unlikely that anyone could run that number of servers in a production
cell.  As the server count increases the number of messages that must be
exchanged to conduct an election, complete database synchronization
recovery, maintain quorum, and complete remote transactions.  These
messages compete with the application level requests arriving from
clients.  As the application level calls (vl, pt, ...) increase the risk
of delayed processing of disk and vote calls increases which can lead to
loss of quorum or remote transaction failures.

The reason that odd numbers of servers are preferred is because of the
failover properties.

one server - single point of failure.  outage leads to read and write
failures.

two servers - single point of failure for writes.  only the lowest ipv4
address server can be elected coordinator.  if it fails, writes are
blocked.  If it fails during a write transaction, read transactions on
the second server are blocked until the first server recovers.

three or four servers - either the first or second lowest ipv4 address
servers can be elected coordinator.  any one server can fail without
loss of write or read.

five or six servers - any of the first three lowest ipv4 address servers
can be elected coordinator.  any two servers can fail without loss of
write or read.

Although adding a fourth server increases the number of servers that can
satisfy read requests, the lack of improved resiliency to failure and
the increased risk of quorum loss makes it less desirable.
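The failover properties enumerated above reduce to a simple rule: with n servers, any floor((n-1)/2) of them can fail while quorum remains attainable, which is why even server counts add no resiliency. A sketch:

```python
def failures_tolerated(n: int) -> int:
    # Arbitrary server failures survivable while more than half of
    # the n + 0.5 total votes can still be gathered.
    return (n - 1) // 2

# 1 or 2 servers -> 0; 3 or 4 -> 1; 5 or 6 -> 2, matching the
# enumeration above.
```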


The original poster indicated that his ubik servers are virtual
machines.  The OpenAFS Rx stack throughput is limited by the clock speed
of a single processor core.  The 1.6 ubik stack is further limited by
the need to share a single processor core with all of the vote, disk and
application call processing.  As a result, anything that increases the
overhead increases the risk of quorum failures.

This includes virtualization as well as the overhead imposed as a result
of Meltdown and Spectre fixes.  Meltdown and Spectre can provide a
double whammy as a result of increased overhead both within the virtual
machine and within the host's virtualization layer.

AuriStor's UBIK variant does not suffer the scaling problems of AFS3
UBIK.  AuriStor's UBIK has been successfully tested with 80 ubik servers
in a cell.  This is possible because of a more efficient protocol that is
incompatible with AFS3 UBIK and the efficiencies in AuriStor's Rx
implementation.

Jeffrey Altman
AuriStor, Inc.


Re: [OpenAFS] connection timed out, how long is the timeout?

2018-02-04 Thread Jeffrey Altman
tion. If not,
the notification is added to the delayed callback queue for the
CM.

16. FS releases any volume and vnode locks.

17. FS updates call statistics.

18. FS completes the call to the CM with success or failure.


For any rx connection there are three timeout values that can be set on
both sides of the connection.

 1. Connection timeout.  How long to wait if no packets have been
received from the peer.

 2. Idle timeout.  How long to wait if ping packets are received
but no data packets have been received.  This is usually set
only on the server side of a connection.

 3. Hard timeout. How long is the call permitted to live before it is
killed for taking too long even if data if flowing slowly.


The defaults are a connection timeout of 12 seconds, an idle timeout of
60 seconds on the server side, and no hard dead timeout.

A CM typically sets a 50 second connection timeout and no idle or hard
timeout on calls to the FS.

The FS sets a 50 second connection timeout and 120 second hard timeout
on calls to the CM callback service; except for the ProbeUuid calls
which are assigned a connection time of 12 seconds.

The FS connections to the PT service use the defaults.

I selected the GiveUpCallBacks call statistics because that call doesn't
require any volume or vnode locks, nor can it involve any notifications
to other CMs.  Long timeouts for GUCBs mean one or more of the following:

 a. this is the first call on a new connection and the CM's one and
only callback service thread is not responding to the FS promptly

 b. this is the first call on a new connection and the connection
endpoint and the CM's UUID do not match and there is a conflict to
resolve

 c. this is the first call on a new connection and the FS's two CPS
queries to the protection service take a long time or timeout if
the selected ptserver stops responding to ping ack packets.

 d. the FS's host table / callback table lock is in use by other threads
and this thread cannot make progress

The FetchStatus call is similar except that it can also block waiting
for Volume and Vnode locks which might not be released until callback
notifications are issued.

So what are the potential bottlenecks that can result in extended delays
totaling tens or hundreds of seconds?

1. The single callback service thread in the cache manager which is
   known to experience soft-deadlocks.

2. The responsiveness of the ptservers to the file servers.

3. Blocking on callback invalidations due to the callback table being
   too small.

4. Network connectivity between the FS and both PT servers and CMs.

It's time for the Super Bowl so I will send off this message as is.
Perhaps it will be useful.

Jeffrey Altman
AuriStor, Inc.


Re: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up

2018-02-03 Thread Jeffrey Altman
On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> I'm relatively new to handling OpenAFS.  Are these problems part of a
> normal "kernel release; openafs update" cycle and perhaps I'm getting
> snagged just by being too early of an adopter?  I wanted to raise the
> alarm on this and see if anything else was needed from me as the
> reporter of the issue, but perhaps that's an overreaction to what is
> just part of a normal process I just haven't been tuned into in prior
> RHEL release cycles?


Kodiak,

On RHEL, DKMS is safe to use for kernel modules that restrict themselves
to using the restricted set of kernel interfaces (the RHEL KABI) that
Red Hat has designated will be supported across the lifespan of the RHEL
major version number.  OpenAFS is not such a kernel module.  As a result
it is vulnerable to breakage each and every time a new kernel is shipped.

There are two types of failures that can occur:

 1. a change results in failure to build the OpenAFS kernel module
for the new kernel

 2. a change results in the OpenAFS kernel module building and
successfully loading but failing to operate correctly

It is the second of these possibilities that has taken place with the
release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.

Are you an early adopter of RHEL 7.5 beta?  Absolutely, its a beta
release and as such you should expect that there will be bugs and that
third party kernel modules that do not adhere to the KABI functionality
might have compatibility issues.

There was a compatibility issue with RHEL 7.4 kernel
(3.10.0_693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
release series this past week as part of 1.6.22.2:

  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2

Jeffrey Altman
AuriStor, Inc.

P.S. - Welcome to the community.



Re: [OpenAFS] convert 'vos dump' output to tar or zip?

2018-01-30 Thread Jeffrey Altman
Hi Todd,

It's been a while ...

On 1/30/2018 4:20 PM, Todd Lewis wrote:
> Has anybody a tool to convert "vos dump" output to a tar or zip format?

To the best of my knowledge there is no such tool.  Part of the
reason a tool doesn't exist is that there would not be a lossless
conversion from dump to either tar or zip.

Assuming that the dump contains the full contents of a volume (not an
incremental) it would be possible to transfer the directory tree, file
data, and symlinks.  However, mount points and AFS specific metadata
including ACLs, policy bits, etc. would be lost.

> Is the "vos dump" format usable by anything other than "vos restore"?

Yes.  dumpscan and restorevol can process dump files.

Teradactyl's TiBS backup system can I believe import AFS dumps and
restore to non-AFS file systems.

There have also been several uncompleted projects that permit dumpfiles
to be mounted as if they were ISOs on Linux and Windows.

Perhaps it would be useful if you could discuss the end goal of the
conversion.  For example, you might describe the problem as:

   UNC is shutting down its cell and we have 20+ years of volume
   dumps as backups.  We would like to be able to access the
   contents of these backups without deploying a new cell.

Sincerely,

Jeffrey Altman





Re: [OpenAFS] Is member of a machine group honored as system:authuser?

2018-01-25 Thread Jeffrey Altman
On 1/24/2018 2:31 PM, Ximeng Guan wrote:
> Hello,
> 
> I am trying to make some effective use of machine groups in AFS to 
> accommodate certain requirement of licensed software. I read about the 
> feature, and noticed that in the 1998 edition of the book "Managing AFS, The 
> Andrew File System" by Richard Campbell, the following text appeared in 
> Chapter 7 p.230:
> 
> "...
> There is one final quirk to the implementation: it's common for several 
> top-level directories of the AFS namespace to be permitted only to 
> system:authuser, that is, any user can access the rest of the namespace, but 
> only if the user has been authenticated as a user, any user, of the current 
> cell. Machine groups are intended to be useful for any person logged in to a 
> workstation so that software licenses can be honestly followed. Therefore, 
> when an unauthenticated user is using a machine that is a member of a group 
> entry on an ACL, the user's implicit credential is elevated to 
> system:authuser, but only if the machine entry in the group is an exact 
> match, not a wildcard.
> 
> This rule permits any user of a given desktop to effectively have 
> system:authuser credentials for a directory. As long as that directory has an 
> ACL that includes the specific machine's IP address as a member of a group 
> entry, any user of the desktop, and only that desktop, would have access to 
> the directory. 
> ...
> "

This text was true prior to IBM AFS 3.2 but has not been true for any
release since IBM AFS 3.3.  As of AFS 3.3 the Current Protection Set for
a host includes neither system:anyuser nor system:authuser.  In other
words, hosts are not users.

> That is exactly what we have in the top-level directories in our cell: We 
> have "system:authuser rl" on the ACL of root.cell. 
> 
> Access list for . is
> Normal rights:
>   system:administrators rlidwka
>   system:authuser rl
> 
> Then when I create a machine-based pts entry 10.12.8.31, add it to a new 
> group named machinegrp, and wait for >2 hours to let it be effective 
> (according to dafileserver's man page)
> 
> $ pts member machinegrp
> Members of machinegrp (id: -250) are:
>   10.12.8.31
> 
> I would expect that a local user on 10.12.8.31, even without an AFS token, 
> would be able to "cd" into the top directory of the cell. But in reality that 
> does not happen. An unauthenticated user is denied of access. 

This is working as designed because the ACL does not include the host
identity.
> 
> When I explicitly put "machinegrp rl" on the ACL of the cell's top directory 
> (root.cell), an unauthenticated user is indeed able to access the AFS space. 
> 
> This is not quite convenient, because to allow the user of that specific 
> machine to launch a license software installed in a certain (deep) directory 
> under AFS, for example /afs/cellname/tools/vendors/abc/softwarexx/bin, we 
> would have to explicitly place "machinegrp l" on the ACL of the parent 
> directories of ./bin from /softwarexx all the way up to /cellname. 
> 
> Then if we have another software and another machine group, we will have to 
> do the same again, and the ACL of our root.cell directory will soon be 
> populated with machine group entries. That does not seem to be an elegant 
> solution. 
> 
> Did I miss anything here? 

Perhaps.

The problem you are attempting to solve is that there exist directories
and files that must be accessible only from a particular subset of
trusted machines.  This data is only supposed to be visible to users of
those machines and no one else.

What you are possibly missing is that IP ACLs are not a form of
authentication and they cannot be used to provide any integrity
protection or wire privacy.  Any data that is accessed by the host
without a user's AFS token is going to be transmitted in the clear.
In addition, IP addresses can be spoofed.


This use case is one that AuriStorFS was explicitly designed to address.
 AuriStorFS client hosts can be keyed using Kerberos v5 principals.
AuriStor's RX security class supports combined identity authentication
providing the file server both the identity of the user and the identity
of the host.  Finally, the AuriStorFS Access Control language permits
different access permissions to be granted to each of the following
combinations:

  authenticated user on unauthenticated host
  authenticated user on authenticated host
  anonymous user on authenticated host
  anonymous user on anonymous host

The anonymous user on authenticated host communications with the file
server are authenticated using the host principal and all data is both
integrity protected and encrypted for wire privacy.

Jeffrey Altman
AuriStor, Inc.




Re: [OpenAFS] KeyFile issues upgrading servers from 1.4 to 1.6

2017-12-22 Thread Jeffrey Altman
y advise deploying 1.6.22 or later.

Once all of the servers have been upgraded to at least 1.6.22 it is
critical that the DES cell key be replaced with an
AES256-CTS-HMAC-SHA1-96 Kerberos service key.  Failure to do so leaves
the cell vulnerable to brute force attacks.

AuriStor provides professional OpenAFS support services to assist
organizations such as PSC when upgrading cells.

  https://www.auristor.com/openafs/

Jeffrey Altman



Re: [OpenAFS] changing just the name of a database server?

2017-12-21 Thread Jeffrey Altman
On 12/21/2017 2:36 AM, Benjamin Kaduk wrote:
> On Wed, Dec 20, 2017 at 12:08:33PM -0500, Steve Gaarder wrote:
>>
>> On Wed, 20 Dec 2017, Benjamin Kaduk wrote:
>>
>>> Hi Steve,
>>>
>>> On Tue, Dec 19, 2017 at 09:19:36AM -0500, Steve Gaarder wrote:
>>>> I want to change the name of one of my database servers, while keeping the
>>>> IP address the same.  Besides making the change in the DNS and the
>>>> machine's hostname, is there anything else I need to do?
>>>
>>> You should also notify cellser...@grand.central.org so that the
>>> central CellServDB records can be updated.  IIRC, at least windows
>>> clients use the name after the '#' for address lookups -- it is not
>>> just a comment field.
>>>
>>
>> Thanks for that info.  Does it matter whether the name is a CNAME or not? 
>> I'm thinking that, to ease the transition, I could make the new name a 
>> CNAME for the existing name, tell grand.central to change the cellservdb, 
>> and later rename the machine.
> 
> This is pretty far outside my area of expertise, but some code
> diving suggests that the OpenAFS client will follow a CNAME if the A
> record is included in the additional section of the DNS response
> that has the CNAME in the answer section, but will not initiate an
> additional DNS request to follow the CNAME.  It's possible that the
> Auristor client has implemented that TODO item, but I have no way to
> check.
> 
> -Ben

When building a DNS SRV or DNS AFSDB record you MUST use names that can
be resolved by A and AAAA records.   However, you can use CNAME records
for the name listed in the CellServDB file.

This is true not only for OpenAFS but for AuriStorFS as well.
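A zone-file sketch illustrating the rule (all names and addresses are placeholders):

```
; SRV target names must resolve via A/AAAA records (RFC 5864):
_afs3-vlserver._udp.example.com. 3600 IN SRV   0 0 7003 db1.example.com.
db1.example.com.                 3600 IN A     192.0.2.10
db1.example.com.                 3600 IN AAAA  2001:db8::10
; A CNAME is acceptable for the name listed in CellServDB, but must not
; be used as the SRV target itself:
afsdb1.example.com.              3600 IN CNAME db1.example.com.
```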

Jeffrey Altman




[OpenAFS] URGENT: macOS High Sierra to be released next week - must upgrade to AuriStorFS v0.160 first

2017-09-21 Thread Jeffrey Altman
To all OpenAFS and AuriStorFS macOS users,

Next week Apple is expected to release macOS High Sierra to the public
as a free upgrade.

Apple has published the following document describing the upcoming
changes that institutional end users must prepare for.

  Prepare your institution for iOS 11, macOS High Sierra, or macOS
  Server 5.4

https://support.apple.com/en-us/HT207828

In that document Apple references two sets of changes that impact
macOS systems upgraded to High Sierra if an OpenAFS or AuriStorFS client
is installed.

 1. Apple File System (APFS)

As part of the High Sierra upgrade the local file system will
be converted from HFS+ to APFS.  Unfortunately, the OpenAFS
and AuriStorFS disk cache is file system dependent.  All OpenAFS
clients for macOS and all AuriStorFS clients for macOS prior
to AuriStorFS v0.160 are unaware of APFS and will panic the machine
if the disk cache is located on an APFS volume.

Today, AuriStorFS v0.160 has been published to the AuriStorFS
download web page:

  https://www.auristor.com/filesystem/client-installer/

Prior to performing the macOS High Sierra upgrade, any installed
OpenAFS or AuriStorFS must be uninstalled or upgraded to
AuriStorFS v0.160.

As with any macOS upgrade, after the upgrade the High Sierra
specific installer for AuriStorFS should be installed.  The
macOS High Sierra v0.160 dmg was published to the AuriStorFS
download site today.

 2. User Approved Kernel Extension Loading

https://support.apple.com/en-us/HT208019

Apple has decided that newly installed kernel extensions will
require "User Approval" before they can load.  "User Approval"
is more than just running an installer with Administrator
privileges.  After installing AuriStorFS on macOS High Sierra,
an end user will have to manually approve the kernel extension
before /afs can be accessed.

I realize the time is short.  Apple is expected to release macOS High
Sierra on Tuesday 26 September 2017.  AuriStor, Inc. became aware of the
panic after upgrade issue a couple of days ago.

Jeffrey Altman
AuriStor, Inc.








Re: [OpenAFS] OpenAFS on OpenBSD

2017-08-28 Thread Jeffrey Altman
On 8/28/2017 3:55 AM, Harald Barth wrote:
> 
>> a half-measure probably isn't worth the effort
> 
> If you care about a more flexible license, for the client, it might be
> worth reviving Arla. The concept with a small kernel module and a
> userland daemon is different from OpenAFS as well. Another idea would
> be to port the relevant parts to the FUSE interface instead (both Arla
> and OpenAFS are older than FUSE).
> 
> Harald.

Hi Harald,

The Arla architecture is similar to the OpenAFS Windows Redirector
architecture.  The kernel module implements all of the required VFS
functionality and a userland process is responsible for implementing all
of the cache management, the Rx network stack, the RXAFSCB service, and
all of the VL and RXAFS RPCs.

This approach reduces complexity by removing the Rx networking and cache
management file operations from the kernel.  I'm not sure the reduced
complexity saves all that much considering that the OpenAFS Rx stack was
implemented in kernel for prior versions of OpenBSD.

There is also a performance cost since upcalls from the kernel module to
the userland process must be performed to satisfy the requests delivered
through the VFS.

As for FUSE, the OpenAFS cache manager has been implemented as a FUSE
module.  Unfortunately, there remains a significant unresolved design
question: "How to implement path ioctls for FUSE?"  Without pioctls the
OpenAFS command line tools can not be used for token management or
otherwise manage the cache manager.   Therefore, the existing FUSE
implementation only supports anonymous operations.

Jeffrey Altman



Re: [OpenAFS] Problem deleting volumes

2017-07-24 Thread Jeffrey Altman
On 7/24/2017 11:22 AM, Susan Litzinger wrote:
> I have a number of temp volumes that I'm trying to delete but having
> problems within our AFS filesystem.   Does anyone have a suggestion on
> how to diagnose the 'Possible communication failure' error?  I'm not
> finding anything on google.  Here is an example of a volume that I try
> to remove but it won't go totally away. 
> 
> 
> bash-3.2# vos remove -server velma.psc.edu <http://velma.psc.edu>
> -partition /vicepcd -id tmp.users.9.zzhao3 -localauth
> WARNING: Volume 537606306 does not exist in VLDB on server and partition
> Volume 537606306 on partition /vicepcd server velma.psc.edu
> <http://velma.psc.edu> deleted
> 
> bash-3.2# vos volinfo -id tmp.users.9.zzhao3 -localauth
> Could not fetch the information about volume 537606306 from the server
> Possible communication failure
> Error in vos examine command.
> Possible communication failure
> 
> Dump only information from VLDB
> 
> tmp.users.9.zzhao3
> RWrite: 537606306
> number of sites -> 1
>server velma.pvt.psc.edu <http://velma.pvt.psc.edu> partition
> /vicepcd RW Site
> 
> 
> bash-3.2# vos syncvldb -server velma.psc.edu <http://velma.psc.edu>
> -partition vicepcd -volume tmp.users.9.zzhao3 -dryrun -localauth -verbose
> Processing VLDB entry tmp.users.9.zzhao3 .

Susan,

According to DNS velma.psc.edu != velma.pvt.psc.edu:

Non-authoritative answer:
Name:velma.pvt.psc.edu
Address:  10.32.5.186

Non-authoritative answer:
Name:velma.psc.edu
Addresses:  2001:5e8:2:42::b8
  128.182.66.184


The psc.edu cell's VLDB believes that these addresses are separate
fileservers:

UUID: None
[10.32.5.185]:7005

UUID: None
[128.182.73.70]:7005

UUID: None
[128.182.73.72]:7005

UUID: None
[128.182.73.73]:7005

UUID: None
[128.182.40.71]:7005

UUID: None
[128.182.73.74]:7005

UUID: None
[128.182.73.75]:7005

UUID: None
[128.182.73.77]:7005

UUID: None
[10.32.5.186]:7005

UUID: None
[127.0.0.1]:7005

UUID: 002167fe-84dd-1ace-8751-b63bb680aa77
[128.182.59.182]:7005

UUID: 008e9914-f6d3-1a6d-b108-b942b680aa77
[128.182.66.185]:7005

UUID: 0029cfd4-6cc8-1a32-9a9e-b842b680aa77
[128.182.66.184]:7005

UUID: 002fb7a0-0e2a-13b8-a68d-b53bb680aa77
[128.182.59.181]:7005

UUID: 00376e3c-e32a-1acc-a42b-017faa77
[128.182.59.77]:7005

Since your cell is newer than IBM AFS 3.4, there should no longer be
file server entries in the VLDB that are not assigned a UUID.  My guess
is that they are left over from an attempt to manually modify a
fileserver's IP address.

velma.pvt.psc.edu [10.32.5.186] has 4131 volume entries in the VLDB.

velma.psc.edu [128.182.66.184] has 22947 volume entries in the VLDB.

If these are intended to be the same server, you might want to consider
rebuilding your VLDB from scratch.

Jeffrey Altman



Re: [OpenAFS] OpenAFS windows clients (Orpheus' Lyre)

2017-07-14 Thread Jeffrey Altman
On 7/14/2017 5:45 AM, Toby Blake wrote:
> Hi,
> 
> The Orpheus' Lyre vulnerability has thrown up a few questions with respect
> to AFS clients on windows.  Apologies if these are a little vague, but
> this seems like the right place to ask them.
> 
> We have been using the windows OpenAFS clients, as kindly provided by
> Auristor/YFS.  My understanding is that this comes bundled with Heimdal
> Kerberos.  Is this client vulnerable and requiring an update?

The Heimdal Kerberos bundled with the OpenAFS 1.7.3301 client as with
all versions of Heimdal Kerberos prior to version 7.4 include the
Orpheus' Lyre (CVE-2017-11103) bug.  The OpenAFS client does not require
an update but Heimdal does.

Heimdal 7.4 installers for Windows are available from

  https://www.secure-endpoints.com/heimdal/#download

Heimdal Kerberos releases are produced by staff from AuriStor, Inc. and
Two Sigma Investments.  Secure Endpoints, Inc. continues to package and
distribute the Windows release.

> Prior to using this client, we used the one provided on openafs.org,
> along with (a separate) Heimdal Kerberos from secure-endpoints.  On
> earlier versions of windows, I think we used MIT Kerberos.
> 
> Which I suppose brings me to my wider question: what AFS clients are
> others using on Windows?

I am unaware of any AFS client for Microsoft Windows 10 that is
available from anywhere other than AuriStor, Inc.

Jeffrey Altman
AuriStor, Inc.



Re: [OpenAFS] New installation, linux server, AD kerberos

2017-06-28 Thread Jeffrey Altman
On 6/28/2017 11:33 PM, John D'Ausilio wrote:
> Ben and Jeffrey, I appreciate the help .. and I have a document in progress 
> describing start-to-finish 1.8 installation on Ubuntu.
> 
> Everything is working fine, server and linux clients are running. I'd like to 
> try a windows client but I don't see anything for 1.8. 
> Is 1.7 the dev branch for 1.8, and will that client work?

John,

All IBM AFS 3.6, OpenAFS, Arla, kAFS and AuriStorFS clients and servers
interoperate at the AFS3 protocol level.

The OpenAFS 1.7.3301 client built, signed and distributed by AuriStor,
Inc. will work with IBM AFS 3.6, OpenAFS and AuriStorFS servers.

  https://www.auristor.com/openafs/client-installer/

Jeffrey Altman


Re: [OpenAFS] New installation, linux server, AD kerberos

2017-06-23 Thread Jeffrey Altman
On 6/23/2017 11:00 PM, John D'Ausilio wrote:
> Setting tokens. john.dausilio @ corp.1010data.com

The rxkad security class was originally designed for Kerberos v4 names,
so Kerberos v5 principal names must be converted to Kerberos v4 names
before they can be used.  Since Kerberos v4 uses '.' as the component
separator, a Kerberos v5 principal name with a '.' in the first
component cannot be safely converted to Kerberos v4.  To override that
restriction you must add

  -allow-dotted-principals

to all server command lines.

Jeffrey Altman





Re: [OpenAFS] New installation, linux server, AD kerberos

2017-06-23 Thread Jeffrey Altman
On 6/23/2017 12:33 PM, John D'Ausilio wrote:
> So .. I downloaded and installed the 1.8 debs, and everything seems to be 
> good. The packages end up starting bosserver ..
> I keep getting stuck at doing anything with bos .. most commands result in 
> the error "bos: could not find entry (configuring connection security)"
> Tried setcellname .. maybe this is already done at client install? Weird that 
> the client is a dependency of the fileserver ..
> 
> root@njdev216083:/home/sysdev# bos setcellname njdev216083 corp.1010data.com 
> -localauth
> bos: could not find entry (configuring connection security)

My guess is that you need to add the cell wide key via asetkey before
you can start the service.  Key management is an area that has changed
from OpenAFS 1.6 and OpenAFS 1.8 went in a different direction than
AuriStorFS so I'm not entirely sure.

Jeffrey Altman





Re: [OpenAFS] vos move: Error reading dump file

2017-06-01 Thread Jeffrey Altman
On 6/1/2017 7:43 AM, Andreas Breitfeld wrote:
> Hello,
> 
> 'vos move' of a big volume (8.7TB) id 537070263 failed with reporting
> the following messages in VolserLog on the target server:
> 
> ###
> Tue May 30 11:41:49 2017 1 Volser: WriteFile: Error reading dump file
> 171404 size=240744204 nbytes=215734028 (0 of 8192): File exists; restore
> aborted

The target server is reporting that it is attempting to read 8K from the
incoming rx_call but received 0 bytes.  The "File exists" is a red
herring because 'errno' is not set by rx.

You need to look at the source server to determine why the call ended.

Have you moved this volume in the past?  I ask because an 8.7TB volume
exceeds the maximum safe volume size in OpenAFS (2TB - 1) by quite a
bit.  Values such as disk usage, file count, and others are signed
32-bit integers that can overflow.  While the volume remains in place
OpenAFS servers will happily serve the data but it is possible that the
volume can no longer be represented by the dump format and that the
salvager can no longer reference all of the vnodes that are present in
the volume.

Jeffrey Altman



Re: [OpenAFS] ls: : Operation timed out

2017-04-24 Thread Jeffrey Altman
On 4/24/2017 4:01 AM, Yannick Ulrich wrote:
>
> To further investigate this, I ran fs checkservers on my computer
> 
> $ fs checkservers
> These servers unavailable due to network or server problems:  
> afsfs11.psi.ch.
> 
> From other machines this works fine so it appears to be somewhat
> localised to my computer.
> 
> I've rebooted my computer and reinstalled the client to no avail.
> 
> Installing the client on a virtual machine on my computer works out of
> the box.

Your investigation has been focused on a client issue but there are two
other possibilities:

1. a network or firewall related issue

2. an OpenAFS fileserver issue

If your machine reboots and obtains the same IP address each time, it is
possible that the problem is specific to that IP address and port 7001.
Given that running the client in a VM on your machine works, that lends
weight to the theory it is specific to the one IP address and port number.

You will need the assistance of the AFS administrator to further isolate
the problem by capturing network packets on afsfs11.psi.ch.  If no such
packets arrive from the client, then it's a problem within the network.
If the packets do arrive, it's a problem with the fileserver.
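A capture sketch for the server side (interface name and client address are placeholders):

```shell
# On afsfs11: look for UDP traffic from the affected client.  The AFS
# fileserver listens on UDP port 7000; the cache manager's callback
# port is UDP 7001.
tcpdump -n -i eth0 'udp and host 192.0.2.55 and (port 7000 or port 7001)'
```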

Jeffrey Altman



Re: [OpenAFS] volume throughput

2017-04-19 Thread Jeffrey Altman
On 4/19/2017 4:22 AM, Andreas Breitfeld wrote:
> Hello,
> 
> 'vos exa -id ' prints information about volume
> accesses in the past day. Is there any AFS command printing the amount
> of data which is read/written to a volume in a given time?

There is not.   OpenAFS does not collect such information.  It should be
noted that the reported "accesses in the past day" refers to the number
of accesses since:

 1. the most recent midnight, or
 2. the most recent volume move

whichever occurred most recently.

With AuriStorFS (but not OpenAFS) the quantity of data transferred can
be computed from fileserver audit logs.

Jeffrey Altman




Re: [OpenAFS] Re: build error when linking with heimdal-dev (Re: [OpenAFS] bosserver -noauth& changes cell to localcell)

2017-04-13 Thread Jeffrey Altman
On 4/13/2017 4:08 PM, Michael Meffie wrote:
> On Thu, 13 Apr 2017 15:31:39 -0400
> Michael Meffie <mmef...@sinenomine.net> wrote:
> 
>> On Thu, 13 Apr 2017 16:58:57 +
>> Ted Creedon <tcree...@easystreet.net> wrote:
>>
>>> Looks like the compile failure is described in:
>>> https://lists.openafs.org/pipermail/openafs-info/2016-August/041890.html
>>>
>>> trying to figure that  out now.
>>
> 
> Hello Ted,
> 
> Does your build work if you manually change the following line in
> src/config/Makefile.config  (*after* running ./configure)
> 
> KRB5_LIBS = -L/usr/lib/x86_64-linux-gnu/heimdal -lkrb5
> 
> to:
> 
> KRB5_LIBS = -L/usr/lib/x86_64-linux-gnu/heimdal -lkrb5 -lasn1
> 
> It looks like we need to add -lasn1 anywhere libauth.a is linked (when using
> heimdal libs). Currently that is only done when building aklog.

That is an incorrect fix: -lasn1 should be added neither for aklog nor
for libauth.

akimpersonate_v5gen.c is wrong in the Heimdal case.  It is making direct
usage of Heimdal ASN1 macros when it should be following the model used
for rxkad.  I'm not entirely sure why akimpersonate has its own v5gen
source files.

Jeffrey Altman



Re: [OpenAFS] bosserver -noauth& changes cell to localcell

2017-04-12 Thread Jeffrey Altman
On 4/12/2017 8:43 PM, Ted Creedon wrote:
> anyone know why executing bosserver -noauth& overwrites cellname.com in 
> ThisCell & CellServDB with localcell?
> 
> thanks
> 
> tedc


You didn't say which version of OpenAFS you are using but when 1.6.x
bosserver is executed and its attempt to load a valid configuration
fails, it then attempts to create a valid configuration.

  src/bozo/bosserver.c line 1032 of openafs-stable-1_6_x

Jeffrey Altman



Re: [OpenAFS] vos dumps to S3 via S3 Storage Gateway?

2017-03-03 Thread Jeffrey Altman
On 3/3/2017 12:29 PM, Harald Barth wrote:
> 
> adsmpipe replacement:
> 
> /afs/hpc2n.umu.se/lap/tsmpipe/x.x/src/
> 
> Used with some scripts do put vos dumps into TSM archive. This is the
> current backup solution for at least 3 AFS cells I know about.
> 
> Harald.

There is also LTU's tsmafs which is available on GitHub

  https://github.com/mattiaspantzare/tsmafs

On 3/1/2017 1:52 PM, Dave Botsch wrote:
> Sounds like the most recent TSM patches may or may not be in the
> OpenAFS tree?
>
> Are you aware of any reason that this api is not enabled by default? I
> believe it would be a huge win for OpenAFS to be able to advertise
> native TSM support.

The primary reason is that a third-party SDK is required to use the
API, and that SDK is licensed only to customers that hold a valid
license for the commercial product.

Beyond that there were other reasons.  The current TSM support was
merged into the OpenAFS repository prior to the existence of the
Gerrit review system and the buildbot continuous integration system.
It was merged without significant review by third parties.  The code
quality is quite poor as Anders noticed in Aug 2015.

  https://gerrit.openafs.org/#/c/11960/

Since there was no method by which the Gatekeepers could test
the TSM functionality nor guarantee that it didn't alter the behavior
of backups for non-TSM using organizations, the decision was made
to merge the code as a build time option for those organizations
that wanted it.

On 3/3/2017 11:00 AM, David Boyes wrote:
> The IBM-supplied TSM butc support relies on a XBSA (an OpenGroup
> standard) compatibility library that was not updated past version 6.1
> of the TSM client on HP/UX, Solaris SPARC and AIX. Linux and Solaris
> x86 were never supported for the XBSA-based client. A fairly
> substantial amount of work would be needed to bring that support up
> to the current client levels (basically recoding to support the
> native TSM API). There was some discussion about doing that circa
> 2009, unclear if anything happened with that.

The XBSA standard was adopted not only by Tivoli Storage Manager but
also by Veritas NetBackup.  As David Boyes said, IBM abandoned the
XBSA standard and now only supports their own proprietary API (loosely
based on the XBSA model).  OpenAFS only ever included support for TSM,
not NetBackup.

The Backup Tape Controller (butc) when XBSA is supported permits a
remote XBSA enabled backup system to be used in place of a tape
device or local file system.  Full and incremental volume dumps
are sent to butc and stored in the XBSA service and the object
identifier of the specific backup is stored in the AFS backup database
just as if the dump had been stored to a tape device.

XBSA and the Spectrum Protect SDK are fresh in my mind because AuriStor
recently finished integrating Spectrum Protect support into the AuriStor
File System. AuriStorFS now supports IBM Spectrum Protect and the older
Tivoli Storage Manager releases. This is in addition to our support of
Teradactyl's True Incremental Backup System and BackupAFS.
The XBSA implementation is modular so we can add support for Veritas
NetBackup and object stores in the near future.

Jeffrey Altman
AuriStor, Inc.



Re: [OpenAFS] vos dumps to S3 via S3 Storage Gateway?

2017-02-27 Thread Jeffrey Altman
Few are aware that OpenAFS can be built to support IBM TSM as a virtual
tape controller via the XBSA API.  AWS S3 could be added in a similar
manner.

The primary thing that I would want to add whenever storing backups
off-site is encryption.  AFS3 volume dumps are unencrypted.  I would
pipe the dump stream through a block cipher before passing it into the
AWS CLI.
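A pipeline sketch along those lines (volume name, key file, and bucket are placeholders; a real deployment needs proper key management):

```shell
# Dump the volume to stdout, encrypt the stream, and upload via stdin.
vos dump -id sample.volume -time 0 -localauth \
  | openssl enc -aes-256-cbc -pbkdf2 -pass file:/etc/afs-backup.key \
  | aws s3 cp - s3://example-backup-bucket/sample.volume.dump.enc
```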


Jeffrey Altman



On 2/27/2017 2:42 PM, Shane wrote:
> We have a legacy EC2 environment setup in which vos dumps are pulled in
> by Zmanda backup, using a custom tar wrapper. These are stored on s3 via
> Zmanda's virtual tape devices. We have a lifecycle setup to migrate the
> vos dumps to Glacier after some time. 
> 
> Looks like a feature was added to the AWS CLI that allows for dumping
> via a stream which looks interesting
> though: https://github.com/aws/aws-cli/pull/903
> 
> On Mon, Feb 27, 2017 at 10:50 AM, Walter Tienken <walter.tien...@asu.edu
> <mailto:walter.tien...@asu.edu>> wrote:
> 
> Hello all,
> 
> __ __
> 
> We currently perform nightly dumps to an on-prem NFS mount. I was
> curious if anyone has had any experience with using an Amazon S3
> Storage Gateway for similar purpose? With much of the focus of “to
> the cloud” with many leadership members/organization administrators,
> I figured it is a good question to ask here. If yes, what has been
> your experience so far? Do you pass through the public internet or
> perhaps use Amazon Direct Connect?
> 
> __ __
> 
> Thanks in advance for your input!
> 
> __ __
> 
> Walter Tienken
> 
> walter.tien...@asu.edu <mailto:walter.tien...@asu.edu>
> 
> UTO OPS Systems and Security
> 
> __ __
> 
> 


Re: [OpenAFS] Is the OpenAFS-info mailing list still working?

2017-02-19 Thread Jeffrey Altman
On 2/19/2017 11:19 AM, Garance A Drosehn wrote:
> Hi.  On the last few messages I've sent to openafs-info@, the mailing
> list does not send me a copy of the email that I sent.  Looking back
> at older emails, the mailing list did always send me a copy.
> 
> I do see that my messages show up at:
>   https://lists.openafs.org/pipermail/openafs-info/2017-February/date.html
> 
> so the mailing list isn't completely broken.  But is there something
> odd going on with it?
> 

Most likely the rpi.edu mail server is blocking mail that has an rpi.edu
from address but was not sent from one of the approved rpi.edu mail
servers as listed in the rpi.edu TXT record

"v=spf1 ip4:128.113.2.225/29 ip4:128.113.2.231 ip4:128.113.2.232
ip4:128.113.2.233 ip4:128.113.26.109 -all"

Since the mailing list mail does not originate from one of the approved
mail servers it is blocked.

The mailing list is not broken.

Jeffrey Altman



Re: [OpenAFS] OpenAFS Windows build environment

2017-02-14 Thread Jeffrey Altman
On 2/14/2017 12:15 PM, Kostas Liakakis wrote:
> Even in 1.8.0pre1 tarball the normalize.h inclusion in cm_nls.c is
> unconditional and in Platform SDK 6.0a a quick search comes back empty
> for NORM_FORM.

[C:\Program Files\Microsoft SDKs\Windows\v6.0\Include]grep _NORM_ *
WinNls.h:typedef enum _NORM_FORM {

[C:\Program Files\Microsoft SDKs\Windows\v6.0A\Include]grep _NORM_ *
WinNls.h:typedef enum _NORM_FORM {

[C:\Program Files\Microsoft SDKs\Windows\v7.0\Include]grep _NORM_ *
WinNls.h:typedef enum _NORM_FORM {

[C:\Program Files\Microsoft SDKs\Windows\v8.1\Include\um]grep _NORM_ *
WinNls.h:typedef enum _NORM_FORM {

etc...

> And maybe, since you probably
> have a solid and current build environment already setup, could you
> please share your tool and SDK versions so I can match them instead of
> trying to solve problems already dealt with?

If I thought there was benefit in maintaining an out-of-date tool chain,
I would not have shut down the prior builders.

Jeffrey Altman



Re: [OpenAFS] OpenAFS Windows build environment

2017-02-14 Thread Jeffrey Altman
On 2/14/2017 7:37 AM, Kostas Liakakis wrote:
> 
> Hi Jeffrey,
> 
> Thanks for taking the time to answer. Please read below inline.
> 
> 
> On 2017-02-14 03:53, Jeffrey Altman wrote:
>> They are built with WiX 3.9 scripts.  The
>> installation packaging in the OpenAFS tree can no longer be used.
> I see. So I'll switch to a later version as Ben suggested and see what
> happens. Hopefully we won't have to reinvent the wheel.

The WiX 2.0 scripts cannot be used.  Someone will have to write a new
installer.

>> As a reminder, the 1.7.3301 installers that AuriStor, Inc. distributes
>>
>>   https://www.auristor.com/openafs/client-installer/
>>
>> can be installed on Windows 10 and Windows Server 2016 because they are
>> grand-fathered.  If the same sources were built today they would not
>> produce a working file system.
>
> This statement is a bit disturbing.

But it should not be surprising, since I gave the community plenty of warning:

https://lists.openafs.org/pipermail/openafs-info/2015-March/041324.html
https://lists.openafs.org/pipermail/openafs-info/2015-April/041325.html
https://lists.openafs.org/pipermail/openafs-info/2015-April/041328.html
https://lists.openafs.org/pipermail/openafs-info/2015-April/041330.html
https://lists.openafs.org/pipermail/openafs-info/2015-April/041332.html
https://lists.openafs.org/pipermail/openafs-info/2015-June/041392.html
https://lists.openafs.org/pipermail/openafs-info/2015-July/041449.html

http://workshop.openafs.org/afsbpw15/talks/thursday/AFS-on-Windows-AFSBPW15.pdf

http://workshop.openafs.org/afsbpw15/talks/friday/Securing_The_OS.pdf


> Do you mean that given the state of things, even if Windows buildbots
> start springing back to life and binaries for current OpenAFS versions
> become available, no later version than the already built and signed
> 1.7.3301 can be installed on Win10 and later? Will Win7 be ok at least?

I've answered these questions in the e-mails and presentations listed above.

> Do you consider the whole effort to revive the Windows port is in vain?

In April 2015 I provided estimates of what I believe the on-going costs
are for obtaining and maintaining a Microsoft signature for a file
system.  (Second link above.)

Last week I attended IFS PlugFest 29. Microsoft is serious about
improving the reliability of Windows and its resistance to root kits,
ransomeware, and other forms of malware.  Each year the requirements
that driver vendors must satisfy become more demanding.

 * Mandatory to implement functionality

 * Mandatory to use build chains

 * Mandatory to submit testing reports for each OS variant on which the
   driver might be installed

 * Mandatory use of EV code signing certificates

The challenge for the OpenAFS community is that the requirements cannot
be satisfied simply by compiling the Windows client with the latest tool
chains.  The mandatory to implement functionality requires support from
AFS (VL, FILE, RX, ...) that simply does not exist today.  With the
release of Windows 10 Creators Update the bar will be raised once again.

> Or is there something the the Foundation is planning to do in order to
> move forward again on the Windows platform?

I am not a member of the Foundation Board.  Their minutes can be
reviewed at

  http://www.openafsfoundation.org/about/minutes/

Perhaps a board member could comment on their plans.

Jeffrey Altman



Re: [OpenAFS] OpenAFS Windows build environment

2017-02-14 Thread Jeffrey Altman
On 2/14/2017 8:14 AM, Kostas Liakakis wrote:
> Well, I gave up on it :)
> Instead commented the #include in cm_nls.c and added a typedef for
> _NORM_FORM enum copied out of the MS documentation.

The NORM_FORM enum is defined in the Windows SDK.  The normalization.h
include is protected by a conditional

  #if (_WIN32_WINNT < 0x0600)
  /* This is part of the Microsoft Internationalized Domain Name
 Mitigation APIs. */
  # include <normalization.h>
  #endif

which matches the conditional in the SDK headers:

  #if (WINVER >= 0x0600)
  //
  //  Normalization forms
  //

  typedef enum _NORM_FORM {
...
  } NORM_FORM;

> Amazingingly, this was enough for the build to proceed and end in
> success. 

cm_nls.c uses run-time loading of the functions so that the code can
be compiled for versions of Windows that do not ship with the
NormalizeString() and IsNormalizedString() functions.

The IDN mitigation package is required when building for pre-Vista
because the normaliz.dll library must be installed as part of the installer.

> Well, at least my openafs/dest directory is now populated and I
> could at least run "pts help" and "fs help".

Neither pts nor fs use the normalization routines directly.

Jeffrey Altman



Re: [OpenAFS] OpenAFS Windows build environment

2017-02-13 Thread Jeffrey Altman
On 2/13/2017 3:58 PM, Kostas Liakakis wrote:
> Does anybody have any knowledge of what version of WiX did Secure
> Endpoints used in their builds, provided the information that they had
> been using VS2008 is correct?

Kostas,

The OpenAFS 1.7.3301 installers distributed by AuriStor, Inc. are built
from OpenAFS sources but they do not use the OpenAFS WiX 2.0 scripts for
the installation packages.   They are built with WiX 3.9 scripts.  The
installation packaging in the OpenAFS tree can no longer be used.

As a reminder, the 1.7.3301 installers that AuriStor, Inc. distributes

  https://www.auristor.com/openafs/client-installer/

can be installed on Windows 10 and Windows Server 2016 because they are
grand-fathered.  If the same sources were built today they would not
produce a working file system.

Jeffrey Altman



Re: [OpenAFS] OpenAFS 1.6.20.1 on AIX 7.1

2017-02-02 Thread Jeffrey Altman
On 2/2/2017 9:46 AM, Pascal Salet wrote:
> Hello all,
> 
> we used OpenAFS on AIX 6.1, and upgraded to AIX 7.1.
> 
> openafs-1.6.20.1 compiles on AIX 7.1, but fails to install.
> 
> I'd like to kindly ask if OpenAFS is known to install on AIX 7.1, and
> which would be the appropriate AFS-source-code-version to use.
> 
> excerpt from config.log:
> ===
> it was created by OpenAFS configure 1.6.20.1, which was
> generated by GNU Autoconf 2.63.  Invocation command line was
> 
>   $ ./configure --with-afs-sysname=rs_aix61 -- --enable-transarc-paths
> --disable-pam --includedir=/usr/include
> 
> ## - ##
> ## Platform. ##
> ## - ##
> ...
> uname -r = 1
> uname -s = AIX
> uname -v = 7
> ...
> /usr/bin/uname -p = powerpc
> ...
> /usr/bin/oslevel   = 7.1.0.0
> ...
> ===
> 
> I would be very grateful for any advice on this matter.

Pascal,

To the best of my knowledge there has been no active development of
OpenAFS on any AIX release since January 2015.  At that time the AIX 6
build machine was removed from the build farm because the organization
that provided it for many years could no longer afford to do so.

It does not surprise me if over the last two years there has been
drift between OpenAFS and AIX, especially with a new major release of AIX.

To answer your question, I am unaware of anyone that has installed
OpenAFS on AIX 7.1.

If you can provide more details on the failure, perhaps the community
can assist?

If the ability to use OpenAFS on AIX 7.1 is important to your
institution, perhaps it would be willing to provide a build host for use
by the developer community.

Jeffrey Altman
AuriStor, Inc.





Re: [OpenAFS] Check free space on AFS share before login

2017-02-01 Thread Jeffrey Altman
On 2/1/2017 5:08 AM, Richter, Michael wrote:
> Hi,
> 
> we are using  OpenAFS for the home drive. /home/users is a symlink to
> the AFS path with all the home shares. The users home is for example
> /home/users/username.
> 
> The users only have 1 GB of space available in that share. It often
> happens that the quota is reached and they are unable to login. Ubuntu
> doesn’t give a meaningful error message. I think, Ubuntu doesn’t know
> what’s the problem, because it sees only “/” as mountpoint, which has
> enough free space available.

The OpenAFS Unix cache manager exposes AFS mount points as directories
not as symlinks and not as mount points.  From the perspective of
applications all of /afs is a single device consisting of every AFS
volume in the world.

In addition, while the file server offers the RXAFS_GetVolumeStatus RPC
which returns

 . the size of the partition
 . the amount of free space on the partition
 . the size of the volume quota (if any)
 . the remaining free volume quota (if any)

the OpenAFS Unix cache manager never queries it.  As a result, the
application only finds out that partition is full or the quota exceeded
during the close() system call.  If the quota is 2MB and an application
opens a file and writes 100MB and then closes the file without checking
the error code, the data is lost and the application believes the data
was written to the file server successfully.
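The failure mode described above can be sketched in Python: because the quota error may only surface when the file is closed, callers must check close() as carefully as write(). The path and error handling here are illustrative, not part of any OpenAFS API.

```python
import errno
import os

def write_checked(path, data):
    """Write data and surface out-of-space / out-of-quota errors.

    On an OpenAFS Unix client, quota exhaustion may only be reported
    at close() time, so the close() result must be checked as
    carefully as the write() result or data loss goes unnoticed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        try:
            os.close(fd)  # ENOSPC / EDQUOT can be reported here
        except OSError as exc:
            if exc.errno in (errno.ENOSPC, errno.EDQUOT):
                raise RuntimeError(
                    "partition full or quota exceeded; data may be lost"
                ) from exc
            raise
```

An application that ignores the close() result, as in the 100MB example above, would silently discard its data.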

As others have indicated, this is not how the Windows cache manager
works.  The Windows cache manager is aware of how much free space the
volume has and returns an error to the application as soon as the free
space reaches zero.  In addition, because the Windows cache manager
exposes each AFS volume as a separate device, it is possible to:

 . report some volumes as readonly and others as read/write
 . return accurate volume size and free space info for each path
 . report accurate quota information for each path
 . return out of space and out of quota errors on one path without
   causing the VFS to report those same errors on other paths

David Howells' kafs, the Linux in-tree AFS client, behaves in a manner
similar to the Windows client.

  https://www.infradead.org/~dhowells/kafs/

kafs requires testing, and it requires that end-user organizations
inform their preferred Linux distributions that building and
distributing kafs is important.  AuriStor, Inc. supports David
Howells' development of kafs.  Others should as well.

Jeffrey Altman





Re: [OpenAFS] Procedure for changing database server IP addresses

2017-01-17 Thread Jeffrey Altman
On 1/17/2017 3:45 PM, Stephen Joyce wrote:
> I know the current best-practice for changing the IP addresses of AFS
> database servers is don't do it.
> 
> But assuming that I want/need to change IPs and have available hardware,
> is the use of clone dbservers the preferred method? I can tolerate short
> service interruptions of up to a few minutes as long as they're planned
> for low-utilization times.

um, not really.

> Initial condition is 3 dbservers ("OLD") located via AFSDB & SRV,

I assume these servers are who, what and when as listed in the
CellServDB file distributed from

  http://www.central.org/csdb.html

and included in every OpenAFS distribution.

> running 1.6.x. Desired final condition is 3 dbservers ("NEW") with
> different IP addresses, also running 1.6.x (for now).

The first thing to be aware of is that any entries in the CellServDB
file take precedence over information provided via DNS.  For recent
OpenAFS releases the precedence order is

 * CellServDB file
 * DNS SRV
 * DNS AFSDB

The Unix cache manager only uses the IPv4 addresses that are provided in
the CellServDB file.  Whereas the Windows cache manager only uses the
host name and performs a DNS A query on the name to obtain the IP
address to use.
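That precedence can be modeled as a first-non-empty-source lookup. The actual DNS queries (e.g. a SRV query for `_afs3-vlserver._udp.<cellname>`, per RFC 5864) are left as inputs here; this is a sketch, not client code.

```python
def locate_cell_dbservers(cellservdb_entry, dns_srv, dns_afsdb):
    """Return the dbserver list a recent OpenAFS client would use.

    Precedence per the text above: entries in the CellServDB file win
    over DNS SRV records, which in turn win over legacy DNS AFSDB
    records.  Each argument is a (possibly empty) list of addresses.
    """
    for source in (cellservdb_entry, dns_srv, dns_afsdb):
        if source:
            return source
    return []  # the cell cannot be located at all
```

Note the practical consequence: a stale CellServDB file entry will mask correct DNS records until the file is fixed.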

The CellServDB file contains entries for physics.unc.edu but not
cas.unc.edu.  Although physics.unc.edu lists the same DB servers as
cas.unc.edu.

The second thing to be aware of is that a UBIK quorum is defined by the
set of dbservers that share a common configuration.  Running OpenAFS
UBIK servers with a mixture of configurations can lead to more than one
dbserver believing it is the master.

The UBIK clone servers are interesting because they are documented as
being non-voting.  That isn't exactly true.  All UBIK dbservers must
maintain connectivity with every other UBIK dbserver in its
configuration.  What is special about clones is not that they don't vote
but that

 1. they cannot vote for themselves
 2. their votes for other servers are received and then discarded
 3. a clone cannot be the source of the best database

Many sites have experienced problems with UBIK quorums consisting of
more than 3 servers.  Some sites have successfully run with as many as 5
servers.  It really depends on the number of clients and the
average rate of application RPCs (VL, PT, ...).

The primary benefit of using clones in OpenAFS is when you wish to
prevent a server with a low IPv4 address from being elected the
coordinator (aka sync site).

> I'm roughing out a procedure, but my current thinking involves..
>
>  add 3 NEW dbservers as r/o clones (restarting db procs)

I don't believe that using clones at this stage is helpful.

Also, you should leave all of the DB servers shut down for at least 90
seconds when modifying the configuration.

>  modify DNS to show all 6 IPs.
>  'fs newcell' or restart all afsd's (including on servers)

You will also need to update the configuration and restart the
fileservers.  The fileservers are clients of the PT and VL servers but
use the server CellServDB file for their server info.

>  swap clone/non-clone roles so that NEW dbservers are r/w and OLD
> dbservers are r/o clones (restarting db procs). At this point, sync must
> be a non-clone, r/w "NEW" server. 

Using clones to prevent the old servers from becoming coordinator is the
proper use.  You might want to consider only leaving one of the old
servers running at this point.  Be sure to shut down all dbservers when
the configuration is changed.

> Verify with udebug. Any client afsd's
> not restarted/newcell'ed won't be able to make pt/vl changes.

The fileservers, when started, update their VL entries.  If their
CellServDB files are not updated as well, they won't be able to
register.

>  modify DNS to show only 3 NEW IPs
>  'fs newcell' or restart of all afsd's (including on servers)
> 
>  remove 3 OLD dbservers which must be r/o clones (restarting db procs).
> Any client afsd's not restarted/newcell'ed won't be able to query
> pt/vlservers.

correct.

> Because it could take some time to restart/newcell all clients, I'm
> thinking of doing the clone addition/dns steps then waiting some time
> (week+) before doing the role swap and second dns change. Then waiting
> another period of time (week+) before doing the last removal.
> 
> I'm assuming that I can use -auditlog (or even a packet sniffer) to see
> what clients might still be using the OLD dbservers prior to the final
> decommissioning.

rxdebug   -peer

> Seems a bit too simple. What am I missing?

Good luck.

Jeffrey Altman



Re: [OpenAFS] OpenAFS 1.8.0 alpha 1 available

2017-01-01 Thread Jeffrey Altman
On 1/1/2017 1:38 PM, Gaja Sophie Peters wrote:
> I wonder if this problem is related to the matlab "bug" (a little
> off-topic maybe, but who knows) Matlab simply won't start from a
> directory with @sys in the path, so one has to specifically go via
> amd64_linux26 (or whatever is appropriate). Since it's just "luck",
> which path to the final directory the kernel sees first (depending on
> what else got called in what order), sometimes Matlab won't even start
> from amd64_linux26 because it STILL thinks, it's called from @sys --
> whatever the kernel sees first, it remembers and will continue to
> remember until a reboot...

I'm curious about this "matlab" incompatibility.  Can you clarify what
you mean by "a directory with @sys in the path"?  Do you mean:

1. a path component "@sys" is being passed to Matlab?

  /afs/cell-name/appl/@sys/bin/prog

2. a path component that is a symlink whose target path
   contains @sys?

/afs/cell-name/appl/bin/prog

  where "bin" is a symlink to "@sys" or to ".bin/@sys"
  where ".bin" is a directory containing sub-directories
  "amd64_linux26".

Jeffrey Altman



Re: [OpenAFS] Fw: it would be nice to have an administrators guide

2016-12-26 Thread Jeffrey Altman
On 12/26/2016 2:58 PM, Benjamin Kaduk wrote:
> On Mon, Dec 26, 2016 at 11:37:17AM +, Ted Creedon wrote:
>> Ben,
>>
>> Thank you.
>>
>> So far by using the eclipse IDE , I've added afslog=true to [appdefaults]
> 
> Hmm, somehow I thought [appdefaults] was for applications that did not ship
> with the kerberos distribution (but could be wron), and I have more 
> familiarity
> with MIT krb5 than heimdal anyway.  But that doesn't seem like it would
> give better debugging output anyway.

Ben,

The Heimdal kinit has the ability to obtain afs tokens.  That is
controlled with the "afslog" configuration setting.

The order of precedence is

[appdefaults]
  kinit = {
    REALM = {
      afslog = true|false
    }
  }

[appdefaults]
  kinit = {
    afslog = true|false
  }

[appdefaults]
  REALM = {
    afslog = true|false
  }

[appdefaults]
  afslog = true|false

[realms]
  REALM = {
    afslog = true|false
  }

[libdefaults]
  afslog = true|false

The option doesn't have any impact on OpenAFS aklog.
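The precedence order above amounts to a first-match search through nested krb5.conf sections. A minimal sketch of that lookup over a parsed-config dict follows; the dict layout and helper names are illustrative, not Heimdal's actual parser or API.

```python
def _scope(section, key):
    """Return section[key] when it exists and is a sub-section (dict), else {}."""
    value = section.get(key, {}) if isinstance(section, dict) else {}
    return value if isinstance(value, dict) else {}

def krb5_appdefault(config, option, app=None, realm=None, default=None):
    """First-match lookup following the precedence order listed above.

    `config` is a nested dict mirroring krb5.conf sections, e.g.
    {"appdefaults": {"kinit": {"EXAMPLE.COM": {"afslog": True}}}}.
    """
    appdefaults = config.get("appdefaults", {})
    app_scope = _scope(appdefaults, app)
    search_order = [
        _scope(app_scope, realm),                 # [appdefaults] app { realm { } }
        app_scope,                                # [appdefaults] app { }
        _scope(appdefaults, realm),               # [appdefaults] realm { }
        appdefaults,                              # [appdefaults]
        _scope(config.get("realms", {}), realm),  # [realms] realm { }
        config.get("libdefaults", {}),            # [libdefaults]
    ]
    for scope in search_order:
        if option in scope and not isinstance(scope[option], dict):
            return scope[option]
    return default
```

The most specific setting (application plus realm) always wins, which is why a stray realm-level `afslog` cannot override a `kinit`-specific one.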

Jeffrey Altman






Re: [OpenAFS] OpenAFS 1.8.0 alpha 1 available

2016-12-13 Thread Jeffrey Altman
On 12/13/2016 11:57 PM, Benjamin Kaduk wrote:
> The OpenAFS Guardians are happy to announce the availability of the first
> pre-release candidate of OpenAFS 1.8.0.
>
> A large number of bugfixes and new features are included, and there are also
> behavior and functional changes that may require administrator action as
> part of the upgrade; please consult the release notes for details.
> 
> Please assist the guardians ...

New stable release series of OpenAFS are few and far between, averaging
more than five years between major releases.  This alpha release is a
major milestone, and a point at which I believe it is valuable to
acknowledge those who have contributed and the scope of their contributions.

The changes contributed towards 1.8 since the initial 1.6.x branch are
summarized as follows:

  4283 files changed, 325205 insertions(+), 311590 deletions(-)

in 4953 commits authored by the following individuals:

1334 Jeffrey Altman
913  Simon Wilkinson
715  Andrew Deason
320  Michael Meffie
292  Daria Phoebe Brashear
289  Benjamin Kaduk
233  Marc Dionne
88   Chas Williams
83   Garrett Wollman
53   Mark Vitale
49   Peter Scott
43   Anders Kaseorg
41   Rod Widdowson
40   Heimdal Developers
38   Russ Allbery
33   Antoine Verheijen
31   Christof Hanke
28   Stephan Wiesand
25   Ken Dreyer
23   Chaskiel Grundman
20   Jonathan A. Kollasch
16   Perry Ruiter
15   Hartmut Reuter
15   Jeffrey Hutzelman
14   Marcio Barbosa
13   Nathaniel Wesley Filardo
12   Jason Edgecombe
12   Matt Benjamin
11   Jeff Blaine
11   Nickolai Zeldovich
10   Sami Kerola
8    Hans-Werner Paulsen
8    Tom Keiser
7    Arne Wiebalck
7    Phillip Moore
6    Jonathan Billings
6    Michael Laß
6    Steve Simmons
5    Dave Botsch
5    Rainer Toebbicke
5    Stefan Kueng
4    Chaz Chandler
4    Ken Hornstein
3    Asanka C. Herath
3    Edward Z. Yang
2    Andy Cobaugh
2    Christer Grafström
2    Dan van der Ster
2    GCO Public CellServDB
2    Geoffrey Thomas
2    Jacob Thebault-Spieker
2    Marcus Watts
2    Thorsten Alteholz
1    Adam Megacz
1    Alejandro R. Sedeño
1    Brandon S Allbery
1    Brian Torbich
1    Charles Hannum
1    Chris Orsi
1    Felix Frank
1    Georg Sluyterman
1    Gergely Risko
1    Jeff Layton
1    Jens Wegener
1    Joe Gorse
1    Jonathon Weiss
1    Karl Ramm
1    Lukas Volf
1    Magnus Ahltorp
1    Matt K. Light
1    Matt Smith
1    Nathan Dobson
1    Niklas Jonsson
1    Paul Smeddle
1    Rainer Strunz
1    Ryan C. Underwood
1    Terry Long
1    Thomas L. Kula
1    Tim Creech
1    Toby Burress
1    Todd Lewis
1    Troy Benjegerdes
1    Vaibhav Kamra
1    Vincent Archer
1    Will Maier
1    Yadav Yadavendra

It is very important to acknowledge the many roles that Ben Kaduk is
serving for the OpenAFS community including Guardian, 1.8 Release
Manager, and Security Officer.  The 1.8 alpha release would not have
occurred without his substantial and uncompensated efforts.  Ben is
neither employed by a company that uses AFS nor does the OpenAFS
Foundation nor anyone else pay Ben for his time and expertise.

I don't know what Ben wants for Christmas but I would encourage all who
value OpenAFS to write letters to the North Pole to ensure that Ben is
not mistakenly placed on the naughty list.  Ben has given the OpenAFS
community an early present.

Jeffrey Altman

P.S. - Happy Holidays from the AuriStor team

  Daria Phoebe Brashear
  Marc Dionne
  Simon Wilkinson
  Peter Scott
  Rod Widdowson
  and myself




Re: [OpenAFS] dbservers version

2016-12-12 Thread Jeffrey Altman
On 12/12/2016 10:26 AM, Jean-Marc Choulet wrote:
> Hello,
> 
> At work, we have only one dbserver in version 1.6.1 (Debian 6). We want
> to add another dbserver but in a different version : v1.6.9 (Debian 8).
> Is it possible without upgrade the first dbserver ?
> 
> Thanks,
> 
> Jean-Marc.

There is little benefit to adding a second DB server unless you are also
adding a third.  In a two server scenario, the server with the lowest IP
(because of the extra 1/2 vote) can elect itself coordinator (sync site)
and the other server never can.  Failure of the only server that can
become the coordinator results in an outage.  A minimum of three DB
servers is required to provide redundancy.
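The arithmetic behind those claims can be sketched as follows. This is a rough model, not the real ubik election code; server "addresses" are plain integers standing in for IPv4 addresses.

```python
def can_elect_coordinator(up_servers, all_servers):
    """Return which reachable servers could win an election.

    A coordinator needs more than half of the total votes; the server
    with the numerically lowest address counts its own vote as 1.5,
    modeling the "extra 1/2 vote" described above.  With two servers,
    only the lowest-addressed one can ever reach a majority alone.
    """
    total = len(all_servers)
    winners = []
    for candidate in up_servers:
        votes = float(len(up_servers))  # assume all reachable servers agree
        if candidate == min(all_servers):
            votes += 0.5                # lowest-address tiebreak
        if votes > total / 2:
            winners.append(candidate)
    return winners
```

With three servers, any two survivors can elect a coordinator; with two servers, losing the lowest-addressed one leaves no possible coordinator, which is why a second server adds no redundancy.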

While it is possible to continue operating the 1.6.1 (Debian 6) DB
server as part of a ubik quorum, you should do so with the utmost
care.  Over the last two years several critical bugs in the ubik
protocol implementation have been fixed which can result in corrupted
databases.  Some of the scenarios result in empty databases being
replicated to all servers.  The safest path is to upgrade the existing
database server to OpenAFS 1.6.20 before increasing the size of the quorum.

Jeffrey Altman



Re: [OpenAFS] Additonal question about the OpenAFS Security Advisory 2016-003

2016-12-07 Thread Jeffrey Altman
On 12/7/2016 11:52 AM, Dave Botsch wrote:
> It sounds like running the salvagedirs would result in the next
> incremental dump being equiv in size to doing a full dump?

I haven't looked at the OpenAFS -salvagedirs implementation in a long
while, but if it doesn't bump the DV of each directory it rewrites, it
risks data corruption.

The next incremental dump would include all of the directories (which
it does in general anyway) but it wouldn't include normal files or
symlinks that have not changed.

Jeffrey Altman


Re: [OpenAFS] Additonal question about the OpenAFS Security Advisory 2016-003

2016-12-07 Thread Jeffrey Altman
On 12/7/2016 8:06 AM, Harald Barth wrote:
> 
> The security advisory says:
> 
>> We further recommend that administrators salvage all volumes with the
>> -salvagedirs option, in order to remove existing leaks.
> 
> Is moving the volume to another server enough to fix this as well or
> does the leak move with the volume?

The leak will move with the volume.

A bit of background for those that are not steeped in the details of the
AFS3 protocol and client and file server access for directories.

AFS file servers store directory information in a flat file that
consists of a header, a hash table and a fixed number of directory entry
blocks.  When a client reads the contents of a directory, it fetches the
directory file in exactly the same way it fetches the contents of normal
files and symlinks.  The AFS3 callback mechanism works the same for
directory files as it does for normal files and symlinks.

An AFS dump can be thought of as an AFS specific "tar" variant which
stores AFS Volume metadata and data elements. When a volume dump is
constructed for a volume move, a volume release, a volume backup, etc.
the contents of the directory files are copied into the dump stream
exactly as they are stored on disk by the file server.  When a volserver
receives a dump and writes it to disk as part of a AFSVol_VolForward or
AFSVol_Restore operation, each directory file is written to disk as it
exists within the dump.

Backup systems that store full and incremental dump files do so without
modifying the contents during the backup or restore operations.  As a
result restoring from a backup will restore any leaked information.

Backup systems that parse AFS dumps and reconstruct AFS dumps during the
restore process might or might not store and restore the leaked
information.  Contact the provider of your backup system.

It is worth emphasizing that IBM AFS and OpenAFS volserver operations
including all backup and restore operations occur in the clear.
Therefore, all leaked information will be visible to passive viewers on
the network segments across which volume backups and moves occur.

What the salvager's "-salvagedirs" option does is force the salvager to
rewrite every directory object.  This has two benefits when performed by
a 1.6.20 or later salvager.

1. It will build a directory file that contains no leaked information
   stored in the original directory file.

2. It will compact the directory to reduce fragmentation that could
   have resulted in directory full errors when attempting to store a
   filename that required more directory blocks than are available
   contiguously.
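The two benefits above can be illustrated with a deliberately loose model. This is NOT the on-disk AFS directory format; it only shows why re-emitting live entries both removes leaked residue and defragments.

```python
def rewrite_directory(blocks):
    """Loose model of what a -salvagedirs style rewrite accomplishes.

    `blocks` is a list where live entries are (name, fid) tuples and
    other elements stand in for unused blocks that may still hold
    stale bytes from deleted entries.
    """
    # Re-emitting only the live entries (1) drops any leaked residue in
    # unused blocks and (2) packs the entries contiguously, so a long
    # name needing several adjacent blocks can be stored again.
    return [b for b in blocks if isinstance(b, tuple)]
```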

I hope this information is helpful.

Jeffrey Altman



[OpenAFS] AuriStor, Inc at LISA 2016

2016-12-02 Thread Jeffrey Altman
AuriStor, Inc. is proud to be a bronze sponsor of LISA 2016 which is
being held next week at the Sheraton Boston Hotel.  If you are attending
please attend our BOF on Tuesday night at 7pm or stop by our booth on
Wednesday or Thursday in the Expo area.

Daria Brashear, Gerry Seidman and I look forward to seeing old friends
and discussing the accomplishments of the AuriStor team.

Jeffrey Altman



Re: [OpenAFS] Connection timed out on new mount point

2016-12-02 Thread Jeffrey Altman
On 12/2/2016 11:35 AM, Dirk Heinrichs wrote:
> Hi,
> 
> I'm currently facing a strange problem with connection timeouts after
> creating a mount point (fs mkm) for a new volume:
> 
> # fs mkm tester home.tester.backup
> #  ll
> ls: cannot access 'tester': Connection timed out
> total 132K
> ...
> ??   ? ?  ? ?? tester
> 
> The mount point has been created from a client workstation and only
> becomes available there after reboot or cache manager restart. OTOH,
> it's accessible immediately on the server (where /afs is usually not
> accessed):
> 
> # ll
> total 134K
> ...
> drwx--   2  1005  1001 2.0K Dec  1 21:49 tester
> 
> Both server and client are up-to-date Debian Stretch systems running
> OpenAFS 1.6.18.3.
> 
> Any ideas what could be causing the problem?
> 
> Thanks...
> 
> Dirk

The client has cached information for the volume group that indicates
that no backup volume exists.

  fs checkvolumes

Jeffrey Altman



Re: [OpenAFS] space and vos zap problem

2016-11-29 Thread Jeffrey Altman
On 11/29/2016 12:23 PM, Ken Hornstein wrote:
>> The one concern with -orphans remove when salvaging the entire partition
>> is if there were orphans that belonged volumes other than the one that
>> was deleted.  If such files existed they are now lost.
> 
> Yeah, that's certainly something to be aware of.  I only suggested
> that because Gary said he had moved all of the volumes off of that
> partition.
> 
> --Ken

In that case, wipe the partition and restart the fileserver.

The bosserver must stop the fileserver anyway to perform a full
partition salvage.

Jeffrey Altman



Re: [OpenAFS] space and vos zap problem

2016-11-29 Thread Jeffrey Altman
On 11/29/2016 10:50 AM, Gary Gatling wrote:
> Ah yes. That fixed that problem Adding "-forceDAFS -orphans remove
> " Thanks a lot!
> 
> gsgatlin@vaporware:~ ssh-session$ /usr/sbin/vos partinfo engr-f-200
> Free space on partition /vicepa: 158233268 K blocks out of total 488735480
> Free space on partition /vicepb: 488662856 K blocks out of total 488735480
> gsgatlin@vaporware:~ ssh-session$ /usr/sbin/vos partinfo engr-f-200
> Free space on partition /vicepa: 488662872 K blocks out of total 488735480
> Free space on partition /vicepb: 488662856 K blocks out of total 488735480

The one concern with "-orphans remove" when salvaging the entire
partition is if there were orphans that belonged to volumes other than
the one that was deleted.  If such files existed, they are now lost.

Jeffrey Altman






Re: [OpenAFS] best practice for a service to access a user AFS token? and why ruid instead of euid?

2016-11-17 Thread Jeffrey Altman
; dependencies) ?

There are four steps that need to be performed:

1. obtain and/or renew Kerberos v5 tickets

2. obtain the afs (or AuriStorFS token)

3. create the PAG for the process

4. set the token for the PAG

Step 3 needs to be performed once per process and does not involve the
network.

Step 4 needs to be performed each time the token is replaced due to
renewal.  It does not require network.

Steps 1 and 2 do require network but can be performed in the background
after the initial acquisition.
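The four steps above suggest a simple structure: steps 1, 2 and 4 run in a background refresh loop, while step 3 happens once per process before the loop starts. The sketch below uses placeholder callables for the actual mechanisms (e.g. running `kinit -R` and `aklog`); the function names are illustrative.

```python
import threading

def start_credential_refresher(renew_tickets, obtain_token,
                               set_token_in_pag, interval, stop_event):
    """Background loop for steps 1, 2 and 4 described above.

    The three callables are placeholders for whatever mechanism is in
    use; step 3 (creating the PAG) is assumed to have happened once
    per process before this loop starts.
    """
    def loop():
        while not stop_event.wait(interval):
            tickets = renew_tickets()       # step 1: renew Kerberos tickets (network)
            token = obtain_token(tickets)   # step 2: obtain the AFS token (network)
            set_token_in_pag(token)         # step 4: install token in the PAG (local)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Because only steps 1 and 2 touch the network, a renewal failure can be retried in the background without disturbing the process's existing PAG and token.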

> 
> Another idea, as I only care about supporting Linux, would be to
> leverage Linux kernel keyrings.  I am thinking perhaps my credential
> manager could link the "afs_pag: _pag" key to the user keyring, and then
> the HTCondor service could link this key into its session keyring when
> impersonating.  Does anyone think that would work?  Or is there more to
> swapping PAGs in and out (i.e. besides the key on the keyring, pagsh
> seems to do some magic with groups as "/bin/id" shows some magic groups
> when in a PAG...) ?  Is the keyring-based idea crazy talk or a good idea
> to pursue if Linux is my only target?

The groups are not required on Linux.  The cache manager uses keyrings
internally to track the PAG.

The behavior you are seeking with user-land manipulation of keyrings to
store afs tokens is the one that David Howell's kafs client will provide
when it is complete.

  https://www.infradead.org/~dhowells/kafs/

> I've seen the great lengths (i.e. immense amount of code, security
> side-steps of creating their own krb4 tickets) that Samba has done to
> support AFS; I am hoping there is an easier way.

Samba had a different problem than HTCondor.  An SMB 1.1 server used to
authenticate a client by having the client send a username and password
over the wire; the server could then use the password with kinit (or
equivalent library calls) to obtain a Kerberos TGT.  In the 90s this
was more often than not Kerberos v4.

In SMB 1.2 NTLM authentication was introduced and later GSS-Kerberos v5
authentication.  Both avoided the transmission of the user's password.
Therefore, Samba had to obtain a Kerberos v5 ticket granting ticket or
an AFS token some other way.  This typically made use of an
impersonation service to acquire a TGT or Token after asserting it had
authenticated the user identity.  Many single sign-on web authentication
services utilize a similar model.

> Your suggestions greatly appreciated.

The best approach in my opinion is to follow the LSF model.

Jeffrey Altman
AuriStor Inc.



Re: [OpenAFS] Mac OS sierra support - any news?

2016-10-14 Thread Jeffrey Altman
On 10/14/2016 8:28 AM, Elias Michael Ruettinger wrote:
> Hi,
> 
> I just wanted to ask, if there is any news on a mac os sierra open afs 
> release.
> Since the last update, it isn’t supported anymore. :-( 
> 
> Cheers
> 
> Elias


AuriStor, Inc. has been shipping an OSX Sierra client compatible with
IBM AFS and OpenAFS since OSX Sierra was publicly released.

  https://www.auristor.com/filesystem/client-installer/

Sincerely,

Jeffrey Altman



Re: [OpenAFS] Moving volumes between different cell and different realm names

2016-10-10 Thread Jeffrey Altman
On 10/10/2016 4:51 AM, Andreas Ladanyi wrote:
> Am 07.10.2016 um 22:58 schrieb Jeffrey Altman:
>>
>>>
>>> I read the thread:
>>> https://lists.openafs.org/pipermail/openafs-info/2009-March/031004.html
>>>
>>> So if i understand the thread and man pages correctly i could do the
>>> following steps:
>> Step 0.  Shutdown all of the AFS services on the server you want to
>> relocate to a new cell.
>>
>>> 1. change entries CellServDB / ThisCell on the old OpenAFS server
>>> (current config is Cell A) to Cell B.
>> And you need to install the keys from Cell B onto the fileserver.
> The old afs server doesnt support rxkad, only single des.
> The new afs server works with rxkad.
> 
> Is this a problem ?

I believe you meant to say the new afs server uses rxkad-k5+kdf.

If you have deployed non-DES keys to Cell B, then you cannot move the
fileserver from Cell A to Cell B unless you first upgrade the fileserver
to a version of OpenAFS that supports rxkad-k5+kdf.

>>
>> AFS servers do not know or care about the realms.   The servers within a
>> cell all must share the same server configuration (ThisCell, CellServDB,
>> and keys).
>>
>> You cannot move a volume between cells with the OpenAFS vos command.
> I know this. This is the reason why i want to relocate the old afs
> server cell name to the new cell name and then move the volumes.
>>
>> With AuriStorFS it is possible to copy volumes between cells.  A volume
>> once copied can be removed from the source if that is desired.
> So this feature wont be implemented in OpenAFS in the future ?

There is nothing that prevents someone from implementing this feature in
OpenAFS.

> Whats up with the release of OpenAFS 1.8 ?

The most recent status report on 1.8 was sent to the openafs-devel
mailing list on Sept 15th.


http://lists.openafs.org/pipermail/openafs-devel/2016-September/020355.html

Since then several of the remaining blocking work items have been merged
into the source tree.  There are still several non-blocking items that
require additional review.

Gerrit item https://gerrit.openafs.org/#/c/12393/ will be used for
development of the NEWS file for the first pre-release.

Jeffrey Altman



Re: [OpenAFS] OpenAFS Installation on Windows

2016-09-02 Thread Jeffrey Altman
On 9/1/2016 4:07 PM, Odoom, Jason wrote:
> 
> ​Hello,
> 
>  I've been having difficulty installing OpenAFS on Windows 10. I receive
> the error "Installation of network provider failed" when installing the
> OpenAFS client msi package. I saw from a previous mailing list[1] that I
> have to run the msi with administrative privileges. However, that does
> not fix the issue. How do I fix this issue?
> 
> [1]:https://lists.openafs.org/pipermail/openafs-info/2011-January/035370.html
> <https://lists.openafs.org/pipermail/openafs-info/2011-January/035370.html>
> 
> Any assistance would be appreciated, 
> ​
> 
> -- 
> -Jason Odoom 
> ARCS Student Assistant

Jason,

When requesting assistance it is important that you be specific about
the variables in your environment:

 * what version of Windows are you using?  You say Windows 10,
   but Windows 10 is a moniker that describes every version of
   Windows shipped since 29 July 2015, as well as many different
   variants:

   home, professional, educational, enterprise, tablet, mobile, IoT, ...

   x64, x86, arm, ...

   Within each variant there are major milestone releases with
   different features and requirements.

* what version of OpenAFS are you installing and whose packaging?


If you are not already aware, the OpenAFS client for Windows 1.7.x is
implemented with several "system" component extensions:

 * two file system drivers

 * two network provider dlls.  one for authentication and one for
   interfacing with the file system drivers to support drive letter
   mapping, path evaluation, etc.

 * a system service

 * a suite of file explorer object classes

All of the OpenAFS MSIs and the binaries included within them were
signed by Your File System, Inc. with a SHA1 hash.  Microsoft has warned
for years that SHA1 code signing would be deprecated.  Due to additional
weaknesses in SHA1 hashes Microsoft and Mozilla decided to accelerate
the deprecation process.

 https://blogs.windows.com/msedgedev/2015/11/04/sha-1-deprecation-update/

While SHA1 signed MSIs are still accepted, SHA1 signed system binaries,
depending upon the Windows variant and domain policy, are not.   An
attempt to install a system binary on a system that requires SHA2
signatures will fail.

Also, it is important to note that as of Windows 10 version 1607 the new
driver signing requirements for file system drivers are being enforced.

https://blogs.msdn.microsoft.com/windows_hardware_certification/2016/07/26/driver-signing-changes-in-windows-10-version-1607/

However, I doubt that this is the cause of the problems you are
experiencing.  Last night I clean-installed the Windows 10 version 1607
professional build on both x86 and x64.  The AuriStor-distributed
OpenAFS for Windows
installers (1.7.3301) that utilize the AuriStor (formerly Your File
System) packaging installed successfully on both systems and the client
operated as expected.   The drivers included in this installer are
accepted under the grandfather exception for cross-signing certificates
issued before 29 June 2015.

Jeffrey Altman



Re: [OpenAFS] VerboseLogging registry values?

2016-09-01 Thread Jeffrey Altman
On 9/1/2016 11:43 AM, Caldwell, Hugh wrote:
> Could someone tell me what the appropriate values are for 
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\TransarcAFSDaemon\NetworkProvider\VerboseLogging
> ?
> 
> This page is the only reference I can find to it and the values aren't
> defined
> https://www.openafs.org/dl/1.3.64/winnt/registry.txt
> 

"VerboseLogging" is not used by OpenAFS.




Re: [OpenAFS] Access an OpenAFS cell in LAN and WAN with dynamic DNS (DDNS) address

2016-08-31 Thread Jeffrey Altman
o IPv6"
from the 2015 Best Practices Workshop)

 http://workshop.openafs.org/afsbpw15/talks/thursday/wilkinson-road-to-afs.pdf

you could obtain your own IPv6 tunnel from Hurricane Electric and avoid
the IPv4 NAT nonsense when IPv6 is available to your client systems.  Of
course, OpenAFS doesn't. Alternatives include requiring the use of a VPN
to access the cell and ensuring that only private addresses are
published for cell servers.


As indicated above, DNS can be used by clients to locate the database
services: Volume Location, Protection, etc.  DNS cannot be used by
servers to locate other servers.  Nor can DNS be used by clients to
locate file servers.

When AuriStor added IPv6 support to its product one of the goals was
support for 6to4 gateways such as Microsoft Direct Access.  These
gateways rely upon DNS queries to obtain a AAAA record for a host that
only has an IPv4 address.  The AFS3 model of embedding IPv4 addresses in
the VL_GetAddrsU RPC response posed a challenge because an IPv6 only
client must implement the necessary logic to identify the correct ipv6
prefix to use in converting the private IPv4 address to an IPv6 address
that will be routed through the 6to4 gateway. Other protocols such as
FTP that embed addresses in the control stream have similar problems.

Although it might be tempting to extend the VL RPCs to permit
registration of file servers by DNS host name instead of by IP address,
a better approach might be to add split horizon support to the VL
service such that file servers can register both the public and private
addresses with the VL service but the VL service would only disclose one
based upon which address the client's query is received from.

It would not be safe to convert the Ubik servers to rely upon DNS host
names.  Not with the current Ubik protocol specification.

Jeffrey Altman


Re: [OpenAFS] Alternate source of Windows Heimdal Kerberos

2016-08-24 Thread Jeffrey Altman
On 8/24/2016 1:35 AM, Neil Davies wrote:
> Hi
> 
> It appears that the secure-endpoints web site is broken since, at least, Monday. 
> 
> Anyone know where I can lay my hands on a copy of Heimdal Kerberos and the 
> network identity manager for Windows?
> 
> Google (and the way back machine) unable to help 
> 
> Cheers
> 
> Neil

As best as I can tell, some parts of the Internet are resolving
www.secure-endpoints.com to 141.8.225.68 instead of 208.125.0.235 as it
should.  As a result, those machines which are resolving the wrong IP
address are being redirected to a broken site.

The site https://www.secure-endpoints.com/ is working.  I will also note that
at the time of the OP's e-mail the Secure Endpoints ISP was in the
middle of a four hour maintenance window that resulted in a total loss
of connectivity to https://www.secure-endpoints.com/.

Jeffrey Altman
Secure Endpoints, Inc.



Re: [OpenAFS] some older openafs-client versions have started failing

2016-07-14 Thread Jeffrey Altman
On 7/14/2016 6:18 PM, Chad William Seys wrote:
> Hi Ben,
> 
> The Scientific Linux clients are using patched (by Redhat) 2.6.32 and
> the Debian clients are using patched (by Debian) 3.2.78 and 3.16.7 .
> 
> Do you suspect that a recent security patch, applied to all three
> kernels, could have broken the older AFS clients?
> 
> I could certainly test this idea if it appears promising.  I guess I'd
> start with the server's kernel though: One data point that argues
> against it being the client's kernel is that for the Scientific Linux
> box I booted up a machine which had not been updated for a long time
> (kernel dated Mar 22, 2016) and compiled openafs 1.6.15 (not functional)
> and 1.6.16 (functional).
> 
> Chad.

I am dismissive of the notion that the server's kernel version matters
since all of the fileserver code is in userland.

I believe the Debian and Scientific Linux issues are unrelated because
the symptoms are so different.

If you said that 1.6.18 was the first version of OpenAFS to work on
Debian I would correlate that with the Linux kernel changes to support
interrupting splice operations.  The splice operations were used by the
OpenAFS client for StoreData RPCs to avoid an extra memory copy of every
page that is written to the fileserver.  The 1.6.18 release removed it.

One of the symptoms of the splice change on OpenAFS clients was "git"
operations failing in such a fashion that the OpenAFS client marked the
fileserver state as "down".  When that happens the

  "Connection timed out"

error is logged regardless of the actual cause.

Since you indicate that 1.6.16 is the first version to work, something
else must be to blame on Debian.

For the Scientific Linux issue you should obtain a stack trace for the
hung "ls" process and collect cmdebug output for the affected cache manager.

Jeffrey Altman


Re: [OpenAFS] Number of files in an OpenAFS volume...

2016-07-04 Thread Jeffrey Altman
On 7/4/2016 12:48 PM, Stephan Wiesand wrote:
> 
> On Jul 4, 2016, at 16:27 , Jeffrey Altman wrote:
> 
>> The directory size restrictions are one of the reasons that /afs cannot
>> be used for a large number of applications.  The AuriStor File System
>> implements a new directory format which is understood only by AuriStor
>> clients.  This format permits directories to grow to store an unlimited
>> number of entries.
> 
> Another such restriction is that AFS won't allow hard links across 
> directories, even if they're in the same volume. Does AuriStor lift that 
> restriction too?

Yes.  Cross-directory hard links must be prevented by AFS3 because ACLs
can only be assigned to directories.   The AuriStor File System allows
ACLs to be assigned to any object (directory, file, symlink or mount
point.)  If a cross-directory hard link is added to a file and that file
inherits a directory ACL, the directory ACL is copied to the file as
part of the creation of the hard link.

Jeffrey Altman



Re: [OpenAFS] Number of files in an OpenAFS volume...

2016-07-04 Thread Jeffrey Altman
On 7/4/2016 11:57 AM, Matthew Lowy wrote:
> Hi Jeffrey,
>
> a locale where double-byte characters were in file names the packing wouldn't 
> be as good.

All AFS3 directory entries are stored as single-byte C-strings.  Unicode
names as stored by Windows, OS X, Linux with the UTF-8 locale, etc. are
stored as UTF-8 sequences.  Any character that cannot be represented by
ISO-8859-1 will require more than one octet.

Jeffrey Altman



Re: [OpenAFS] Number of files in an OpenAFS volume...

2016-07-04 Thread Jeffrey Altman
On 7/4/2016 6:55 AM, Matthew Lowy wrote:
> Hello,
> 
> We have a number of OpenAFS volumes that serve as storage for (public)
> mirrors and one of them is misbehaving when updated from upstream - the
> error indicates we've reached the limit of file names allowed in a volume.

In a volume or a directory?

The theoretical limit of directories in a volume is 2^30 and
non-directories in a volume is 2^30.  There have been incomplete efforts
to raise those limits by treating signed values as unsigned values but I
wouldn't count on them.

> The limit I am seeing is not compatible with my understanding of how
> OpenAFS handles file names in a directory. I've seen in the mail list
> archives the statements about how many file names can fit, that there
> are 64k slots and a file name < 16 in length occupies one slot, a file
> name from 16 to 32 characters long occupies two slots and so on. The
> earliest reference I've found is at
> http://lists.openafs.org/pipermail/openafs-info/2002-September/005812.html

Your understanding is roughly correct except that the numbers Todd
specified are wrong.   The actual numbers are determined by this function:

/* Find out how many entries are required to store a name. */
int
afs_dir_NameBlobs(char *name)
{
    int i;
    i = strlen(name) + 1;
    return 1 + ((i + 15) >> 5);
}

A slot contains per-file metadata followed by name data.  When multiple
slots are used, all of the space in the 2nd and subsequent slots is used
for file name data.

> However...
> 
> The directory concerned has more than 21,000 files in it, almost all of
> them have names exceeding 52 characters... as at today there are
> 1,220,000 characters in filenames in that directory. Even assuming they
> pack down perfectly into directory name slots that's over 76,000
> slots... and working them out using the rule above indicates that the
> directory is using over 87,000 slots. These are both significantly above
> 64k.
> 
> I don't know if I'm misinterpreting the information in the OpenAFS
> archive or if the information is out of date - but I've not found
> anything that fundamentally is different from the information in the
> archive and I'm looking at a volume that seems to break the limits. 

The AFS3 directory format is part of the wire protocol as it is shared
by both the file server and the clients.

> I'd really benefit from understanding what's going on ... how we appear to
> be getting more file name information into a directory than should be
> possible.
> 
> /mirror.ox.ac.uk/sites/archive.ubuntu.com/ubuntu/pool/main/l/linux$ ls
> |wc -l
> 21731

This number is within the existing limits.

> /mirror.ox.ac.uk/sites/archive.ubuntu.com/ubuntu/pool/main/l/linux$ ls
> |wc -c
> 1250894

File names of length 52 through 70 characters require three slots.  If
all of the file names are of length 60 and are perfectly packed they
would require 62545 slots which is very close to the limit.

> This is one directory in a mirror of archive.ubuntu.com so you can see
> the contents from (e.g)
> https://launchpad.net/ubuntu/+mirror/mirror.ox.ac.uk-archive which
> points to the presentation of our mirror. The number of files has
> recently gone up because of upstream changes.

The directory size restrictions are one of the reasons that /afs cannot
be used for a large number of applications.  The AuriStor File System
implements a new directory format which is understood only by AuriStor
clients.  This format permits directories to grow to store an unlimited
number of entries.  However, the AuriStor file servers currently apply
an artificial limit of approximately 20 million entries.

More details on the AuriStor File System can be obtained at

  https://www.auristor.com/openafs/migrate-to-auristor/

Jeffrey Altman



Re: [OpenAFS] Client install on Windows Server 2012R2

2016-06-27 Thread Jeffrey Altman
On 6/27/2016 5:25 PM, Garrison, Christine wrote:
> Thanks, Jeff.
> 
> As usual, I am asking my questions clumsily.
> 
> What should my user install in order to use OpenAFS client on his Windows 
> Server 2012R2 machine? We have somewhat dated instructions for configuring to 
> our cell in IU's KB, and I'm trying to clarify for him to be able to get set 
> up and working. 

I would install

  https://www.auristor.com/openafs/client-installer/

Jeffrey Altman



Re: [OpenAFS] Client install on Windows Server 2012R2

2016-06-27 Thread Jeffrey Altman
On 6/27/2016 3:33 PM, Garrison, Christine wrote:
> Hello OpenAFS folks,
> 
> 
> I've got a user with a Win 2012R2 Server that's trying to install and
> configure the OpenAFS for Windows client. Their OS is 64 bit.
> 
> 
> We tried an install of OpenAFS for Windows 1.7.31 from the openafs.org
> website, but got a missing DLL error (sorry, he closed the shared screen
> too fast for me to jot it down) when trying to run the server manager to
> do a config of the cell.

Christine,

None of the cell management GUI tools work on Windows.  Just don't
install them or expect them to work.  The "Server Manager" is literally a
tool for managing the cell, not the client.

> We're more familiar with the 1.6.x client here; can it be run on Win
> Server 2012R2?

The OpenAFS 1.6 clients on Windows use the SMB gateway interface that
doesn't work on Windows 7 or above.

FYI, as of July 29th all new installations of Windows will require
Secure Boot.  At that time Microsoft signed device drivers will become
mandatory.  AuriStor, Inc. hopes to have its Windows client approved for
Microsoft's signatures by that date.

Jeffrey Altman



Re: [OpenAFS] Access an OpenAFS cell in LAN and WAN with dynamic DNS (DDNS) address

2016-06-25 Thread Jeffrey Altman
On 6/24/2016 10:31 AM, Karl-Philipp Richter wrote:
> Hi,
> I'm running a server with an OpenAFS volume which updates its IP which
> is dynamically changed every 24 hours by the ISP using a dynamic DNS
> (DDNS) service and `ddclient`. The server is a gateway for a LAN subnet
> 192.168.179.0/24. I access this server inside my LAN by adding the
> 192.168.179.0/24 address to `CellServDB` which works fine on client
> inside 192.168.179.0/24. When I add the dynamic WAN IP of the server
> when I'm outside LAN (e.g. in eduroam) to `CellServDB` on the client
> side and reboot (and make sure that the IP didn't change after reboot)
> I'm experiencing `ls: cannot access '/afs/richtercloud.de': Connection
> timed out` when I invoke `ls /afs/` and see
> 
> [  130.010338] afs: Lost contact with file server 192.168.178.20 in
> cell richtercloud.de (code -1) (multi-homed address; other same-host
> interfaces maybe up)
> [  130.010343] RXAFS_GetCapabilities failed with code -1
> [  186.461024] afs: Lost contact with file server 192.168.179.1 in
> cell richtercloud.de (code -1) (all multi-homed ip addresses down for
> the server)
> 
> in `dmesg`.
> 
> I tried adding all LAN IPs of the server and the WAN IP to `CellServDB`
> in `[]` and not in all possible combinations. I configured my WiFi
> router to forward UDP for port 7000 to 7008 (inclusively) and 88 and 750
> (following https://wiki.openafs.org/AFSServicePorts/) to the server's
> interface and setup the same forwarding on the server.
> 
> -Kalle

Kalle,

There is an expectation that AFS servers have a stable IP address.
OpenAFS was developed in an age in which all assigned IP addresses were
stable and there was end-to-end connectivity.  There were no NATs and
few firewalls blocking network traffic.

When the IP address changes there is a requirement that the
configuration be altered and the servers be restarted in order for that
new IP address to become available.

The servers and the clients store the IP addresses.  The client in
particular caches volume location information for hours and must be
forced to refresh it manually with "fs checkvolumes" when the file
servers' IP address changes.

Jeffrey Altman



Re: [OpenAFS] Problem restore / mount volume

2016-06-21 Thread Jeffrey Altman
On 6/21/2016 9:08 AM, Andreas Ladanyi wrote:
>
>> The other question that comes to mind is, what are the ACLs on the root
>> directory of the user.test volume?   Does the current user have at least
>> lookup permission?
> yes, i mounted the restored volume to a folder "user.test" in my user
> afs home path. I also have a admin token. The user admin is in UserList
> of AFS server. So i should could do everything.

The system:administrator does not get implied lookup.  It has implied
admin permission.  The only thing the system:administrator can do
without being on an ACL is query and set the ACL.

>> Finally, what version of the file server is hosting the volume?
> 1.6.1 , Solaris 10

The 1.6.1 file server does not permit FetchStatus on the root directory
of volumes for system:anyuser.  If you do not have tokens for a user on
the ACL, you won't be able to obtain access.

Use the system:administrator rights to query the ACL and set it as
necessary.
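(As a command sketch, assuming a live cell and admin tokens; the path and
user name below are placeholders, not from the original thread:

```shell
# Query the ACL on the restored volume's root directory
fs listacl /afs/example.com/user/me/user.test

# Grant the user at least lookup (l) -- here read + lookup (rl)
fs setacl -dir /afs/example.com/user/me/user.test -acl someuser rl
```

Once the user appears on the ACL with lookup rights, the root directory
of the restored volume should become accessible.)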

Jeffrey Altman



Re: [OpenAFS] Read-only volume issues

2016-06-21 Thread Jeffrey Altman
On 6/21/2016 2:36 AM, Jan Iven wrote:
>
> One way could be to serve all "readonly" content (i.e. those with rw+ro
> volumes) from dedicated Samba servers that run with
> 
>  read only = yes
>  locking = no
> and possibly
>  oplocks = False
> (per
> https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/locking.html)
> 
> you could also increase logging (-d, or "kill -SIGTSTP") on a test AFS
> server to see what operation exactly is being attempted (and fails), see
> "man dafileserver"). Works best if you isolate the volume in question to
> a dedicated server.

On the AFS file servers I would recommend using audit logs instead of
debug logging because they record every RPC and can be filtered by host
IP address and Volume ID.   Each OpCode is recorded plus the return code.

On the Samba server recording "fs trace" output would show what the AFS
cache manager thinks the Samba server is requesting.

Samba logs will show what Samba is receiving from its clients.  All of
those sources will need to be correlated to identify the actual flow of
requests and failures.

Jeffrey Altman



Re: [OpenAFS] Problem restore / mount volume

2016-06-21 Thread Jeffrey Altman
On 6/21/2016 5:10 AM, Andreas Ladanyi wrote:
> 
>> What does the log of the afs fileserver tell you, on which the volume
>> resist?
> On afs server:
> 
> FileLog:
> fssync: breaking all call backs for volume 536875364
> 
> VolserLog:
> Volser: CreateVolume: volume 536875364 (user.test) created
>> Looks like the user.test volume is not online.
> vos examine user.test
user.test  536875364 RW    1828930 K  On-line
> 
> 
> regards,
> Andreas

By any chance was the mount point created before the "user.test" volume
was restored or was the volume restored, removed, and restored again?

I'm thinking the client might have cached a volume id for "user.test"
that is no longer valid.  If that is the case, try

  fs checkvolumes

The other question that comes to mind is, what are the ACLs on the root
directory of the user.test volume?   Does the current user have at least
lookup permission?

Finally, what version of the file server is hosting the volume?

Jeffrey Altman



Re: [OpenAFS] Read-only volume issues

2016-06-20 Thread Jeffrey Altman
On 6/20/2016 5:09 PM, Garrison, Christine wrote:
> Jeff,
> 
> Thank you for the "vos addsite" idea and for clarifying. That does seem like 
> a better option, assuming we can solve the Samba problem. The .readonly 
> volume produced acts just like the .backup mount did, so we are stuck. 
> Windows reports "failed with an error of 5" upon attempts to open files 
> through Samba from the .readonly volume mount.
> 
> I was pretty sure the answer here would be something like that, but the Samba 
> also people haven't been helpful about OpenAFS issues for a long time, so we 
> are kind of on our own. Which is one of the reasons we are retiring this 
> system.
> 
> Chris

Another approach you can take is to use the -readonly switch on the
fileserver process.   In that situation it won't matter if the volumes
are .readonly or not, the fileserver won't permit modifications.
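(Concretely, that means adding -readonly to the fileserver's parm line in
BosConfig and restarting the bnode.  An illustrative DAFS entry, with
binary paths that will vary by distribution:

```
bnode dafs dafs 1
parm /usr/lib/openafs/dafileserver -readonly
parm /usr/lib/openafs/davolserver
parm /usr/lib/openafs/salvageserver
parm /usr/lib/openafs/dasalvager
end
```

followed by something like "bos restart <server> -instance dafs
-localauth" to pick up the new flag.)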

Jeffrey Altman




Re: [OpenAFS] Read-only volume issues

2016-06-20 Thread Jeffrey Altman
On 6/20/2016 4:10 PM, Garrison, Christine wrote:
> Hi Jeff,
> 
> Does "vos addsite" duplicate a volume's data? 

From an implementation perspective a .readonly instance on the same
partition as the RW volume is identical to a .backup.  The difference
is that the .readonly is stable and mount points behave the way you
want them to and .backup volume instances are unstable.

If you move the volume the .backup will be destroyed whereas the
.readonly will remain.

> Because we are using much more than half our capacity, and therefore can't 
> afford to double usage. 
> To be blunt, we are trying to make the system read only by stages to
> encourage users to migrate off this system as we are trying to retire it
> at the end of the year.

I assumed that is what you were trying to do.

> If mounting the .backup volume isn't a good option, then that more or less 
> leaves me with walking the filesystem, parsing and changing ACLs to have no 
> write/insert/delete.
> This would be irreversible unless I stored the original ACL states
> somehow and could re-apply them later if there was a mistake or a
> political reason to restore a volume to be writeable for awhile.
> 
> Though, I am open to ideas. It seems awful to ask about how to retire an 
> OpenAFS system on this list, however.

The approach I have suggested will do what you need.

>> What are the errors?   Obviously, any access that expects write permission 
>> or the ability to obtain a lock is going to fail when the volume isn't 
>> writable.
> 
> Well the errors indicate the files can't even be read, despite that not being 
> the case (at least, it works fine within OpenAFS and with the sftp and http 
> gateways we have sitting on top... just not Samba.
> 
> OS X at least says:
> 
> "The document “Document.rtf” could not be opened."
> 
> ...and if I copy to the desktop, it says:
> 
> "The Finder can’t complete the operation because some data in “Document.rtf” 
> can’t be read or written.
> (Error code -36)"

You are looking in the wrong place for errors.   You need to be looking
at Samba.

Jeffrey Altman


