A group of us - Jeffrey Altman, Matt Benjamin, Derrick Brashear, Alistair Ferguson, Christof Hanke, Tom Keiser, Hartmut Reuter, Marcus Watts, Rod Widdowson and myself - met for 3 days in Edinburgh at the end of last month. We discussed a wide variety of AFS issues. A jabber chat log of our discussions is available at http://conference.openafs.org/[email protected]/ (see the 2009-09-22 and 2009-09-24 log files).

I'm not going to attempt to summarise our discussions in much detail. However, I have noted below the topics we discussed, and any conclusions that I believe we reached. Where we identified the next steps to be taken, I've also noted this in the hope that we can keep things moving forwards.

Extended Callbacks
------------------

Matt presented his extended callbacks draft. The discussion ranged between protocol, implementation and code management issues. On the protocol front, it was felt that the draft was approaching consensus, but that concerns remained around the changes to the behaviour of callback breaks (in particular, whether an RPC can return before all callbacks have been broken). Given the lack of consensus on this topic, we agreed that the draft would drop mention of callback coalescing entirely. Other issues included the behaviour of clients which receive extended callbacks over untrusted channels, and the risk of deploying extended callbacks on servers which only have a small number configured. Matt will produce a draft addressing these issues, and then we will attempt to move forwards with a consensus call.

We will attempt to address the asynchrony issue at a later date. Given that this change is arguably a modification to afs3 semantics, we'll attempt to engage a wider body of the community in this discussion.

In discussing the implementation, we considered the range of xcb dependencies, in particular those on mcas and libosi. It was felt that blocking extended callbacks on getting libosi into the tree was undesirable - in particular, a desire was expressed for the xcb code to use the existing pthread implementation, rather than pulling in osi's thread abstractions. We agreed that we would not enable or expose the callback coalescing code in the OpenAFS implementation, pending further discussion of this issue. We didn't resolve the issue of whether we should use MCAS's native atomics everywhere, or whether we should prefer atomic operations provided by the operating system.

Code management concerns were expressed on a number of occasions throughout the meeting. I'll summarise these in a section of their own towards the end.

We agreed the following:

Matt will publish a new version of the draft which:
1) Removes any mention of asynchronous behaviour for callback breaks
2) Extends the security considerations section to state that if the client receives an XCB for metadata on an untrusted connection, it should treat it as a normal callback break.
3) Adds an implementation note on the risks for servers with a small number of callbacks

Simon (in the absence of a chair for afs3-stds) will issue a call for consensus on the updated draft.

Matt will then:
1) Remove as many OSI dependencies from xcb implementation as possible
2) Remove asynchronous callback breaks from the visible implementation (off by default, no switch to enable)
3) Push changes to gerrit (separate patches for Windows CM, Unix CM, and fileserver)

All will then take time to review these changes.

rx/osd
------

Hartmut and Christof presented their protocol documentation for rx/osd, available online at http://pfanne.rzg.mpg.de/trac/openAFS-OSD/wiki/Specs

A desire was expressed to not use an IP address to identify OSD servers, but use a UUID instead, and register OSDs in an extended vldb which also knows about ports. We decided that this could be fixed in a later protocol version, but that for now servers should be expressed as a union of IP address and UUID so we don't have to rev the RPC later on.

Where 'expires' is used in the structures it will become an absolute time, and become a 64 bit value.

After a discussion of the consistency issues in the current mirroring implementation (if some mirrors go offline, then you can end up with multiple OSDs with different versions of the data), we decided that mirroring would be out of scope for the initial rxosd integration.

UUID will be removed from getOSDLocation, StartAsyncFetch and StartASyncStore, as the fileserver already knows the UUID (from connection establishment).

Times in general will become 64 bits.

Hartmut and Christof will publish a revised set of protocol specifications addressing these, and will continue to split out the code into chunks and submit them to gerrit.

RPCRefresh
----------

This was roughly split into topic headings as follows. Simon agreed to edit a document proposing these changes.

UUIDs
We had a general discussion of how UUIDs might fit into the AFS protocol, beginning with an expressed desire to include client UUIDs in every call, to minimise the issues with using IP addresses to locate clients. After discussion, this approach was rejected, because client information needs to be available before an RPC has been decoded. Instead, we proposed making our new security classes exchange UUID information as part of the challenge/response connection establishment. Tom will specify and develop a new 'clear' class which will exchange UUIDs, and be a drop-in replacement for the current null class.

We then discussed the issue where, through a race condition when servers change IP addresses, an RPC may arrive at a server other than the one it is destined for and, in some rare situations, mutate data incorrectly. It was felt that this was too rare an occurrence to justify adding a server UUID to every data mutating RPC. Tom's new clear class could also be used to address this case.

Jeffrey discussed ways in which we could use UUIDs (as SIDs) in the ptserver. We agreed that this was out of scope for this round, but that it was an interesting topic. Jeffrey will write up a proposal.

64bit time
We agreed to change all of the time occurrences in AFS RPCs (but not any on disk occurrences) to be 64bit, with a granularity of 100ns.
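As a rough illustration of what a 64bit, 100ns time value looks like, here is a minimal Python sketch. The helper names are hypothetical, and the choice of a signed value counted from the Unix epoch is an assumption: the notes above only fix the width and the granularity.

```python
# Hypothetical helpers. Assumption: signed 64-bit count of 100ns ticks
# since the Unix epoch; the meeting only agreed width and granularity.
TICKS_PER_SECOND = 10_000_000  # number of 100ns units in one second


def to_time64(seconds: int, nanoseconds: int = 0) -> int:
    """Convert non-negative seconds + nanoseconds into 100ns ticks."""
    ticks = seconds * TICKS_PER_SECOND + nanoseconds // 100
    if not -(2**63) <= ticks < 2**63:
        raise OverflowError("time does not fit in a signed 64-bit value")
    return ticks


def from_time64(ticks: int) -> tuple[int, int]:
    """Split 100ns ticks back into (seconds, nanoseconds)."""
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    return seconds, remainder * 100
```

Note that anything finer than 100ns is truncated on conversion, so a round trip through the wire format can lose up to 99ns.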

RXOSD changes
We'd like to be able to specify a quota value for the number of files in a volume (rxosd already implements this, but currently does so by using a 'spare' field) - this is particularly relevant for sites which are using tape storage. This changes VolIntInfo and those RPCs which use it in the volint family, and Fetch/StoreVolumeStatus. Christof will provide a detailed list, and suitable language.

We will rename 'ResidencyMask' to 'DataAccessProtocol' in a revised version of FetchStatus.

In VolIntInfo we want to add afs_uint32 as 'osdPolicy'

Future proofing FIDs
We agreed to change volumeID, vnode and uniquifier to all be 64 bit values, with 0xffffffff and 0 being reserved.
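A small sketch of a widened FID with the reserved values rejected. The class name is hypothetical, and applying the reserved values to every one of the three fields is an assumption; the notes do not say which fields they cover.

```python
from dataclasses import dataclass

U64_MAX = 2**64 - 1
# Values the notes mark as reserved. Assumption: they apply to each field.
RESERVED = {0, 0xFFFFFFFF}


@dataclass(frozen=True)
class Fid64:
    """Hypothetical FID with 64-bit volume, vnode and uniquifier."""
    volume: int
    vnode: int
    unique: int

    def __post_init__(self):
        for name in ("volume", "vnode", "unique"):
            value = getattr(self, name)
            if not 0 <= value <= U64_MAX:
                raise ValueError(f"{name} does not fit in 64 bits")
            if value in RESERVED:
                raise ValueError(f"{name} uses a reserved value")
```

Existing 32 bit identifiers fit unchanged, so `Fid64(536870912, 2, 7)` is accepted while `Fid64(0, 2, 7)` raises `ValueError`.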

Quotas / Block size
Fields which report quotas and volume block sizes should become 64 bit, even if we can't use them all now. This affects Store/FetchVolumeStatus, VolIntInfo and VolIntXinfo.

Last update time of volume
We'll add a field to FetchStatus to store the last update time of the volume, so it can be used to optimise handling of read only volumes in the cache manager. Alistair will arrange for this to be implemented

Per file ACLs
We'll define semantics for the new FetchACL and StoreACL commands when they are invoked on files. The new FetchStatus will be defined as returning per file ACL information.

ACL Extensions
We'll extend ACLs to use 32bits of access data on the wire, and reserve all of the new 16 bits for our own use.

FetchStatus cleanups
*) InterfaceVersion will be removed
*) Length and Length_hi will be combined into a 64bit length
*) Dataversion and dataVersionHigh will be a single 64bit value.
*) User and Group ID will become 64bit
*) ParentVnode and ParentUnique will become 64bit (in line with the FID changes being made elsewhere)
*) SyncCounter will be removed
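For the fields that are currently split into high and low halves (Length/Length_hi and the two data version fields), the combined 64bit value is just the two 32 bit words joined together. A minimal sketch, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def combine_u64(high: int, low: int) -> int:
    """Join two unsigned 32-bit halves into one unsigned 64-bit value."""
    if not (0 <= high <= MASK32 and 0 <= low <= MASK32):
        raise ValueError("both halves must be unsigned 32-bit values")
    return (high << 32) | low


def split_u64(value: int) -> tuple[int, int]:
    """Split an unsigned 64-bit value into (high, low) 32-bit halves."""
    if not 0 <= value <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value must be an unsigned 64-bit value")
    return (value >> 32) & MASK32, value & MASK32
```

A server talking to an old client can use the split form to fill in the existing pair of fields from the new single value.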

Tom: Propose new clear rx security class
Jeffrey: Write proposal for using SIDs within pts
Jeffrey: Provide language for 64bit time
Christof: Provide language for file quotas
Simon: Edit this into a manageable whole
Ali: Arrange for code to make use of the 'volume last update time' field to be written.

SRV records
-----------

Use them to replace AFSDB records. We will standardise support for vlserver and ptserver records, of the form _afs3-vlserver._udp.<cellname>. SRV priority maps to rank, and weight should be used as input to the server selection randomisation function.
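To make the priority and weight handling concrete, here is a sketch of the ordering procedure described in RFC 2782, which is one plausible reading of "server selection randomisation function". The record type and function name are hypothetical, and nothing here is taken from an OpenAFS implementation.

```python
import random
from typing import NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_servers(records, rng=None):
    """Order SRV records: lowest priority first, weighted random within it."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # RFC 2782 places weight-0 records first, so they are chosen
        # only when the random pick is 0.
        pool = sorted((r for r in records if r.priority == priority),
                      key=lambda r: r.weight != 0)
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for index, record in enumerate(pool):
                running += record.weight
                if running >= pick:
                    ordered.append(pool.pop(index))
                    break
    return ordered
```

A client would then try the servers in the returned order, so that all servers of the best rank are attempted before any lower ranked one.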

Jeffrey will write a short I-D describing how AFS uses SRV records.

rxk5
----

Marcus presented his rxk5 document - /afs/umich.edu/group/itd/build/mdw/openafs/patches/rxk5-1.pdf

A lively discussion ensued. In particular, we discussed the initial packet problem at great length. This is where a client sends a packet containing valuable data to a server which only wants to accept encrypted connections. However, because the server only tells the client this after it has received the first packet, the client may already have sent that data in cleartext. Jeffrey proposed a solution to this problem, which Marcus was unconvinced by. Further discussion is required.

We also debated the merits of using our locally developed k5ssl, against an externally maintained crypto library. Given that Heimdal's hcrypto is likely to be imported into the OpenAFS tree to support other AFS uses of crypto, the opinion was expressed that rxk5 should probably be built upon that.

A discussion of the problems of ubik's hard coded assumption that there will only ever be 3 security classes took place. It was agreed that a new, dynamic, interface will be defined to handle this.

We discussed Marcus's new cache manager properties list, which provides a sysctl like mechanism for exchanging configuration strings between cache manager and user space. The meeting was unable to reach consensus on this design, and we agreed that it should be discussed further on list.

Other agreed changes were:
*) The authenticator will be extended to support more than 4 calls per connection
*) Space will be added for an application level binding (AFS wants this to assert the client UUID, but we want to make it generic)
*) rxk5 will be modified to use the Kerberos PRF+, rather than MD5
*) The cellname length in the new tokens pioctl will be extended to 256

Matt & Marcus: Update draft to reflect changes, break code into chunks and submit to gerrit
Marcus: Raise sysctl-style properties interface on openafs-devel
Simon: Import hcrypto into OpenAFS tree (as part of the rxgk work)

Generic Quotas
--------------

Christof had raised the issue of providing a more generic quota mechanism, which allows more flexible definitions of what quota might be (rxosd would like to be able to apply a separate quota to files under a certain size, for example)

We discussed implementing this as a set of tag value pairs, with each pair having a globally defined meaning. Individual tags need not be implemented on every fileserver - there should be an RPC by which clients can determine which tags a fileserver supports. We want to implement this by revising existing RPCs which take quota values, and use it to replace the quota values that those RPCs already contain.
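The tag value approach could look roughly like the sketch below. The tag numbers, the class and the method standing in for the discovery RPC are all hypothetical; no tags or RPC names were assigned at the meeting.

```python
# Hypothetical tag numbers. The notes say each tag would have a globally
# defined meaning, but none have been assigned yet.
QUOTA_TAG_BLOCKS = 1
QUOTA_TAG_FILES = 2
QUOTA_TAG_SMALL_FILE_BLOCKS = 3


class FileServer:
    """Toy fileserver that only understands some quota tags."""

    def __init__(self, supported_tags):
        self.supported_tags = frozenset(supported_tags)

    def get_supported_quota_tags(self):
        # Stand-in for the RPC by which clients discover supported tags.
        return self.supported_tags


def partition_quotas(server, quotas):
    """Split (tag, value) pairs into supported and unsupported lists."""
    supported = server.get_supported_quota_tags()
    accepted = [(tag, value) for tag, value in quotas if tag in supported]
    rejected = [(tag, value) for tag, value in quotas if tag not in supported]
    return accepted, rejected
```

A client that wants to set a per-size quota can check first and fall back to the plain block quota when the fileserver does not advertise the tag.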

Christof will write a document describing this, but we won't block the RPC refresh on it

Volume State
------------

Tom wants to be able to communicate to the client the type of fileserver it is talking to, and provide a 'raw' and a 'mapped' indicator of the volume status (raw is implementation dependent, mapped uses globally defined error codes). We agreed that fileserver type could, for now, be expressed as a capability bit, and that the volume state fields should be new parameters within VolIntInfo.

Tom will write an I-D describing this. Again, we won't hold RPC refresh up for these changes

rx/udp improvements
-------------------

Jeffrey discussed changes he is making to RTT calculations such that the algorithms better reflect Phil Karn's findings from 1987. This seemed uncontentious - Jeffrey will put a patch into gerrit.
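For readers unfamiliar with the 1987 result: Karn's algorithm says that round trip samples from retransmitted packets are ambiguous and must be ignored, and that the timeout should back off when a retransmission happens. The sketch below pairs that rule with the smoothing constants from RFC 6298 (which are TCP's, and are an assumption here). It is not Jeffrey's patch and does not reflect the actual RX code.

```python
class RttEstimator:
    """Illustrative RTT estimator applying Karn's rule (times in seconds)."""

    def __init__(self, initial_rto=1.0, max_rto=60.0):
        self.srtt = None      # smoothed round trip time
        self.rttvar = None    # round trip time variation
        self.rto = initial_rto
        self.max_rto = max_rto

    def on_ack(self, sample, retransmitted):
        if retransmitted:
            # Karn: we cannot tell which transmission this ack answers,
            # so the sample is discarded.
            return
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RFC 6298 constants: beta = 1/4, alpha = 1/8.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample
        self.rto = min(self.max_rto, self.srtt + 4 * self.rttvar)

    def on_timeout(self):
        # Back off; the enlarged timeout is kept until a clean sample arrives.
        self.rto = min(self.max_rto, self.rto * 2)
```

This sketch omits the minimum timeout and clock granularity terms that RFC 6298 also specifies.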

Derrick discussed larger window support, which he will test, and discuss his findings further

Derrick discussed improvements he wants to make to RX negotiation, by adding elements to the existing negotiation packet. In theory this should be backwards compatible, because existing clients use the packet size to determine the version of the structure they are receiving. We discussed the mechanism for progressing RX modifications, as there isn't an obvious body to do it in. The conclusion was to use afs3-stds, but make a deliberate attempt to reach out to those people who we know are using RX in other applications.
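The backwards compatibility argument relies on receivers inferring the structure version from the payload length. The sketch below shows that general technique only. The field layout is entirely hypothetical (three 32 bit fields in the old form, two more appended in the new one) and does not describe the real RX negotiation packet.

```python
import struct

# Hypothetical layouts, network byte order, no padding.
V1_FORMAT = "!3I"      # original fields
V2_FORMAT = "!3I2I"    # original fields plus two appended ones
V1_SIZE = struct.calcsize(V1_FORMAT)
V2_SIZE = struct.calcsize(V2_FORMAT)


def decode_negotiation(payload: bytes) -> dict:
    """Pick the structure version from the payload length."""
    if len(payload) >= V2_SIZE:
        a, b, c, d, e = struct.unpack(V2_FORMAT, payload[:V2_SIZE])
        return {"version": 2, "fields": (a, b, c), "extensions": (d, e)}
    if len(payload) >= V1_SIZE:
        fields = struct.unpack(V1_FORMAT, payload[:V1_SIZE])
        return {"version": 1, "fields": fields, "extensions": ()}
    raise ValueError("payload too short for any known version")
```

Because new elements are only ever appended, a receiver that knows just the old layout still finds the original fields at the same offsets.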

Derrick will write up an I-D describing this, and solicit feedback

rx/gk
-----

Simon presented his write-up of the current rx/gk protocol

Jeffrey expressed a desire for the first packet problem to be resolved, and for client uuids to be a part of the authenticator.

We had a long discussion of bytelife, and of key agility. The conclusion that was reached was that each packet should have a security header containing the key number that was used to encrypt that packet. Provided this key number is an input to the PRF used to derive the transport key, keys can be revised at either the client's or the server's request. bytelife will remain advisory, however.
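The idea of feeding the key number into the key derivation can be sketched as follows. HMAC-SHA256 is used purely as a stand-in PRF, and the function name, label and 32 bit key number width are assumptions; none of this is the PRF or header layout that rxgk will actually specify.

```python
import hashlib
import hmac
import struct


def derive_transport_key(master_key: bytes, key_number: int,
                         length: int = 32) -> bytes:
    """Derive a per-key-number transport key (stand-in PRF: HMAC-SHA256)."""
    if not 0 <= key_number <= 0xFFFFFFFF:
        raise ValueError("key number must fit in 32 bits (assumed width)")
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 bytes")
    # The key number is part of the PRF input, so bumping it yields a
    # fresh key without renegotiating the master key.
    message = b"transport" + struct.pack("!I", key_number)
    return hmac.new(master_key, message, hashlib.sha256).digest()[:length]
```

Since the key number travels in each packet's security header, the receiver can derive the matching key for any packet, whichever side requested the change.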

We decided that the ivec should be determined from the pseudo header, to solve the packet ordering problem.

We decided that the same pseudo header should be used as rxk5

Marcus pointed out that rxgk uses the first version of the rxkad authenticator, and strongly suggested that CITI's recommendations that led to the rxkadv2 authenticator be studied and followed.

Marcus noted that using different numbers than rxkad for security levels will only cause implementation pain. We'll use the same as rxkad.

Simon will update the rxgk protocol document, and create a new one which describes its implementation within AFS.

miniosi
-------

Tom talked us through the current libosi, with a view to splitting it into chunks that we can start to pull into the tree.

We agreed that we will only pull in changes that are going to be used in the code, and import libosi as follows:

Phase 0: A build framework
Phase 1: buildenv, compiler, types
Phase 2: platform/datamodel.h
Phase 3: Time

Each of these phases should update the rest of the tree to use the new functionality, so that we don't, for example, end up with two descriptions of each platform's data model in the tree.

This is on the critical path for extended callbacks, so Tom and Matt will seek to move this forwards, with Tom doing the work and Matt making sure that it gets done.

Tom: Submit these chunks of libosi, and tree integration patches
Matt: nag Tom

Directory RPCs
--------------
http://michigan-openafs-lists.central.org/archives/afs3-standardization/2009September/000423.html

We discussed Matt's proposal for an explicit directory listing RPC (posted to afs3-stds during the hackathon). We were unable to achieve consensus on any of the issues this presents, beyond determining that server side sorting was unachievable.

DAFS
----

Tom would like a way of a server changing its advertised capabilities, without having to do an InitCallbackState3. Jeffrey proposed a new TellMeAboutYourself RPC which will take the capabilities as an IN parameter.

Tom will go away and think about this, and write something up.

There was agreement that DAFS (including the changes to the vnode and volume packages) needed to be properly documented. Tom and Ali will get this done.

PRDB extensions
---------------

Derrick presented the original Swedish Hackathon work:
http://web.archive.org/web/20060211111127/http://www.afsig.se/snipsnap/space/prdb+extensions
We agreed that we need a way of creating multiple names in a single RPC, and provide a RenameAuthName RPC which takes a vector of triples of (type, old_opaque, new_opaque)

Derrick will produce an I-D documenting the new RPCs, and an implementation.

Code management
---------------

We spent a lot of time discussing issues of code management, and how large changes can get into the OpenAFS source code. It's hard to summarise what was a wide ranging and contradictory discussion, but the following points were made and broadly agreed with:
*) Clear documentation of both protocol and implementation can hugely help with code review
*) Design and implementation discussions, during the development process, are hugely valuable
*) For 'large' projects we will merge by creating a git branch onto which changes will be reviewed. We can then flip the switch by doing a final merge commit, safe in the knowledge that the individual commits have already been reviewed.


