[AFS3-std] Hackathon Summary

Simon Wilkinson Fri, 09 Oct 2009 07:12:53 -0700

A group of us - Jeffrey Altman, Matt Benjamin, Derrick Brashear, Alistair Ferguson, Christof Hanke, Tom Keiser, Hartmut Reuter, Marcus Watts, Rod Widdowson and myself - met for 3 days in Edinburgh at the end of last month. We discussed a wide variety of AFS issues. A jabber chat log of our discussions is available at http://conference.openafs.org/[email protected]/ (see the 2009-09-22 and 2009-09-24 log files).

I'm not going to attempt to summarise our discussions in much detail. However, I have noted below the topics we discussed, and any conclusions that I believe we reached. Where we identified the next steps to be taken, I've also noted this in the hope that we can keep things moving forwards.


Extended Callbacks
------------------

Matt presented his extended callbacks draft. The discussion ranged across protocol, implementation and code management issues. On the protocol front, it was felt that the draft was approaching consensus, but that concerns remained around the changes to the behaviour of callback breaks (in particular, whether an RPC can return before all callbacks have been broken). Given the lack of consensus on this topic, we agreed that the draft would drop mention of callback coalescing entirely. Other issues included the behaviour of clients which receive extended callbacks over untrusted channels, and the risk of deploying extended callbacks on servers which only have a small number of callbacks configured. Matt will produce a draft addressing these issues, and then we will attempt to move forwards with a consensus call.

We will attempt to address the asynchrony issue at a later date. Given that this change is arguably a modification to afs3 semantics, we'll attempt to engage a wider body of the community in this discussion.

In discussing the implementation, we considered the range of xcb dependencies, in particular those on mcas and libosi. It was felt that blocking extended callbacks on getting libosi into the tree was undesirable - in particular, a desire was expressed for the xcb code to use the existing pthread implementation, rather than pulling in osi's thread abstractions. We agreed that we would not enable or expose the callback coalescing code in the OpenAFS implementation, pending further discussion of this issue. We didn't resolve the issue of whether we should use MCAS's native atomics everywhere, or whether we should prefer atomic operations provided by the operating system.

Code management concerns were expressed on a number of occasions throughout the meeting. I'll summarise these in a section of their own towards the end.


We agreed the following:

Matt will publish a new version of the draft which:
1) Removes any mention of asynchronous behaviour for callback breaks
2) Extends the security considerations section to state that if the client receives an XCB for metadata on an untrusted connection, it should treat it as a normal callback break
3) Adds an implementation note on the risks for servers with a small number of callbacks

Simon (in the absence of a chair for afs3-stds) will issue a call for consensus on the updated draft.

Matt will then:
1) Remove as many OSI dependencies from the xcb implementation as possible
2) Remove asynchronous callback breaks from the visible implementation (off by default, no switch to enable)
3) Push changes to gerrit (separate patches for Windows CM, Unix CM, and fileserver)


All will then take time to review these changes.

rx/osd
------

Hartmut and Christof presented their protocol documentation for rx/osd, available online at http://pfanne.rzg.mpg.de/trac/openAFS-OSD/wiki/Specs

A desire was expressed to not use an IP address to identify OSD servers, but use a UUID instead, and register OSDs in an extended vldb which also knows about ports. We decided that this could be fixed in a later protocol version, but that for now servers should be expressed as a union of IP address and UUID so we don't have to rev the RPC later on.

Where 'expires' is used in the structures it will become an absolute time, held as a 64 bit value.

After a discussion of the consistency issues in the current mirroring implementation (if some mirrors go offline, then you can end up with multiple OSDs with different versions of the data), we decided that mirroring would be out of scope for the initial rxosd integration.

UUID will be removed from getOSDLocation, StartAsyncFetch and StartASyncStore, as the fileserver already knows the UUID (from connection establishment).


Times in general will become 64bits

Hartmut and Christof will publish a revised set of protocol specifications addressing these, and will continue to split out the code into chunks and submit them to gerrit.


RPCRefresh
----------

This was roughly split into topic headings as follows. Simon agreed to edit a document proposing these changes.


UUIDs

We had a general discussion of how UUIDs might fit into the AFS protocol, beginning with an expressed desire to include client UUIDs in every call, to minimise the issues with using IP addresses to locate clients. After discussion, this approach was rejected, because client information needs to be available before an RPC has been decoded. Instead, we proposed making our new security classes exchange UUID information as part of the challenge/response connection establishment. Tom will specify and develop a new 'clear' class which will exchange UUIDs, and be a drop in replacement for the current null class.

We then discussed the issue where, through a race condition when servers change IP addresses, an RPC may arrive at a server other than the one it is destined for and, in some rare situations, mutate data incorrectly. It was felt that this was too rare an occurrence to justify adding a server UUID to every data mutating RPC. Tom's new clear class could also be used to address this case.

Jeffrey discussed ways in which we could use UUIDs (as SIDs) in the ptserver. We agreed that this was out of scope for this round, but that it was an interesting topic. Jeffrey will write up a proposal.


64bit time

We agreed to change all of the time occurrences in AFS RPCs (but not any on disk occurrences) to be 64bit, with a granularity of 100ns.
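
As a rough illustration of what a 64bit time with 100ns granularity looks like, the sketch below converts between Unix seconds and a count of 100ns ticks. The epoch and signedness are not stated in this summary; the Unix epoch and an unsigned field are assumed here, and the function names are hypothetical.

```python
# Sketch only: the epoch (Unix, 1970-01-01) and the unsigned range are
# assumptions, not something agreed in the summary.
TICKS_PER_SECOND = 10_000_000  # number of 100ns units in one second
UINT64_MAX = 2**64 - 1


def unix_to_ticks(seconds, nanoseconds=0):
    """Convert a (seconds, nanoseconds) pair to a count of 100ns ticks."""
    ticks = seconds * TICKS_PER_SECOND + nanoseconds // 100
    if not 0 <= ticks <= UINT64_MAX:
        raise ValueError("time does not fit in an unsigned 64bit field")
    return ticks


def ticks_to_unix(ticks):
    """Split a tick count back into (seconds, nanoseconds)."""
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    return seconds, remainder * 100
```

Under these assumptions an unsigned 64bit tick count covers roughly 58,000 years, so the extra granularity does not cost any practical range.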


RXOSD changes

We'd like to be able to specify a quota value for the number of files in a volume (rxosd already implements this, but currently does so by using a 'spare' field) - this is particularly relevant for sites which are using tape storage. This changes VolIntInfo and those RPCs which use it in the volint family, and Fetch/StoreVolumeStatus. Christof will provide a detailed list, and suitable language.

We will rename 'ResidencyMask' to 'DataAccessProtocol' in a revised version of FetchStatus.


In VolIntInfo we want to add an afs_uint32 field named 'osdPolicy'.

Future proofing FIDs

We agreed to change volumeID, vnode and uniquifier to all be 64 bit values, with 0xffffffff and 0 being reserved.
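
A minimal sketch of how the reserved values might be checked is given below. It assumes the reservation applies to each of the three fields individually, which the summary does not spell out; the `Fid` tuple and `is_valid_fid` are hypothetical names.

```python
from collections import namedtuple

# Reserved values as listed above. Note that 0xffffffff is the 32 bit
# all-ones value, even though the fields themselves become 64 bit.
RESERVED = frozenset({0, 0xFFFFFFFF})
UINT64_MAX = 2**64 - 1

# Hypothetical in-memory layout, for illustration only.
Fid = namedtuple("Fid", ["volume", "vnode", "unique"])


def is_valid_fid(fid):
    """True if every field fits in 64 bits and avoids the reserved values."""
    return all(0 <= field <= UINT64_MAX and field not in RESERVED
               for field in fid)
```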


Quotas / Block size

Fields which report quotas and volume block sizes should become 64bit, even if we can't use them all now. This affects Store/FetchVolumeStatus, VolIntInfo and VolIntXinfo


Last update time of volume

We'll add a field to FetchStatus to store the last update time of the volume, so it can be used to optimise handling of read only volumes in the cache manager. Alistair will arrange for this to be implemented.


Per file ACLs

We'll define semantics for the new FetchACL and StoreACL commands when they are invoked on files. The new FetchStatus will be defined as returning per file ACL information.


ACL Extensions

We'll extend ACLs to use 32bits of access data on the wire, and reserve all of the new 16 bits for our own use.
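
To make the split concrete, here is a small sketch of masking a 32bit access word. Which half of the word holds the 16 new bits is not stated above; the sketch assumes they are the high half, and the names are hypothetical.

```python
# Assumption: the existing 16 bits stay in the low half of the word and
# the 16 new, reserved bits occupy the high half.
EXISTING_MASK = 0x0000FFFF
RESERVED_MASK = 0xFFFF0000


def split_rights(word):
    """Return (existing_bits, new_bits) for a 32bit access word."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("access data must fit in 32 bits")
    return word & EXISTING_MASK, (word & RESERVED_MASK) >> 16


def uses_reserved_bits(word):
    """True if any of the 16 new bits are set."""
    return bool(word & RESERVED_MASK)
```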


FetchStatus cleanups
*) InterfaceVersion will be removed
*) Length and Length_hi will be combined into a 64bit length
*) Dataversion and dataVersionHigh will be a single 64bit value.
*) User and Group ID will become 64bit
*) ParentVnode and ParentUnique will become 64bit (in line with the FID changes being made elsewhere)
*) SyncCounter will be removed
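
For the paired fields in the list above (Length/Length_hi and Dataversion/dataVersionHigh), combining two 32bit halves into one 64bit value is a shift and an OR. The helper names below are hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF


def combine_u32(high, low):
    """Join two 32bit halves into a single 64bit value."""
    for half in (high, low):
        if not 0 <= half <= UINT32_MAX:
            raise ValueError("each half must fit in 32 bits")
    return (high << 32) | low


def split_u64(value):
    """Split a 64bit value back into (high, low) 32bit halves."""
    if not 0 <= value <= 2**64 - 1:
        raise ValueError("value must fit in 64 bits")
    return value >> 32, value & UINT32_MAX
```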

Tom: Propose new clear rx security class
Jeffrey: Write proposal for using SIDs within pts
Jeffrey: Provide language for 64bit time
Christof: Provide language for file quotas
Simon: Edit this into a manageable whole
Ali: Arrange for code to make use of the 'volume last update time' field to be written.


SRV records
-----------

Use them to replace AFSDB - standardise support for vlserver and ptserver (_afs3-vlserver._udp.<cellname>). SRV priority maps to rank; weight should be used as input to the server selection randomisation function.


Jeffrey will write a short I-D describing how AFS uses SRV records.
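
The priority and weight handling described in this section could look something like the sketch below: take the lowest priority group, then pick within it at random in proportion to weight. This loosely follows the selection rules of RFC 2782 rather than any agreed AFS behaviour; `SrvRecord` and `pick_server` are hypothetical names.

```python
import random
from collections import namedtuple

# Hypothetical record type mirroring the fields of a DNS SRV record.
SrvRecord = namedtuple("SrvRecord", ["priority", "weight", "port", "target"])


def pick_server(records, rng=random):
    """Pick one record: lowest priority first, weighted random within it."""
    if not records:
        raise ValueError("no SRV records")
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Passing in a seeded `random.Random` instance as `rng` makes the choice repeatable, which is handy when testing.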

rxk5
----

Marcus presented his rxk5 document - /afs/umich.edu/group/itd/build/mdw/openafs/patches/rxk5-1.pdf

A lively discussion ensued. In particular, we discussed the initial packet problem at great length. This is where a client sends a packet containing valuable data to a server which only wants to accept encrypted connections. However, because the server only tells the client this after it has received the first packet, the client may have just sent that data in cleartext. Jeffrey proposed a solution to this problem, which Marcus was unconvinced by. Further discussion is required.

We also debated the merits of using our locally developed k5ssl, against an externally maintained crypto library. Given that Heimdal's hcrypto is likely to be imported into the OpenAFS tree to support other AFS uses of crypto, the opinion was expressed that rxk5 should probably be built upon that.

A discussion of the problems of ubik's hard coded assumption that there will only ever be 3 security classes took place. It was agreed that a new, dynamic, interface will be defined to handle this.

We discussed Marcus's new cache manager properties list, which provides a sysctl like mechanism for exchanging configuration strings between cache manager and user space. The meeting was unable to reach consensus on this design, and we agreed that it should be discussed further on list.


Other agreed changes were:

*) The authenticator will be extended to support more than 4 calls per connection
*) Space will be added for an application level binding (AFS wants this to assert the client UUID, but we want to make it generic)
*) rxk5 will be modified to use the Kerberos PRF+, rather than MD5
*) The cellname length in the new tokens pioctl will be extended to 256

Matt & Marcus: Update draft to reflect changes, break code into chunks and submit to gerrit
Marcus: Raise sysctl-style properties interface on openafs-devel
Simon: Import hcrypto into OpenAFS tree (as part of the rxgk work)

Generic Quotas
--------------

Christof had raised the issue of providing a more generic quota mechanism, which allows more flexible definitions of what quota might be (rxosd would like to be able to apply a separate quota to files under a certain size, for example).

We discussed implementing this as a set of tag value pairs, with each pair having a globally defined meaning. Individual tags need not be implemented on every fileserver - there should be an RPC by which clients can determine which tags a fileserver supports. We want to implement this by revising existing RPCs which take quota values, and use it to replace the quota values that those RPCs already contain.

Christof will write a document describing this, but we won't block the RPC refresh on it.
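
As a sketch of the tag value idea (not of any agreed wire format), the code below filters a client's requested quota pairs against the set of tags a fileserver reports supporting. The tag numbers and names are entirely hypothetical; no tags were assigned at the meeting.

```python
# Hypothetical tag numbers, for illustration only.
QUOTA_TAG_BLOCKS = 1
QUOTA_TAG_FILES = 2
QUOTA_TAG_SMALL_FILE_BLOCKS = 3


def apply_quota_tags(requested, supported):
    """Split (tag, value) pairs into accepted values and unsupported tags.

    'supported' stands in for the answer to the proposed RPC by which a
    client learns which tags a fileserver implements.
    """
    accepted = {}
    unsupported = []
    for tag, value in requested:
        if tag in supported:
            accepted[tag] = value
        else:
            unsupported.append(tag)
    return accepted, unsupported
```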


Volume State
------------

Tom wants to be able to communicate to the client the type of fileserver it's talking to, and provide a 'raw' and a 'mapped' indicator of the volume status (raw is implementation dependent, mapped uses globally defined error codes). We agreed that fileserver type could, for now, be expressed as a capability bit, and that the volume state fields should be new parameters within VolIntInfo.

Tom will write an I-D describing this. Again, we won't hold RPC refresh up for these changes.


rx/udp improvements
-------------------

Jeffrey discussed changes he is making to RTT calculations such that the algorithms better reflect Phil Karn's findings from 1987. This seemed uncontentious - Jeffrey will put a patch into gerrit.
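
For readers unfamiliar with Karn's findings, the core rule is that round trip samples from retransmitted packets are ambiguous and should be discarded, and that the timer should back off after a timeout until a clean sample arrives. The sketch below illustrates that rule only; it is not Jeffrey's patch, and the smoothing gain, the timeout multiplier and the class name are assumptions.

```python
class RttEstimator:
    """Smoothed RTT with Karn's rule; constants are illustrative only."""

    ALPHA = 0.125    # assumed smoothing gain, not taken from RX
    MAX_RTO = 60.0   # assumed upper bound on the timeout, in seconds

    def __init__(self, initial_rto=1.0):
        self.srtt = None
        self.rto = initial_rto

    def sample(self, rtt, retransmitted):
        """Feed one measured round trip; ambiguous samples are ignored."""
        if retransmitted:
            # An ack for a retransmitted packet cannot be matched to a
            # specific transmission, so it must not update the estimate.
            return
        if self.srtt is None:
            self.srtt = rtt
        else:
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = min(2 * self.srtt, self.MAX_RTO)

    def on_timeout(self):
        """Back off the timer; it stays backed off until a valid sample."""
        self.rto = min(2 * self.rto, self.MAX_RTO)
```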

Derrick discussed larger window support, which he will test, and will discuss his findings further.

Derrick discussed improvements he wants to make to RX negotiation, by adding elements to the existing negotiation packet. In theory this should be backwards compatible, because existing clients use the packet size to determine the version of the structure they are receiving. We discussed the mechanism for progressing RX modifications, as there isn't an obvious body to do it in. The conclusion was to use afs3-stds, but make a deliberate attempt to reach out to those people who we know are using RX in other applications.


Derrick will write up an I-D describing this, and solicit feedback
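
The backwards compatibility argument above relies on receivers choosing a structure version from the payload length. A sketch of that idea follows; the field layouts are invented for illustration and do not describe the real RX negotiation packet.

```python
import struct

# Hypothetical layouts: V2 is V1 with two 32bit fields appended.
V1 = struct.Struct("!II")
V2 = struct.Struct("!IIII")


def parse_negotiation(payload):
    """Return (version, fields), choosing the layout from the payload size."""
    if len(payload) >= V2.size:
        return 2, V2.unpack_from(payload)
    if len(payload) >= V1.size:
        return 1, V1.unpack_from(payload)
    raise ValueError("negotiation payload too short")


def parse_as_old_client(payload):
    """An older receiver reads only the prefix it knows about."""
    return V1.unpack_from(payload)
```

Because new elements are only ever appended, an old receiver handed a longer payload still finds the fields it understands at the same offsets.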

rx/gk
-----

Simon presented his write-up of the current rx/gk protocol

Jeffrey expressed a desire for the first packet problem to be resolved, and for client uuids to be a part of the authenticator.

We had a long discussion of bytelife, and of key agility. The conclusion that was reached was that each packet should have a security header containing the number of the key that was used to encrypt that packet. Provided this key number is input to the PRF used to derive the transport key, keys can be revised at either the client's or the server's request. bytelife will remain advisory, however.
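
The key agility idea can be sketched as follows: both ends hold the same base key, and the key number carried in each packet's security header selects which derived transport key to use. HMAC-SHA256 is used here purely as a stand-in PRF; it is not the PRF rxgk specifies, and the function name and 32bit key number encoding are assumptions.

```python
import hashlib
import hmac
import struct


def derive_transport_key(base_key, key_number):
    """Derive a per-key-number transport key (stand-in PRF, see above)."""
    # Assumption: the key number is encoded as a 32bit big-endian integer.
    label = struct.pack("!I", key_number)
    return hmac.new(base_key, label, hashlib.sha256).digest()
```

Either side can move to a fresh key by incrementing the key number it sends; the receiver derives the matching key from the number in the header without any extra round trip.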

We decided that the ivec should be determined from the pseudo header, to solve the packet ordering problem.

We decided that the same pseudo header as rxk5 should be used.

Marcus pointed out that rxgk uses the first version of the rxkad authenticator, and strongly suggested that CITI's recommendations that led to the rxkadv2 authenticator be studied and followed.

Marcus noted that using different numbers than rxkad for security levels will only cause implementation pain. We'll use the same as rxkad.

Simon will update the rxgk protocol document, and create a new one which describes its implementation within AFS.


miniosi
-------

Tom talked us through the current libosi, with a view to splitting it into chunks that we can start to pull into the tree.

We agreed that we will only pull in changes that are going to be used in the code, and import libosi as follows:

Phase 0: A build framework
Phase 1: buildenv, compiler, types
Phase 2: platform/datamodel.h
Phase 3: Time

Each of these phases should update the rest of the tree to use the new functionality, so that we don't, for example, end up with two descriptions of each platform's data model in the tree.

This is on the critical path for extended callbacks, so Tom and Matt will seek to move this forwards, with Tom doing the work and Matt making sure that it gets done.


Tom: Submit these chunks of libosi, and tree integration patches
Matt: nag Tom

Directory RPCs
--------------
http://michigan-openafs-lists.central.org/archives/afs3-standardization/2009September/000423.html

We discussed Matt's proposal for an explicit directory listing RPC (posted to afs3-stds during the hackathon). We were unable to achieve consensus on any of the issues this presents, beyond determining that server side sorting was unachievable.


DAFS
----

Tom would like a way for a server to change its advertised capabilities, without having to do an InitCallbackState3. Jeffrey proposed a new TellMeAboutYourself RPC which will take the capabilities as an IN parameter.


Tom will go away and think about this, and write something up.

There was agreement that DAFS (including the changes to the vnode and volume packages) needed to be properly documented. Tom and Ali will get this done.


PRDB extensions
---------------

Derrick presented the original Swedish Hackathon work:
http://web.archive.org/web/20060211111127/http://www.afsig.se/snipsnap/space/prdb+extensions

We agreed that we need a way of creating multiple names in a single RPC, and that we should provide a RenameAuthName RPC which takes a vector of triples of (type, old_opaque, new_opaque).

Derrick will produce an I-D documenting the new RPCs, and an implementation.


Code management
---------------

We spent a lot of time discussing issues of code management, and how large changes can get into the OpenAFS source code. It's hard to summarise what was a wide ranging and contradictory discussion, but the following points were made and broadly agreed with:

*) Clear documentation of both protocol and implementation can hugely help with code review
*) Design and implementation discussions, during the development process, are hugely valuable
*) For 'large' projects we will merge by creating a git branch onto which changes will be reviewed. We can then flip the switch by doing a final merge commit, safe in the knowledge that the individual commits have already been reviewed.




