Hi,

I would like to build support for file content encryption in the kAFS
filesystem driver in the Linux kernel - but this needs standardising so that
other AFS filesystems can make use of it also.

Note that by "content encryption", I mean that only the permitted clients have
a key to the content.  The server does not.  Further, filenames may also be
encrypted.

For the kAFS filesystem, content encryption would be provided by the netfs
library.  The intention is that netfslib will provide this service to any
filesystem that uses it (afs, 9p, cifs and, hopefully soon, ceph), using Linux
fscrypt where possible (though this is not mandatory).  netfslib would then
also store the encrypted content in the local cache and only decrypt it when
it's brought into memory.

Now, the way I would envision this working is:

 (1) Each file is divided into units of 4KiB, each of which is encrypted
     separately with its own block key.  The block key is derived from the
     file key and the block offset.

 (2) Unfortunately, AFS does not have anywhere to store additional information
     for a file, such as xattrs, but the last block must be rounded out to at
     least the crypto block size and maybe the unit size - and we need to
     stash the real file size somewhere.  There are a number of ways this
     could be dealt with:

     (a) Store this extra metadata in a separate file.  This has a potential
         integrity issue if we fail to update that due to EDQUOT/ENOSPC,
         network loss, etc.

     (b) Round up the data part of the file to 4KiB and tack on a trailer at
         the end of file that has the real EOF in it.  This has the advantages
         that the trailer and the last block can be updated in a single
         StoreData RPC and that the real EOF can be encrypted, but the
         disadvantage that we can't return accurate info with stat() unless we
         can read (and decrypt) the trailer - and we have to do that in
         stat().

     (c) Stick a fixed-len trailer at the real EOF and just encrypt over part
         of that.  Again, this can be updated in a single StoreData RPC and
         the real EOF can be calculated by simple subtraction.  The trailer
         only need be one crypto block (say 16 bytes) in size, not the full
         4K.

     (d) Find a hole somewhere in the protocol and the on-server-disk metadata
         to store a number in the range 0-4095 that is backed up and
         transferred during a volume release.  I suspect this is infeasible.

     (e) Provide xattr support.  Probably also infeasible - though it might
         help with other things such as stacked filesystem support.

 (3) Mark a whole volume as being content-encrypted.  That is, content
     encryption is only available on a whole-volume basis unless we can find a
     way to mark individual vnodes as being encrypted - but this has the same
     issues as storing the real EOF length.

     This could be done in a number of ways:

     (a) A volume flag, passed to the client through the VLDB and the volume
         server.  The flag would need to be passed on to clone volumes and
         would need to be set at volume creation time or shortly thereafter.

         This might need a new RPC, say VOLSER.CreateEncryptedVolume, as
         VOLSER.CreateVolume doesn't seem to offer a way to indicate this, but
         maybe VOLSER.SetFlags would suffice: you turn it on and everything is
         suddenly encrypted.

     (b) Storing a magic file in the root directory of the volume
         (".afs_encrypted" say) that the client can look for.  This file could
         contain info about the algorithms used and about the key needed for
         decryption.

 (4) Encrypt filenames in an encrypted directory.  Whilst we could just
     directly pass encrypted filenames in the protocol as the names are XDR
     strings with a length count, they can't be stored in the standard AFS
     directory format as they may include NUL and '/'.  I can see two
     possibilities here:

     (a) base64 encode the encrypted filenames (using a modified base64 to
         exclude '/').  This has two disadvantages: it reduces the maximum
         name length to about 3/4 of the current limit (base64 turns every 3
         bytes into 4 characters) and makes all names longer, reducing the
         capacity of the directory.

     (b) Use the key to generate a series of numbers and then use each number
         to map a character of the filename, being careful to break the range
         around 0 and 47 so that we can map backwards.  This may result in
         less secure filename encryption than (a) and is trickier to do.

 (5) Derive file keys by combining a per-volume key with the vnode ID and the
     uniquifier.  Marking files with the 'name' of a specific key could be
     possible, but again this requires somewhere to store these as discussed
     in (2).

     Possibly 'file keys' could be skipped, deriving each block key from:

        RW vol ID || vnode ID || uniquifier || block pos

     The cell name cannot be included due to aliasing unless the canonical
     cell name can be queried.

 (6) Provide a conditional FS.StoreData RPC that takes a Data Version number
     as an additional parameter and fails if that doesn't match the current
     DV.  The issue is that even if just a byte is changed, an entire crypto
     unit must be written and truncation may also have to reencrypt the tail.

     (And by "fail", I'd prefer if it returned the updated stats rather than
     simply aborting - but I understand that we really want to close off the
     data transmission).

 (7) Though it's not strictly required for this, similar to (6), a conditional
     FS.FetchData could be useful as well for speculatively reading from a RO
     clone of a RW volume.

     Again, rather than failing with an abort, I'd prefer this to return no
     data and just the updated stats.  The client should then check the DV in
     the updated stats.
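
As a rough illustration of the key derivation in (1) and (5), the sketch below
derives a block key directly from "RW vol ID || vnode ID || uniquifier ||
block pos".  HMAC-SHA256 is only a stand-in for whatever KDF is eventually
chosen, and the function name and field widths are assumptions made for the
example, not part of any existing protocol or of netfslib:

```python
import hashlib
import hmac
import struct

UNIT_SIZE = 4096  # crypto unit size from point (1)


def derive_block_key(volume_key: bytes, rw_vol_id: int, vnode_id: int,
                     uniquifier: int, block_pos: int) -> bytes:
    """Derive a per-block key from a volume key and the block's identity.

    Hypothetical helper: HMAC-SHA256 stands in for the real KDF.
    """
    if block_pos % UNIT_SIZE != 0:
        raise ValueError("block_pos must be aligned to the crypto unit size")
    # Fixed-width big-endian fields keep the concatenation unambiguous.
    # The widths (64/64/32/64 bits) are assumptions for illustration only.
    info = struct.pack(">QQIQ", rw_vol_id, vnode_id, uniquifier, block_pos)
    return hmac.new(volume_key, info, hashlib.sha256).digest()


if __name__ == "__main__":
    vol_key = bytes(32)  # placeholder key, never use an all-zero key
    for pos in (0, UNIT_SIZE, 2 * UNIT_SIZE):
        print(pos, derive_block_key(vol_key, 0x20000001, 5, 7, pos).hex())
```

Because the cell name is left out, as noted in (5), two cells that happened to
share a volume key and volume ID would derive identical block keys.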

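To give a feel for the cost of option (4)(a), here is a small sketch that
encodes an already-encrypted filename with the URL-safe base64 alphabet (which
swaps '+' and '/' for '-' and '_') and drops the padding.  The helper names
are made up for the example, and the real encoding might well use a different
alphabet:

```python
import base64


def encode_name(ciphertext: bytes) -> str:
    """Encode an encrypted filename without NUL or '/' characters."""
    # urlsafe_b64encode uses '-' and '_' in place of '+' and '/'.
    return base64.urlsafe_b64encode(ciphertext).rstrip(b"=").decode("ascii")


def decode_name(name: str) -> bytes:
    """Reverse encode_name(), restoring the stripped '=' padding first."""
    padding = "=" * (-len(name) % 4)
    return base64.urlsafe_b64decode(name + padding)


def encoded_length(n: int) -> int:
    """Length of the unpadded encoding of n bytes, i.e. ceil(4n/3)."""
    return (4 * n + 2) // 3


if __name__ == "__main__":
    raw = bytes([0x00, 0x2F, 0xFF, 0x2F, 0x00, 0x41, 0xFB])  # includes NUL and '/'
    name = encode_name(raw)
    print(name, len(raw), len(name))
```

For instance, a 30-byte encrypted name takes 40 characters once encoded, which
is where the 3/4 reduction in usable name length comes from.
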
The simplest way to do this need not involve any changes on the server, though
having a conditional store would make it safer.
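
To illustrate why, here is a toy in-memory model of the conditional store in
(6).  A client that changes one byte has to read, decrypt, modify, re-encrypt
and write back a whole crypto unit, so a store that is refused when the DV has
moved on stops it overwriting someone else's update with stale data.  The
types and the function name are hypothetical and this is not a real
fileserver interface; it returns the current stats on a mismatch rather than
aborting, as suggested above:

```python
from dataclasses import dataclass


@dataclass
class Vnode:
    data: bytearray
    data_version: int = 0


@dataclass
class StoreResult:
    stored: bool        # False if the DV check failed
    data_version: int   # current DV after the call
    size: int           # current file size after the call


def store_data_if_dv(vnode: Vnode, expected_dv: int, offset: int,
                     payload: bytes) -> StoreResult:
    """Store payload at offset only if the vnode's DV matches expected_dv."""
    if vnode.data_version != expected_dv:
        # Mismatch: change nothing, just report the up-to-date stats.
        return StoreResult(False, vnode.data_version, len(vnode.data))
    end = offset + len(payload)
    if end > len(vnode.data):
        vnode.data.extend(bytes(end - len(vnode.data)))
    vnode.data[offset:end] = payload
    vnode.data_version += 1
    return StoreResult(True, vnode.data_version, len(vnode.data))


if __name__ == "__main__":
    v = Vnode(bytearray(4096))
    print(store_data_if_dv(v, 0, 0, b"A" * 4096))  # DV matches, data stored
    print(store_data_if_dv(v, 0, 0, b"B" * 4096))  # stale DV, store refused
```

On a refused store, the client would refetch the unit, reapply its change and
retry with the DV from the returned stats.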

Thanks for your consideration,
David

