wjones127 commented on issue #5087:
URL: https://github.com/apache/arrow-rs/issues/5087#issuecomment-1942897487
In keeping with the goals of this crate, I thought I might see if there is a
general pattern across object store implementations for how encryption is
handled. Indeed there does seem to be a pattern.
In general, there are two modes:
1. Keys are managed by a KMS, and the object store can reach out directly and
get keys from it. Here, they only need to be provided when writing. On read,
the object store can look up which key it needs to decrypt with from metadata
and fetch it as needed.
2. Keys are provided by the user, and must be provided on every write **and
read** request. The object store does not keep the keys, just a digest of them.
For the most part, encryption information is provided as headers as part of
requests. The one exception is GCS, which uses query parameters, but only for
its JSON API when in KMS mode. User-provided keys are always passed via
headers, and GCP's XML API always uses headers regardless of mode. 😵
For all three object stores, user-provided keys are always AES-256 keys.
They are always accompanied by a digest. AWS uses MD5, while Azure and GCP use
SHA-256.
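To make the key and digest rules concrete, here is a minimal std-only sketch. The names `Store`, `DigestAlgorithm`, `digest_for`, and `check_key_material` are hypothetical and not part of `object_store`; the sketch only checks lengths and does not compute any digest.

```rust
/// Object stores discussed above (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Store {
    S3,
    Azure,
    Gcs,
}

/// Digest algorithm that accompanies a user-provided key.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DigestAlgorithm {
    Md5,
    Sha256,
}

impl DigestAlgorithm {
    /// Length of the raw (not base64-encoded) digest in bytes.
    fn output_len(self) -> usize {
        match self {
            DigestAlgorithm::Md5 => 16,
            DigestAlgorithm::Sha256 => 32,
        }
    }
}

/// AES-256 keys are 256 bits, i.e. 32 bytes.
const AES_256_KEY_LEN: usize = 32;

/// Digest algorithm each store expects alongside a user-provided key.
fn digest_for(store: Store) -> DigestAlgorithm {
    match store {
        Store::S3 => DigestAlgorithm::Md5,
        Store::Azure | Store::Gcs => DigestAlgorithm::Sha256,
    }
}

/// Check the key length and, if a digest was supplied, its length.
/// Returns the algorithm the store expects for the digest.
fn check_key_material(
    store: Store,
    key: &[u8],
    digest: Option<&[u8]>,
) -> Result<DigestAlgorithm, String> {
    if key.len() != AES_256_KEY_LEN {
        return Err(format!(
            "expected a {}-byte key, got {} bytes",
            AES_256_KEY_LEN,
            key.len()
        ));
    }
    let algorithm = digest_for(store);
    if let Some(d) = digest {
        if d.len() != algorithm.output_len() {
            return Err(format!(
                "expected a {}-byte {:?} digest, got {} bytes",
                algorithm.output_len(),
                algorithm,
                d.len()
            ));
        }
    }
    Ok(algorithm)
}
```

A builder could run a check like this up front, so a malformed key is rejected before any request is sent.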
## Potential Shared API
This makes me think there might be a general configuration API that could be
passed into builders.
```rust
use bytes::Bytes;

// TODO: do not derive Debug so encryption keys aren't accidentally printed.
enum EncryptionMode {
    /// Use a key management service.
    KMS {
        /// If not provided, will attempt to use a default key, if any.
        key_id: Option<String>,
    },
    /// Provide a key with each request.
    Key {
        /// The encryption key to use. Most services require this to be
        /// an AES-256 key.
        key: Bytes,
        /// The digest of the key. If this is not provided, it will be computed
        /// from the key based on the object store's preferred algorithm, if
        /// known. This is MD5 for AWS S3 and SHA-256 for others.
        digest: Option<Bytes>,
    },
}
```
Or perhaps this should be more of a struct like
[ClientOptions](https://docs.rs/object_store/latest/object_store/struct.ClientOptions.html),
which is the best parallel I can think of to this situation.
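For illustration, here is a rough sketch of how such a configuration could be turned into S3 request headers. It is an example under stated assumptions, not the crate's implementation: the enum is redeclared with `Vec<u8>` in place of `Bytes` so the block needs only the standard library, `s3_headers` and `base64` are hypothetical helpers, and the digest must be supplied by the caller because computing MD5 would need an external crate. The header names are the ones S3 documents for SSE-KMS and SSE-C.

```rust
/// Std-only stand-in for the proposed enum (`Vec<u8>` replaces `bytes::Bytes`).
enum EncryptionMode {
    Kms { key_id: Option<String> },
    Key { key: Vec<u8>, digest: Option<Vec<u8>> },
}

/// Minimal standard-alphabet base64 encoder with `=` padding.
fn base64(input: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(*chunk.get(1).unwrap_or(&0));
        let b2 = u32::from(*chunk.get(2).unwrap_or(&0));
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Build the S3 server-side-encryption headers for a single request.
fn s3_headers(mode: &EncryptionMode) -> Result<Vec<(&'static str, String)>, String> {
    match mode {
        EncryptionMode::Kms { key_id } => {
            let mut headers = vec![("x-amz-server-side-encryption", "aws:kms".to_string())];
            // Without a key id, S3 falls back to its default KMS key.
            if let Some(id) = key_id {
                headers.push(("x-amz-server-side-encryption-aws-kms-key-id", id.clone()));
            }
            Ok(headers)
        }
        EncryptionMode::Key { key, digest } => {
            if key.len() != 32 {
                return Err(format!("AES-256 key must be 32 bytes, got {}", key.len()));
            }
            // Assumption: the caller supplies the MD5 digest in this sketch.
            let digest = digest
                .as_ref()
                .ok_or_else(|| "MD5 digest of the key is required in this sketch".to_string())?;
            Ok(vec![
                ("x-amz-server-side-encryption-customer-algorithm", "AES256".to_string()),
                ("x-amz-server-side-encryption-customer-key", base64(key)),
                ("x-amz-server-side-encryption-customer-key-MD5", base64(digest)),
            ])
        }
    }
}
```

In customer-key mode the same three headers would have to be attached to reads as well as writes, which is why the key lives in the store's configuration rather than in per-call options.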
I'll think about this a little more, and likely submit something just
implementing AWS for now. But I do think it would be nice to have an API that
would work well for other object stores that support server-side encryption.