[
https://issues.apache.org/jira/browse/HADOOP-17855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402547#comment-17402547
]
Mike Dias commented on HADOOP-17855:
------------------------------------
Hello [[email protected]], thank you for your comments, I really appreciate
them. I'm glad to hear that you are considering accepting a contribution for this
feature. I will familiarise myself with the codebase and allocate time to work on
it while we discuss what the appropriate solution would be.
Do you know if there is any precedent in the project for loading extra
configuration files? I'm asking because I think that keeping the path-to-key
mappings in a separate file would be the most suitable strategy: you can easily
end up with 1000 different keys/partitions x 100 tables, which would be hard to
fit into core-site.xml.
To your points:
{quote}when we do things with directories, we often create markers in parent
dirs. This complicates life as we'd have to choose which to use there too
{quote}
My understanding is that markers contain only metadata. In my biased opinion,
users won't care much about the encryption settings on them.
{quote}S3A Delegation tokens pass down all encryption settings so that you can
submit work into a shared cluster where all encryption options including your
secrets come with the job. This will need to be extended.
and
this'd be left completely out of the delegation token info passed into the
cluster. Up to the cluster deployer to deal with this. The default encryption
settings would be passed in this way.
{quote}
Does the S3A delegation token control which encryption settings to use? It seems
to me it should only be concerned with authentication to S3.
{quote}all the usual stuff related to hierarchical references, duplicate
conflicting entries et cetera et cetera.
{quote}
Yeah, I wonder if there are any simplifications we can make here.
{quote}would you support different SSE options (SSE-C vs SSE-KMS)? SSE-KMS is
the only sensible option, really.
{quote}
I'm more interested in SSE-KMS, but I do see value in supporting SSE-C as well,
for the same reasons. Users might want to use a tenant-generated key to encrypt
paths in a table partitioned by tenant, for example.
{quote}you know that when you do rename() in S3A FS, because it's a copy,
things get re-encrypted with the latest settings
{quote}
Yeah, that is pretty much expected. To prevent mistakes from incorrect
configurations, users can define IAM policies that prevent objects from being
uploaded if they don't carry the correct encryption settings, as explained
[here|https://aws.amazon.com/premiumsupport/knowledge-center/s3-encrypt-specific-folder/].
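For reference, a bucket policy along the lines of the linked article (sketch only;
the prefix and key ARN below are placeholders taken from the example in this
ticket) would deny a PUT into that prefix unless it names the expected KMS key:
{code:java}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyWrongKmsKeyForIrelandPartition",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::bucket/my_table/country=ireland/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:eu-west-1:90ireland09:key/ireland-key"
        }
      }
    }
  ]
}
{code}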
{quote}we could have some plugin point which returned the encryption settings
for each path being written to, would be used when creating a request (i.e in
RequestFactoryImpl) to choose settings in PUT/initiate MPU, copy. There's some
complexity there related to TransferManager though... copy is going to be
trouble.
{quote}
Exactly, this plugin point could be the single place that resolves all the logic
around encryption configuration (per path, per bucket, global, etc.) and returns
the correct settings to be used.
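I'm imagining something along these lines (just a sketch; PathEncryptionResolver
and EncryptionSettings are names I made up here, not existing S3A classes):
{code:java}
import org.apache.hadoop.fs.Path;

/** Resolved algorithm and key for one object path (sketch only). */
final class EncryptionSettings {
  final String algorithm;  // e.g. "SSE-KMS" or "SSE-C"
  final String key;        // KMS key ARN, or a reference to SSE-C key material

  EncryptionSettings(String algorithm, String key) {
    this.algorithm = algorithm;
    this.key = key;
  }
}

/**
 * Hypothetical plugin point consulted when building PUT / initiate-MPU / copy
 * requests, i.e. where RequestFactoryImpl chooses the encryption headers.
 */
interface PathEncryptionResolver {
  /**
   * Resolve the settings for an object path, e.g. by longest-prefix match
   * against the configured rules, falling back to the bucket-level or global
   * defaults when nothing matches.
   */
  EncryptionSettings resolve(Path objectPath);
}
{code}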
{quote}It'd be (another) hadoop AbstractService created during initialize(),
but we'd make its serviceStart() operation async, so anything it does (load a
config file, bind to some service) wouldn't block normal initialization...the
config is only needed on the first write call
{quote}
I'm not super familiar with the initialization details, but I think it would be
easier to just load everything you need up front and fail fast if there is a
config problem, instead of waiting for the first write call to do so.
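Roughly this shape, if it ends up as a Hadoop AbstractService (the class name and
option key below are placeholders, and the actual file parsing is elided):
{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import org.apache.hadoop.service.AbstractService;

/** Sketch of the eager, fail-fast variant: load mappings in serviceStart(). */
class EncryptionMappingService extends AbstractService {

  /** Object path prefix -> KMS key ARN. */
  private volatile Map<String, String> pathToKey = Collections.emptyMap();

  EncryptionMappingService() {
    super("EncryptionMappingService");
  }

  @Override
  protected void serviceStart() throws Exception {
    // "fs.s3a.server-side-encryption.mappings" is the option proposed in this
    // ticket, not an existing S3A key.
    String location = getConfig()
        .getTrimmed("fs.s3a.server-side-encryption.mappings", "");
    if (!location.isEmpty()) {
      // A malformed or unreadable mapping file fails the service start here,
      // rather than surfacing on the first write.
      pathToKey = loadMappings(location);
    }
  }

  private Map<String, String> loadMappings(String location) throws IOException {
    // Placeholder: open the file at 'location', parse the JSON entries and
    // validate them. Parsing is not implemented in this sketch.
    throw new IOException("mapping file parsing not implemented: " + location);
  }
}
{code}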
> S3A: Allow SSE configurations per object path
> ---------------------------------------------
>
> Key: HADOOP-17855
> URL: https://issues.apache.org/jira/browse/HADOOP-17855
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.1
> Reporter: Mike Dias
> Priority: Major
>
> Currently, we can map the SSE configurations at bucket level only:
> {code:java}
> <property>
>   <name>fs.s3a.bucket.ireland-dev.server-side-encryption-algorithm</name>
>   <value>SSE-KMS</value>
> </property>
> <property>
>   <name>fs.s3a.bucket.ireland-dev.server-side-encryption.key</name>
>   <value>arn:aws:kms:eu-west-1:98067faff834c:key/071a86ff-8881-4ba0-9230-95af6d01ca01</value>
> </property>
> {code}
> But sometimes we want to encrypt data in different paths with different keys
> within the same bucket. For example, a partitioned table might benefit from
> encrypting each partition with a different key when the partition represents
> a customer or a country.
> [S3 can already encrypt using different keys/configurations at the object
> level|https://aws.amazon.com/premiumsupport/knowledge-center/s3-encrypt-specific-folder/],
> so what we need to do in Hadoop is provide a way to map which key to use.
> One idea could be mapping them in the XML config:
>
> {code:java}
> <property>
>   <name>fs.s3a.server-side-encryption.paths</name>
>   <value>s3://bucket/my_table/country=ireland,s3://bucket/my_table/country=uk,s3://bucket/my_table/country=germany</value>
> </property>
> <property>
>   <name>fs.s3a.server-side-encryption.path-keys</name>
>   <value>arn:aws:kms:eu-west-1:90ireland09:key/ireland-key,arn:aws:kms:eu-west-1:980uk0993c:key/uk-key,arn:aws:kms:eu-west-1:98germany089:key/germany-key</value>
> </property>
> {code}
> Or potentially fetch the mappings from the filesystem:
>
> {code:java}
> <property>
>   <name>fs.s3a.server-side-encryption.mappings</name>
>   <value>s3://bucket/configs/encryption_mappings.json</value>
> </property>
> {code}
> where encryption_mappings.json could be something like this:
>
> {code:java}
> {
>   "path": "s3://bucket/customer_table/customerId=abc123",
>   "algorithm": "SSE-KMS",
>   "key": "arn:aws:kms:eu-west-1:933993746:key/abc123-key"
> }
> ...
> {
>   "path": "s3://bucket/customer_table/customerId=xyx987",
>   "algorithm": "SSE-KMS",
>   "key": "arn:aws:kms:eu-west-1:933993746:key/xyx987-key"
> }
> {code}
>
>