[
https://issues.apache.org/jira/browse/HADOOP-17966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dong0829 updated HADOOP-17966:
------------------------------
Description:
According to the document:
[https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/encryption.html#S3_Default_Encryption]
"Organizations may define a default key in the Amazon KMS; if a default key is
set, then it will be used whenever SSE-KMS encryption is chosen and the value
of fs.s3a.server-side-encryption.key is empty."
So two conditions must hold for an object to be encrypted with the default KMS
key: 1. SSE-KMS encryption is selected. 2. fs.s3a.server-side-encryption.key is
not set.
However, the following scenario is confusing:
1. A user wants to rely on the S3 bucket's own default encryption with a
customer KMS key (kms-keyA, for example), so they set neither
fs.s3a.server-side-encryption-algorithm nor fs.s3a.server-side-encryption.key.
Files uploaded to this bucket are encrypted with the bucket's custom KMS key,
kms-keyA.
2. Next, the user copies a file to another path through s3a. This invokes
copyFile() in S3AFileSystem, which clones the source object's metadata in
cloneObjectMetadata(). The clone copies the SSE algorithm but not the KMS key
id, so the destination is written with SSE-KMS and no key id. The resulting
file is encrypted with the account-level default key under aws/s3
([https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object.html]);
call it kms-keyB.
This means that whenever a copy happens, the KMS key changes from the customer
key kms-keyA to kms-keyB, which is inconsistent. For example:
hdfs dfs -put test s3://ssetest/
During this put, test.__COPYING__ is renamed to test, and the rename is
implemented as a copy. The final test file is therefore encrypted with the
account default key kms-keyB instead of the expected bucket customer key
kms-keyA. Should we also clone the KMS key id to keep the encryption
consistent?
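The copy behaviour described above can be illustrated with a small,
self-contained Java sketch. It is a toy model only, not the S3AFileSystem code:
the class SseCopySketch, the Meta type, and the methods cloneAlgorithmOnly(),
cloneWithKeyId() and effectiveKey() are hypothetical names introduced for
illustration, and "kms-keyB" stands in for the account-level default key. It
assumes, as reported, that a request carrying SSE-KMS without a key id falls
back to that default key.

```java
/** Toy model of the reported copy behaviour; NOT the S3A implementation. */
final class SseCopySketch {

    static final String SSE_KMS = "aws:kms";
    // Hypothetical stand-in for the account-level default key (kms-keyB in the report).
    static final String ACCOUNT_DEFAULT_KEY = "kms-keyB";

    /** Minimal stand-in for the SSE-related part of an object's metadata. */
    static final class Meta {
        final String sseAlgorithm; // e.g. "aws:kms", or null when none is set
        final String sseKmsKeyId;  // null when no key id is specified

        Meta(String sseAlgorithm, String sseKmsKeyId) {
            this.sseAlgorithm = sseAlgorithm;
            this.sseKmsKeyId = sseKmsKeyId;
        }
    }

    /** Models the reported behaviour: the algorithm is cloned, the key id is not. */
    static Meta cloneAlgorithmOnly(Meta source) {
        return new Meta(source.sseAlgorithm, null);
    }

    /** Models the proposed change: the KMS key id is carried across as well. */
    static Meta cloneWithKeyId(Meta source) {
        return new Meta(source.sseAlgorithm, source.sseKmsKeyId);
    }

    /** Simplified rule for which key a write carrying this metadata ends up with. */
    static String effectiveKey(Meta requested) {
        if (!SSE_KMS.equals(requested.sseAlgorithm)) {
            return null; // not SSE-KMS; outside the scope of this sketch
        }
        return requested.sseKmsKeyId != null
            ? requested.sseKmsKeyId
            : ACCOUNT_DEFAULT_KEY;
    }

    public static void main(String[] args) {
        // Source object as stored in the bucket: SSE-KMS with the customer key.
        Meta source = new Meta(SSE_KMS, "kms-keyA");
        System.out.println(effectiveKey(cloneAlgorithmOnly(source))); // prints kms-keyB
        System.out.println(effectiveKey(cloneWithKeyId(source)));     // prints kms-keyA
    }
}
```

In this model the key drift comes entirely from dropping the key id while
keeping the algorithm; whether the real fix belongs in cloneObjectMetadata()
or elsewhere is left open.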
was:
According to the document:
[https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/encryption.html#S3_Default_Encryption]
"Organizations may define a default key in the Amazon KMS; if a default key is
set, then it will be used whenever SSE-KMS encryption is chosen and the value
of fs.s3a.server-side-encryption.key is empty."
So two conditions must hold for an object to be encrypted with the default KMS
key: 1. SSE-KMS encryption is selected. 2. fs.s3a.server-side-encryption.key is
not set.
However, the following scenario is confusing:
1. A user wants to rely on the S3 bucket's own default encryption with a
customer KMS key (kms-keyA, for example), so they set neither
fs.s3a.server-side-encryption-algorithm nor fs.s3a.server-side-encryption.key.
Files uploaded to this bucket are encrypted with the bucket's custom KMS key,
kms-keyA.
2. Next, the user copies a file to another path through s3a. This invokes
copyFile() in S3AFileSystem, which clones the source object's metadata in
cloneObjectMetadata(). The clone copies the SSE algorithm but not the KMS key
id, so the destination is written with SSE-KMS and no key id. The resulting
file is encrypted with the account-level default key under aws/s3
([https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object.html]);
call it kms-keyB.
This means that whenever a copy happens, the KMS key changes from the customer
key kms-keyA to kms-keyB, which is inconsistent. For example:
hdfs dfs -put test s3://ssetest/
During this put, test.__COPYING__ is renamed to test, and the rename is
implemented as a copy. The final test file is therefore encrypted with the
account default key kms-keyB instead of the expected bucket customer key
kms-keyA. This issue also occurs during some Spark commit processes.
Should we also clone the KMS key id to keep the encryption consistent?
> S3A SSE-KMS inconsistency issue during COPY
> -------------------------------------------
>
> Key: HADOOP-17966
> URL: https://issues.apache.org/jira/browse/HADOOP-17966
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.1.2
> Reporter: Dong0829
> Priority: Major