[ 
https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309710#comment-16309710
 ] 

Steve Loughran commented on HADOOP-15006:
-----------------------------------------

As noted before, I really don't like how client-side encryption results in 
files whose readable data is shorter than the length reported for the file: 
this breaks so much. I'm currently not confident that you can use any 
client-side encrypted data as a source for operations.
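
To make the mismatch concrete, here is a minimal sketch using only the JDK's 
javax.crypto. It assumes AES/CBC with PKCS5 padding and a fixed all-zero key 
and IV purely for illustration; the cipher a real client-side encryption 
setup uses may differ, and the class and method names are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CseLengthDemo {

    /**
     * Encrypts plaintextLength zero bytes and returns the ciphertext length.
     * Assumption: AES/CBC/PKCS5Padding; the key and IV are all zeros (demo only).
     */
    static int encryptedLength(int plaintextLength) {
        try {
            byte[] key = new byte[16]; // fixed key: never do this outside a demo
            byte[] iv = new byte[16];  // fixed IV: likewise demo only
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(iv));
            return cipher.doFinal(new byte[plaintextLength]).length;
        } catch (Exception e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    public static void main(String[] args) {
        // The stored object is always longer than the data a reader gets back.
        for (int n : new int[] {0, 15, 16, 100}) {
            System.out.println(n + " plaintext bytes -> "
                    + encryptedLength(n) + " stored bytes");
        }
    }
}
```

With this padding scheme a 100-byte file is stored as 112 bytes, so a listing 
that reports the stored size overstates the readable data by 12 bytes.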

# Somehow S3Guard-enabled buckets should be better at this (there's a header 
which you can get in a HEAD or GET which returns the real length, but it 
doesn't show in a LIST). If S3Guard can check this value when a file is 
opened, the shorter value can be cached.
# Maybe the actual length of a file could be provided by an input stream (if 
the right API is there), and/or seek() could be expanded to explicitly support 
EOF-relative seeks, which the standard bits of code that seek from the end 
(Hadoop's internal format, Parquet & ORC, presumably) could then use.
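
A rough sketch of both ideas in plain Java, not S3A code. The metadata key 
name is an assumption (the comment above does not name the header), and 
resolveLength() and positionFromEnd() are hypothetical helpers, not existing 
Hadoop APIs.

```java
import java.util.Map;

public class PlaintextLength {

    // Assumed name of the metadata entry carrying the unencrypted length.
    static final String UNENCRYPTED_LENGTH = "x-amz-unencrypted-content-length";

    /**
     * Prefer the length from HEAD/GET metadata; fall back to the LIST length
     * when the entry is missing or unparseable.
     */
    static long resolveLength(long listedLength, Map<String, String> headMetadata) {
        String value = headMetadata.get(UNENCRYPTED_LENGTH);
        if (value == null) {
            return listedLength;
        }
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return listedLength;
        }
    }

    /** Absolute position for a hypothetical EOF-relative seek. */
    static long positionFromEnd(long plaintextLength, long offsetFromEnd) {
        if (offsetFromEnd < 0 || offsetFromEnd > plaintextLength) {
            throw new IllegalArgumentException("offset outside file: " + offsetFromEnd);
        }
        return plaintextLength - offsetFromEnd;
    }

    public static void main(String[] args) {
        long listed = 112; // length a LIST would report (ciphertext)
        long real = resolveLength(listed, Map.of(UNENCRYPTED_LENGTH, "100"));
        // A footer reader wanting the last 8 bytes of the file:
        System.out.println("from listed length: seek to " + positionFromEnd(listed, 8));
        System.out.println("from real length:   seek to " + positionFromEnd(real, 8));
    }
}
```

A reader that trusts the listed length would seek to 104, past the real end 
of the data at 100; with the resolved length it seeks to 92. An EOF-relative 
seek in the stream itself would spare callers from needing the length at all.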

Before worrying about these, why not conduct some experiments? You could take 
S3A and modify it to always encrypt client side with the same key, then run as 
many integration tests as you can against it (Hive, Spark, Impala, ...), *and 
see what fails*. I think that should be a first step to anything client-side 
related.

> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a 
> way that it can leverage HDFS transparent encryption, use the Hadoop KMS to 
> manage keys, use the `hdfs crypto` command line tools to manage encryption 
> zones in the cloud, and enable distcp to copy from HDFS to S3 (and 
> vice-versa) with data still encrypted.



