[ 
https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317345#comment-16317345
 ] 

Steve Moist commented on HADOOP-15006:
--------------------------------------

{quote}
Do you have any good links for further reading on the crypto algorithms, 
particularly the NoPadding variant you mention? (How do lengths and byte 
offsets map from the user data to the encrypted stream?)
{quote}
I've got a few links about general block ciphers and padding; I'll post more 
as I find them.
* http://web.cs.ucdavis.edu/~rogaway/papers/modes.pdf is a good (and lengthy) 
doc on encryption modes; see page 5 for a summary and page 45 for more on 
CTR.
* https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Padding is the 
obligatory Wikipedia page
* https://www.cryptosys.net/pki/manpki/pki_paddingschemes.html

{quote}
 (How do lengths and byte offsets map from the user data to the encrypted 
stream?)
{quote}
They should map 1:1.  AES works on a fixed block size of 16 bytes, so you 
read bytes 0-15, 16-31, etc.  That means you can't read bytes 3-18 directly; 
you'd have to read 0-31 to get 3-18.  I'm not sure of the details, but I 
would imagine HDFS transparent encryption has the same issue and has already 
solved it.  It would just mean we'd have to fetch the preceding block as well 
to decrypt properly.  CTR allows random-access encryption/decryption, so I 
don't expect this to be a performance problem; it's just a minor technical 
point.  So far in my testing I haven't hit it, but I also haven't been 
directly invoking MultiPartUpload.  This is the only issue I see when 
randomly reading/writing blocks, and it's easily solvable.
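The block alignment described above can be sketched as follows. This is a hypothetical helper, not actual S3A code; it only shows how a requested byte range (e.g. 3-18) expands to the 16-byte AES block boundaries (0-31) that must actually be read:

```python
AES_BLOCK_SIZE = 16  # AES always operates on fixed 16-byte blocks

def block_aligned_range(start, end):
    """Expand an inclusive byte range [start, end] of the plaintext to the
    inclusive range of ciphertext bytes covering whole AES blocks."""
    aligned_start = (start // AES_BLOCK_SIZE) * AES_BLOCK_SIZE
    aligned_end = ((end // AES_BLOCK_SIZE) + 1) * AES_BLOCK_SIZE - 1
    return aligned_start, aligned_end

# Reading bytes 3..18 requires fetching the blocks covering 0..31:
# block_aligned_range(3, 18) -> (0, 31)
```

With CTR specifically, the counter for the first needed block can be computed directly from `aligned_start // AES_BLOCK_SIZE`, which is what makes random-access decryption cheap.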

{quote}
What are the actual atomicity requirements?
{quote}
This is a good question.  The main atomicity requirement I have is that once 
the S3a stream is closed and the object committed, the OEMI is also 
committed.  I haven't fully worked that out from a specific code perspective 
yet.
{quote}
Specifically, how do we handle multiple clients racing to create the same path?
{quote}
Using OEMI Storage Option #5: suppose userA uploads objectA with OEMIA to key1 
and userB uploads objectB with OEMIB to key1.  S3 doesn't guarantee which one 
wins, so it is possible that objectA ends up stored with OEMIB.  This 
shouldn't happen if the OEMI is stored as object metadata.  Alternatively, we 
could create a "lock" on the DynamoDB row so that userA owns that location, 
preventing the upload of objectB.  In HDFS, once the INode is created, it 
prevents userB from creating that file; perhaps we should do the same for 
S3?
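The "lock" idea above can be sketched with a first-writer-wins check. This is a minimal in-memory simulation, all names hypothetical: in the real design, DynamoDB's conditional PutItem (an `attribute_not_exists` condition) would play the role of the dict membership test, so the check-and-insert is atomic on the server side:

```python
# Stands in for the DynamoDB table, keyed by object path.
table = {}

def try_acquire(path, owner):
    """First writer wins; later writers are rejected, analogous to how an
    HDFS INode create blocks a second client from creating the same file."""
    if path in table:
        return False  # another client already holds this path
    table[path] = owner
    return True

# userA wins the race on key1; userB's upload of objectB is rejected.
```

Note the in-memory version has a check-then-act race of its own; the point of using DynamoDB's conditional write is precisely that the condition and the put are evaluated atomically.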

{quote}
Also since the scope of the encryption zone is the bucket, we could get by with 
a very low provisioned I/O budget on the Dynamo table and save money, no?
{quote}
Yes, we should be able to; I believe the only requirement is that the table 
supports inserts and reads.  IIRC, each bucket gets its own S3a JVM (or 
something to that effect), so at least at startup we can cache its EZ 
information.
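The startup caching could look like the sketch below. This is a hypothetical illustration (the `fetch` callable standing in for a DynamoDB read); the point is that steady-state operations hit the cache, so the provisioned read budget is only consumed once per bucket at startup:

```python
# Hypothetical per-JVM cache of a bucket's encryption-zone info.
_ez_cache = {}

def get_ez_info(bucket, fetch):
    """Return the bucket's EZ info, reading from DynamoDB (via `fetch`)
    only on the first call for that bucket."""
    if bucket not in _ez_cache:
        _ez_cache[bucket] = fetch(bucket)  # one provisioned read at startup
    return _ez_cache[bucket]
```

Since the EZ scope is the whole bucket and EZ metadata rarely changes, a simple load-once cache like this is enough to keep the table's I/O budget very low.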

> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a 
> way that it can leverage HDFS transparent encryption, use the Hadoop KMS to 
> manage keys, use the `hdfs crypto` command line tools to manage encryption 
> zones in the cloud, and enable distcp to copy from HDFS to S3 (and 
> vice-versa) with data still encrypted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
