[ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004498#comment-14004498 ]
Yi Liu commented on HADOOP-10603:
---------------------------------

Andrew, thanks for your detailed review. Although the comments are based on a slightly older version, most of them are still valid for the latest patch :-). HADOOP-10617 holds the common-side test cases for the crypto streams; since we already have lots of test cases, still need to add more, and it would be a bit large if merged into this one, I made it a separate JIRA. Sure, per your suggestion I will merge it into this JIRA. Below are my responses to your comments; I will update the patch later.

{quote}
Need class javadoc and interface annotations on all new classes
Need "<p/>" to actually line break in javadoc
Some tab characters present
{quote}
I will update them.

{quote}
s/mod/mode
What does "calIV" mean? Javadoc here would be nice. calIV would be simpler if we used ByteBuffer.wrap and getLong. I think right now, we also need to cast each value to a long before shifting, else it only works up to an int. Would be good to unit test this function.
{quote}
Right, I will update them (a rough sketch of the ByteBuffer-based IV calculation is included further below).

{quote}
Could you define the term "block" in the #encrypt javadoc?
{quote}
It was the wrong word and should be "buffer"; it is already updated in the latest patch.

{quote}
I don't understand the reinit conditions, do you mind explaining this a bit? The javadoc for Cipher#update indicates that it always fully reads the input buffer, so is the issue that the cipher sometimes doesn't flush all the input to the output buffer?
{quote}
Andrew, I agree with you. The javadoc for Cipher#update indicates that it always fully reads the input buffer and decrypts all input data. This is always correct for CTR mode; for some other modes the input data may be buffered when padding is requested (CTR does not need padding). Charles is concerned that some custom JCE provider implementation might not decrypt all data for CTR mode in {{Cipher#update}}, so I added the reinit conditions. I think that if a specific provider cannot decrypt all input data of {{Cipher#update}} for CTR mode, it is a bug in that provider, since it does not follow the definition of {{Cipher#update}}.

{quote}
If this API only accepts direct ByteBuffers, we should Precondition check that in the implementation
{quote}
I'm not sure we have this restriction; a Java heap ByteBuffer is also OK. A direct ByteBuffer is more efficient (no copy) when the cipher provider is native code called through JNI. I will add the check if you prefer.

{quote}
Javadoc for {{encrypt}} should link to {{javax.crypto.ShortBufferException}}, not {{#ShortBufferException}}. I also don't see this being thrown because we wrap everything in an IOException.
{quote}
Right, I will revise this.

{quote}
How was the default buffer size of 8KB chosen? This should probably be a new configuration parameter, or respect io.file.buffer.size.
{quote}
OK, I will add a configuration parameter for the default buffer size.

{quote}
Potential for int overflow in {{#write}} where we check {{off+len < 0}}. I also find this if statement hard to parse, would prefer if it were expanded.
{quote}
OK, I will expand it in the next patch (see the bounds-check sketch further below).

{quote}
Is the {{16}} in {{updateEncryptor}} something that should be hard-coded? Maybe pull it out into a constant and javadoc why it's 16. I'm curious if this is dependent on the Encryptor implementation.
{quote}
Let's pull it out into a constant. 16 bytes is 128 bits, which is the AES block size by definition: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard. Let's also define it as a configuration parameter, since other algorithms may have a different block size, although we use AES.
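To illustrate, here is a rough sketch of what pulling the {{16}} out could look like; the class, constant, and configuration key names are only placeholders, not taken from the patch:
{code}
// Hypothetical sketch only: pull the hard-coded 16 into a documented constant
// and allow overriding it through configuration.
public final class CryptoStreamDefaults {
  /**
   * AES processes data in fixed 128-bit (16-byte) blocks regardless of key
   * length, so the CTR counter/IV advances in 16-byte units.
   */
  public static final int AES_BLOCK_SIZE = 16;

  /** Placeholder config key; other algorithms may use a different block size. */
  public static final String CRYPTO_BLOCK_SIZE_KEY = "hadoop.security.crypto.block.size";
  public static final int CRYPTO_BLOCK_SIZE_DEFAULT = AES_BLOCK_SIZE;

  private CryptoStreamDefaults() {}
}
{code}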
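For the calIV point above, a rough sketch of the ByteBuffer-based calculation Andrew suggests; the method and parameter names are illustrative, and carry from the low 64 bits into the high 64 bits of the IV is not handled in this sketch:
{code}
import java.nio.ByteBuffer;

public class IVCalculator {
  /**
   * Sketch only: treat the low 8 bytes of the initial IV as a big-endian long,
   * add the block counter, and write the sum into the output IV. Using
   * ByteBuffer.getLong/putLong avoids the manual shift-and-cast code; any
   * carry into the high 64 bits is ignored here.
   */
  static void calculateIV(byte[] initIV, long counter, byte[] iv) {
    System.arraycopy(initIV, 0, iv, 0, 8);                // high 64 bits unchanged
    long low = ByteBuffer.wrap(initIV, 8, 8).getLong();   // big-endian by default
    ByteBuffer.wrap(iv, 8, 8).putLong(low + counter);
  }
}
{code}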
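And for the {{#write}} bounds check above, a sketch of one overflow-safe expanded form (again just an illustration, not the patch code):
{code}
public final class WriteBoundsCheck {
  /**
   * Sketch: expanded argument check for write(byte[] b, int off, int len).
   * Comparing len > b.length - off avoids the int overflow that off + len
   * can hit, and each condition is easy to read on its own.
   */
  static void checkBounds(byte[] b, int off, int len) {
    if (b == null) {
      throw new NullPointerException("b is null");
    }
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException(
          "off=" + off + ", len=" + len + ", b.length=" + b.length);
    }
  }

  private WriteBoundsCheck() {}
}
{code}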
{quote}
We need to be careful with direct BBs, since they don't trigger GC. We should be freeing them manually when the stream is closed, or pooling them somehow for reuse.
{quote}
Good point. As for pooling them, they may be created with different buffer sizes and therefore not fit well in a pool, so I will free them manually when the stream is closed.

{quote}
In {{#process}}, we flip the inBuf, then if there's no data we just return. Shouldn't we restore inBuf to its previous padded state first? Also, IIUC {{inBuffer.remaining()}} cannot be less than padding since the inBuffer position does not move backwards, so I'd prefer to see a Precondition check and {{inBuf.remaining() == padding}}. Test case would be nice if I'm right about this.
{quote}
You are right, there is a potential issue; I will fix it and add a test case. In our code we only enter {{#process}} when we have input data, so {{inBuffer}} should contain real data, but from the code-logic point of view we should handle it as you said. And agreed, we should have a precondition check.

{quote}
Rename {{#process}} to {{#encrypt}}?
{quote}
Good, let's do that.

{quote}
Do we need the special-case logic with tmpBuf? It looks like outBuffer is always direct.
{quote}
Right, let's remove the unnecessary logic.

{quote}
Do we need to update padding when we do a flush?
{quote}
We don't need to update padding when doing {{flush}}: {{flush}} encrypts the remaining data, so there is no need to recalculate the padding.

{quote}
Also in {{#flush}}, s/encryption/encrypt
{{oneByte}} can be final
{quote}
Right, I will update it.

{quote}
If you have extra time, an ASCII art diagram showing how padding and the stream offset works would also be nice. Javadoc for the special padding handling would be nice.
{quote}
Good point, I will add a diagram.

{quote}
Can make class-private methods private
{quote}
OK, I will update that.

{quote}
Should {{close()}} also close the underlying stream?
{quote}
Actually, {{#close}} already calls {{super.close}}, which closes the underlying stream.

{quote}
Do we have tests for wrapping both ByteBufferReadable and not streams?
{quote}
Please refer to the latest patch: we refined {{ByteBufferReadable}} and also support
{code}
public boolean seekToNewSource(long targetPos) throws IOException
{code}
{code}
public ByteBuffer read(ByteBufferPool bufferPool, int maxLength, EnumSet<ReadOption> opts) throws IOException, UnsupportedOperationException
{code}
{code}
public void releaseBuffer(ByteBuffer buffer)
{code}
And we also have test cases covering them.

{quote}
Rename {{process()}} to {{decrypt()}}?
{quote}
Right, I will do that.

{quote}
In process, the catch/throw there doesn't seem useful, since everything that throws already throws an IOException.
{quote}
I will refine this.

{quote}
Positioned read and readFully, doing a seek in the finally will not work on a non-Seekable stream. There's also no need to catch {{ClassCastException}} since it's already handled in {{seek}}.
{quote}
OK, I will handle it using the positioned read and readFully of the underlying stream. The point of using {{seek}} was to keep the logic simpler, since we should not change the underlying stream position when using positioned read and readFully; furthermore, {{FSInputStream}} has similar logic. Anyway, I will refine this to not use {{seek}} (a rough sketch of the delegating approach follows below).
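To sketch the seek-free direction (class, method, and field names here are placeholders, not the patch code), the positioned read could delegate to the underlying stream's positioned read, which by contract does not move the stream position, and then decrypt the returned bytes:
{code}
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.PositionedReadable;

public abstract class PositionedCryptoRead {
  private final InputStream in;   // underlying stream, assumed PositionedReadable

  protected PositionedCryptoRead(InputStream in) {
    this.in = in;
  }

  /**
   * Sketch only: delegate to the underlying stream's positioned read, so no
   * seek/restore of the stream position is needed, then decrypt in place with
   * a decryptor initialized for the given absolute position.
   */
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    if (!(in instanceof PositionedReadable)) {
      throw new UnsupportedOperationException(
          "Positioned read requires the underlying stream to be PositionedReadable");
    }
    int n = ((PositionedReadable) in).read(position, buffer, offset, length);
    if (n > 0) {
      decryptAt(position, buffer, offset, n);   // placeholder for the decrypt step
    }
    return n;
  }

  /** Placeholder: decrypt n bytes of ciphertext starting at absolute position pos. */
  protected abstract void decryptAt(long pos, byte[] buffer, int offset, int n)
      throws IOException;
}
{code}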
{quote}
{{readFully(long, byte[])}} should just delegate directly to the other readFully method, it doesn't need to do anything else.
{quote}
You are right; the exception checking is already done in the delegated method.

{quote}
{{updateDecryptor}} doesn't seem to need the {{long offset}} parameter since it's always passed {{streamOffset}}.
{quote}
Right, I will update it.

{quote}
We need to return -1 on EOF for zero-byte reads, see .
{quote}
Great, I will update it.

{quote}
Comment in skip about why we subtract then add outBuffer.remaining() would be good.
{quote}
OK, I will add it.

{quote}
Some empty throw new UnsupportedOperationException() could use text
{quote}
OK.

{quote}
s/if initialize fails/if initialization fails/
{quote}
OK.


> Crypto input and output streams implementing Hadoop stream interfaces
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-10603
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10603
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>    Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>            Reporter: Alejandro Abdelnur
>            Assignee: Yi Liu
>             Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
>         Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. Note we cannot use the JDK Cipher Input/Output streams directly because we need to support the additional interfaces that the Hadoop FileSystem streams implement (Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor, CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)