[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

Andrew Wang (JIRA) Tue, 20 May 2014 18:30:06 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004190#comment-14004190
 ]


Andrew Wang commented on HADOOP-10603:
--------------------------------------

Hi Yi, thanks for working on this code. I have some review comments based on a 
slightly old version of this patch. Throughout, if I've correctly identified a 
correctness issue, let's have a corresponding unit test if possible. I'd 
personally also rather not split the implementation and tests into separate 
JIRAs.

Nitty stuff:
* Need class javadoc and interface annotations on all new classes
* Need "<p/>" to actually line break in javadoc
* Some tab characters present

CryptoUtils
* s/mod/mode
* What does "calIV" mean? Javadoc here would be nice.
* calIV would be simpler if we used ByteBuffer.wrap and {{getLong}}. I think 
right now, we also need to cast each value to a {{long}} before shifting, else 
it only works up to an int. Would be good to unit test this function.

Encryptor/JCEAESCTREncryptor:
* Could you define the term "block" in the {{#encrypt}} javadoc?
* I don't understand the reinit conditions, do you mind explaining this a bit? 
The javadoc for {{Cipher#update}} indicates that it always fully reads the 
input buffer, so is the issue that the cipher sometimes doesn't flush all the 
input to the output buffer?
* If this API only accepts direct ByteBuffers, we should Precondition check 
that in the implementation
* Javadoc for {{encrypt}} should link to {{javax.crypto.ShortBufferException}}, 
not {{#ShortBufferException}}. I also don't see this being thrown because we 
wrap everything in an IOException.

CryptoOutputStream:
* How was the default buffer size of 8KB chosen? This should probably be a new 
configuration parameter, or respect io.file.buffer.size.
* Potential for int overflow in {{#write}} where we check {{off+len < 0}}. I 
also find this if statement hard to parse, would prefer if it were expanded.
* Is the {{16}} in updateEncryptor something that should be hard-coded? Maybe 
pull it out into a constant and javadoc why it's 16. I'm curious if this is 
dependent on the Encryptor implementation.
* We need to be careful with direct BBs, since they don't trigger GC. We should 
be freeing them manually when the stream is closed, or pooling them somehow for 
reuse.
* In {{#process}}, we flip the inBuf, then if there's no data we just return. 
Shouldn't we restore inBuf to its previous padded state first? Also, IIUC 
{{inBuffer.remaining()}} cannot be less than padding since the inBuffer 
position does not move backwards, so I'd prefer to see a Precondition check and 
{{inBuf.remaining() == padding)}}. Test case would be nice if I'm right about 
this.
* Rename {{#process}} to {{#encrypt}}?
* Do we need the special-case logic with tmpBuf? It looks like outBuffer is 
always direct.
* Do we need to update padding when we do a flush?
* Also in {{#flush}}, s/encryption/encrypt
* oneByte can be final
* If you have extra time, an ASCII art diagram showing how {{padding}} and the 
stream offset works would also be nice. Javadoc for the special padding 
handling would be nice.
* Can make class-private methods private
* Should {{close()}} also close the underlying stream?

CryptoInputStream:
* Many of the OutputStream comments apply here too: hardcoded 8KB buffer size, 
int overflow for {{off+len}}, hardcoded 16, handling of padding in {{process}}, 
freeing direct buffers, private methods, closing underlying stream, etc.
* Do we have tests for wrapping both ByteBufferReadable and not streams?
* Rename {{process()}} to {{decrypt()}}?
* In {{process}}, the catch/throw there doesn't seem useful, since everything 
that throws already throws an IOException.
* Positioned read and readFully, doing a seek in the finally will not work on a 
non-Seekable stream. There's also no need to catch {{ClassCastException}} since 
it's already handled in {{seek}}.
* {{readFully(long, byte[])}} should just delegate directly to the other 
readFully method, it doesn't need to do anything else.
* updateDecryptor doesn't seem to need the {{long offset}} parameter since it's 
always passed {{streamOffset}}.
* We need to return -1 on EOF for zero-byte reads, see HDFS-5762.
* Comment in {{skip}} about why we subtract then add {{outBuffer.remaining()}} 
would be good.
* Some empty {{throw new UnsupportedOperationException()}} could use text

Decryptor:
* s/if initialize fails/if initialization fails/

> Crypto input and output streams implementing Hadoop stream interfaces
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-10603
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10603
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>    Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>            Reporter: Alejandro Abdelnur
>            Assignee: Yi Liu
>             Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
>         Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, 
> HADOOP-10603.3.patch, HADOOP-10603.4.patch, HADOOP-10603.5.patch, 
> HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch, 
> HADOOP-10603.patch
>
>
> A common set of Crypto Input/Output streams. They would be used by 
> CryptoFileSystem, HDFS encryption, MapReduce intermediate data and spills. 
> Note we cannot use the JDK Cipher Input/Output streams directly because we 
> need to support the additional interfaces that the Hadoop FileSystem streams 
> implement (Seekable, PositionedReadable, ByteBufferReadable, 
> HasFileDescriptor, CanSetDropBehind, CanSetReadahead, 
> HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces

Reply via email to