[ https://issues.apache.org/jira/browse/HDFS-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284051#comment-16284051 ]

Rushabh S Shah edited comment on HDFS-12574 at 12/8/17 8:39 PM:
----------------------------------------------------------------

Attaching a patch for Jenkins to run and point out silly mistakes/checkstyle issues.

{quote}
This is bad: CryptoProtocolVersion.values(). The values method always allocates 
a new garbage array for every invocation. I forget where else I made a change 
to have a static array assignment of the values and created a static valueOf to 
return the item from the static array. I can't find it, looks like it might 
have been undone... Note that protobufs actually do this.
{quote}
Addressed in v2 of patch.
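
For reference, here is a minimal sketch of the pattern being asked for (illustrative only, with made-up constants and method names; it is not the actual patch code): cache the result of {{values()}} in a static array once and resolve entries against that array instead of re-allocating on every call.

{code:java}
// Sketch of the "cache values()" pattern; constants and fromVersion() are illustrative.
public enum CryptoProtocolVersion {
  UNKNOWN(1),
  ENCRYPTION_ZONES(2);

  // values() allocates a fresh array on every invocation; compute it once.
  private static final CryptoProtocolVersion[] CACHED_VALUES = values();

  private final int version;

  CryptoProtocolVersion(int version) {
    this.version = version;
  }

  /** Resolve a version number against the cached array instead of calling values(). */
  public static CryptoProtocolVersion fromVersion(int version) {
    for (CryptoProtocolVersion v : CACHED_VALUES) {
      if (v.version == version) {
        return v;
      }
    }
    return UNKNOWN;
  }
}
{code}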

{quote}
WebHdfsFileSystem#open contains a copy-n-paste of the same code in 
DFSClient#createWrappedInputStream. CryptoInputStream can work with any general 
stream so let's make a general wrapping method. Maybe create an interface 
something like EncryptableInputStream for the getFileEncryptionInfo which 
DFSInputStream and WebHdfsInputStream implements. Pass an encryptable stream 
and it returns a wrapped stream if necessary.
{quote}
Addressed in the latest patch.
But I am thinking of an alternative: instead of creating {{EncryptableInputStream}} and {{EncryptableOutputStream}}, how about creating a single {{EncryptableStream}} and letting {{WebHdfsFileSystem}} and {{DistributedFileSystem}} implement it?
Just an idea. Let me know the pros and cons you see in that approach.
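
To make the comparison concrete, here is a rough sketch of the {{EncryptableInputStream}} shape together with a shared wrapping helper. The class/method names and the KMS key-decryption step (omitted below) are assumptions for illustration, not the patch itself.

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.CryptoCodec;
import org.apache.hadoop.crypto.CryptoInputStream;
import org.apache.hadoop.fs.FileEncryptionInfo;

/** Sketch: any stream that can report the encryption info of its file. */
interface EncryptableInputStream {
  /** @return encryption info for the underlying file, or null if not encrypted. */
  FileEncryptionInfo getFileEncryptionInfo();
}

/** Sketch of a wrapping helper usable by both DFSClient and WebHdfsFileSystem. */
final class CryptoStreamWrapper {
  private CryptoStreamWrapper() {}

  /**
   * Wraps the raw stream in a CryptoInputStream if the file is encrypted.
   * The decrypted data-encryption key (dek) is assumed to have already been
   * obtained from the KMS; that step is left out of this sketch.
   */
  static <T extends InputStream & EncryptableInputStream> InputStream
      wrapIfNecessary(Configuration conf, T in, byte[] dek) throws IOException {
    FileEncryptionInfo feInfo = in.getFileEncryptionInfo();
    if (feInfo == null) {
      return in;  // not in an encryption zone, return the raw stream
    }
    CryptoCodec codec = CryptoCodec.getInstance(conf, feInfo.getCipherSuite());
    return new CryptoInputStream(in, codec, dek, feInfo.getIV());
  }
}
{code}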

{quote}
I'm not thrilled with stream construction always calling file info but I 
understand the stream is lazily opened which creates a chicken and egg problem 
for determining whether to return a crypto stream. 
{quote}
Exactly. The client only connects to the namenode when {{InputStream#read}} is called, and by then it is too late to decide whether to return a crypto stream.
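
So the decision has to be made eagerly in {{open}}. A very rough sketch of that flow, where every helper below is a hypothetical stand-in rather than the code in the patch:

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FileEncryptionInfo;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only, hypothetical names: the crypto decision happens at open() time
 * because the WebHDFS connection is only established lazily on the first
 * read(), which is too late to swap in a CryptoInputStream.
 */
abstract class OpenTimeWrappingSketch {

  /** Hypothetical: the lazily-connecting raw WebHDFS stream. */
  abstract InputStream openRawWebHdfsStream(Path f, int bufferSize) throws IOException;

  /** Hypothetical: extra namenode round trip to fetch encryption info at open. */
  abstract FileEncryptionInfo getFileEncryptionInfo(Path f) throws IOException;

  /** Hypothetical: wraps the raw stream in a CryptoInputStream. */
  abstract InputStream wrap(InputStream raw, FileEncryptionInfo feInfo) throws IOException;

  InputStream open(Path f, int bufferSize) throws IOException {
    InputStream raw = openRawWebHdfsStream(f, bufferSize);
    FileEncryptionInfo feInfo = getFileEncryptionInfo(f);
    return feInfo == null ? raw : wrap(raw, feInfo);
  }
}
{code}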

{quote}
Double check that failing in the ReadRunner ctor doesn't cause any retry loop 
issues or partial stream leakage. I'll scrutinize too.
{quote}
I added a try-catch block in {{WebHdfsFileSystem#open}} to close the stream in case of any exception.
Please let me know if I missed any case.
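
The pattern is roughly the following (a sketch with a hypothetical wrapper hook, not the patch code): everything that runs after the raw stream is created is guarded, and the stream is closed instead of leaked if that step throws.

{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.IOUtils;

/** Sketch only: close the partially-constructed stream if wrapping fails. */
final class CloseOnFailure {
  private CloseOnFailure() {}

  /** Hypothetical hook standing in for the crypto-wrapping logic in open(). */
  interface StreamWrapper {
    InputStream wrap(InputStream raw) throws IOException;
  }

  static InputStream wrapOrClose(InputStream raw, StreamWrapper wrapper)
      throws IOException {
    try {
      return wrapper.wrap(raw);
    } catch (IOException | RuntimeException e) {
      IOUtils.closeStream(raw);  // don't leak the partially-opened stream
      throw e;
    }
  }
}
{code}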


{quote}
I think using the cached file status at open in 
ReadRunner#initializeInputStream subtly changes semantics. 
{quote}
I retained the old behaviour.

{quote}
Why the change to MiniDFSCluster?
{quote}
Since {{NamenodeWebHdfsMethods#serverDefaultsResponse}} is static, {{MiniDFSCluster#restartNamenode}} would otherwise keep serving the old cached value of the key provider address after the restart.
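
A small illustration of why that matters (field and method names here are only stand-ins, not the real code): a static cache survives an in-JVM namenode restart, so the restarted cluster keeps returning the pre-restart key provider address unless the cache is cleared.

{code:java}
// Sketch only: why a static cache is a problem across an in-JVM NN restart.
final class ServerDefaultsCacheSketch {
  // Stand-in for the static serverDefaults response cached by NamenodeWebHdfsMethods.
  private static volatile String cachedKeyProviderAddress;

  static String getKeyProviderAddress(String currentAddress) {
    if (cachedKeyProviderAddress == null) {
      cachedKeyProviderAddress = currentAddress;
    }
    // After the NN is restarted with a new key provider address, this still
    // returns the value cached before the restart...
    return cachedKeyProviderAddress;
  }

  // ...unless the restart path clears the cache, which is what the
  // MiniDFSCluster change takes care of.
  static void reset() {
    cachedKeyProviderAddress = null;
  }
}
{code}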

Also note that patch #002 is built on top of HDFS-12907.
Once that gets reviewed and resolved, I will upload a new patch with one more test case added.


> Add CryptoInputStream to WebHdfsFileSystem read call.
> -----------------------------------------------------
>
>                 Key: HDFS-12574
>                 URL: https://issues.apache.org/jira/browse/HDFS-12574
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: encryption, kms, webhdfs
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-12574.001.patch, HDFS-12574.002.patch
>
>



