[
https://issues.apache.org/jira/browse/HDFS-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284051#comment-16284051
]
Rushabh S Shah edited comment on HDFS-12574 at 12/8/17 8:39 PM:
----------------------------------------------------------------
Attaching a patch for Jenkins to run and point out silly mistakes and
checkstyle issues.
{quote}
This is bad: CryptoProtocolVersion.values(). The values method always allocates
a new garbage array for every invocation. I forget where else I made a change
to have a static array assignment of the values and created a static valueOf to
return the item from the static array. I can't find it, looks like it might
have been undone... Note that protobufs actually do this.
{quote}
Addressed in v2 of patch.
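For reference, a minimal sketch of the pattern described in the quote. This is not the code from the patch: {{ProtocolVersion}}, {{CACHED_VALUES}} and {{fromCode}} are hypothetical names standing in for the real enum and its lookup method.

```java
// Hypothetical stand-in for an enum such as CryptoProtocolVersion.
enum ProtocolVersion {
  UNKNOWN(1),
  ENCRYPTION_ZONES(2);

  // values() returns a fresh copy of the constants array on every call,
  // so keep a single copy for lookups on hot paths.
  private static final ProtocolVersion[] CACHED_VALUES = values();

  private final int code;

  ProtocolVersion(int code) {
    this.code = code;
  }

  int getCode() {
    return code;
  }

  // Static lookup that walks the cached array instead of calling values().
  static ProtocolVersion fromCode(int code) {
    for (ProtocolVersion v : CACHED_VALUES) {
      if (v.code == code) {
        return v;
      }
    }
    return UNKNOWN;
  }
}

class CachedValuesDemo {
  public static void main(String[] args) {
    System.out.println(ProtocolVersion.fromCode(2));
    // Two calls to values() never return the same array instance.
    System.out.println(ProtocolVersion.values() == ProtocolVersion.values());
  }
}
```

The cached array must stay private and must never be handed out to callers, otherwise a caller could mutate it.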
{quote}
WebHdfsFileSystem#open contains a copy-n-paste of the same code in
DFSClient#createWrappedInputStream. CryptoInputStream can work with any general
stream so let's make a general wrapping method. Maybe create an interface
something like EncryptableInputStream for the getFileEncryptionInfo which
DFSInputStream and WebHdfsInputStream implements. Pass an encryptable stream
and it returns a wrapped stream if necessary.
{quote}
Addressed in latest patch.
But I am thinking of an alternative.
Instead of creating {{EncryptableInputStream}} and {{EncryptableOutputStream}},
how about creating a single {{EncryptableStream}} interface and letting
{{WebHdfsFileSystem}} and {{DistributedFileSystem}} implement it?
Just an idea. Let me know if you see pros or cons to that approach.
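To make the discussion concrete, here is a rough sketch of the general wrapping method suggested in the quote, with the interface placed on the stream. It is only an illustration under assumptions: {{StubEncryptionInfo}}, {{XorInputStream}}, {{StubRawStream}} and {{wrapIfEncrypted}} are hypothetical stand-ins, and the XOR "cipher" merely plays the role of {{CryptoInputStream}}.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for FileEncryptionInfo: carries only a one-byte key.
final class StubEncryptionInfo {
  final int key;

  StubEncryptionInfo(int key) {
    this.key = key;
  }
}

// Hypothetical interface shared by the raw stream types.
interface EncryptableStream {
  StubEncryptionInfo getFileEncryptionInfo(); // null when the file is not encrypted
}

// Stand-in for CryptoInputStream: "decrypts" by XOR-ing each byte with the key.
class XorInputStream extends FilterInputStream {
  private final int key;

  XorInputStream(InputStream in, int key) {
    super(in);
    this.key = key;
  }

  @Override
  public int read() throws IOException {
    int b = in.read();
    return b < 0 ? b : (b ^ key) & 0xff;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = in.read(buf, off, len);
    for (int i = 0; i < n; i++) {
      buf[off + i] = (byte) (buf[off + i] ^ key);
    }
    return n;
  }
}

// Stand-in raw stream playing the role of a DFS or WebHDFS input stream.
class StubRawStream extends ByteArrayInputStream implements EncryptableStream {
  private final StubEncryptionInfo info;

  StubRawStream(byte[] data, StubEncryptionInfo info) {
    super(data);
    this.info = info;
  }

  @Override
  public StubEncryptionInfo getFileEncryptionInfo() {
    return info;
  }
}

final class StreamWrapping {
  // Returns the raw stream untouched unless it reports encryption info.
  static <T extends InputStream & EncryptableStream> InputStream wrapIfEncrypted(T raw) {
    StubEncryptionInfo info = raw.getFileEncryptionInfo();
    if (info == null) {
      return raw;
    }
    return new XorInputStream(raw, info.key);
  }

  public static void main(String[] args) throws IOException {
    byte[] cipherText = {(byte) ('h' ^ 0x5a), (byte) ('i' ^ 0x5a)};
    InputStream in = wrapIfEncrypted(new StubRawStream(cipherText, new StubEncryptionInfo(0x5a)));
    System.out.println((char) in.read());
    System.out.println((char) in.read());
  }
}
```

With this shape the caller does not need to know which file system produced the stream; it only needs something that is both an {{InputStream}} and an {{EncryptableStream}}.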
{quote}
I'm not thrilled with stream construction always calling file info but I
understand the stream is lazily opened which creates a chicken and egg problem
for determining whether to return a crypto stream.
{quote}
Exactly. The client connects to the namenode only when {{InputStream#read}} is
called, but by then it is too late to decide whether to return a crypto stream.
{quote}
Double check that failing in the ReadRunner ctor doesn't cause any retry loop
issues or partial stream leakage. I'll scrutinize too.
{quote}
I added a try-catch block in {{WebHdfsFileSystem#open}} to close the stream if
any exception is thrown.
Please let me know if I missed any case.
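The shape of that guard is roughly the following. This is a simplified sketch, not the actual {{WebHdfsFileSystem#open}} code: {{SafeOpen}}, {{Wrapper}}, {{openWrapped}} and {{TrackedStream}} are hypothetical names used only for illustration.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical raw stream that records whether it was closed.
class TrackedStream implements Closeable {
  boolean closed;

  @Override
  public void close() {
    closed = true;
  }
}

final class SafeOpen {
  // Hypothetical wrapping step, e.g. deciding whether to add a crypto layer.
  interface Wrapper<T> {
    T wrap(T raw) throws IOException;
  }

  // If wrapping fails, close the raw stream before propagating the failure
  // so that a half-constructed stream is not leaked.
  static <T extends Closeable> T openWrapped(T raw, Wrapper<T> wrapper) throws IOException {
    try {
      return wrapper.wrap(raw);
    } catch (IOException | RuntimeException e) {
      try {
        raw.close();
      } catch (IOException closeFailure) {
        // Keep the original failure and attach the close failure to it.
        e.addSuppressed(closeFailure);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    TrackedStream raw = new TrackedStream();
    try {
      openWrapped(raw, s -> {
        throw new IOException("wrapping failed");
      });
    } catch (IOException expected) {
      System.out.println("closed after failure: " + raw.closed);
    }
  }
}
```

On the success path the stream is returned open, and closing it remains the caller's responsibility.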
{quote}
I think using the cached file status at open in
ReadRunner#initializeInputStream subtly changes semantics.
{quote}
I retained the old behaviour.
{quote}
Why the change to MiniDFSCluster?
{quote}
{{NamenodeWebHdfsMethods#serverDefaultsResponse}} is static, so after
{{MiniDFSCluster#restartNameNode}} it still holds the old value of the key
provider address.
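A toy reproduction of the problem, assuming nothing about the real classes: {{FakeNameNodeWeb}}, {{cachedKeyProvider}} and {{resetCache}} are hypothetical, and the reset shown here is just one possible remedy, not necessarily what the patch does.

```java
// Hypothetical server-side class that caches a response in a static field.
class FakeNameNodeWeb {
  // Static state is shared by every instance in the JVM, so it survives an
  // in-process "restart" that merely creates a new instance.
  private static String cachedKeyProvider;

  private final String configuredKeyProvider;

  FakeNameNodeWeb(String configuredKeyProvider) {
    this.configuredKeyProvider = configuredKeyProvider;
  }

  String getKeyProvider() {
    if (cachedKeyProvider == null) {
      cachedKeyProvider = configuredKeyProvider;
    }
    return cachedKeyProvider;
  }

  // One possible remedy for tests: clear the cache when restarting.
  static void resetCache() {
    cachedKeyProvider = null;
  }
}

class StaticCacheDemo {
  public static void main(String[] args) {
    FakeNameNodeWeb.resetCache();
    System.out.println(new FakeNameNodeWeb("kms://old").getKeyProvider());
    // A "restarted" instance with new configuration still sees the stale value.
    System.out.println(new FakeNameNodeWeb("kms://new").getKeyProvider());
    FakeNameNodeWeb.resetCache();
    System.out.println(new FakeNameNodeWeb("kms://new").getKeyProvider());
  }
}
```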
Also note that patch #002 is built on top of HDFS-12907.
Once that gets reviewed and resolved, I will create a new patch with one more
added test case.
> Add CryptoInputStream to WebHdfsFileSystem read call.
> -----------------------------------------------------
>
> Key: HDFS-12574
> URL: https://issues.apache.org/jira/browse/HDFS-12574
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: encryption, kms, webhdfs
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Attachments: HDFS-12574.001.patch, HDFS-12574.002.patch
>
>