[ 
https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184370#comment-15184370
 ] 

Xiao Chen commented on HDFS-9405:
---------------------------------

Thanks all for the discussions and thoughts here. I'd like to work on this.

As I understand it, there seem to be two problems:
- On NN startup/failover, the first call triggers the {{LoadingCache}} to 
fill up, and that fill happens synchronously.
We could solve this by having a background thread actively warm up the cache.

- If KMS or the backing key provider is down, all create RPCs will hang and 
time out in {{FSNamesystem#startFile}} (if the cache is empty).
This is arguably a bug. IMHO this should be identified at the service level, 
instead of depending on the client RPC to find it.
But if we don't like the hang in the RPC, then in addition to the background 
warm-up above, we could change the {{ValueQueue}} to call 
[getIfPresent|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/Cache.html#getIfPresent(java.lang.Object)]
instead of get, and throw {{RetryStartFileException}} directly if nothing is 
cached, on the assumption that the cache should otherwise already have been 
filled?

Is my understanding correct?

I will also work on making the logs/metrics helpful.

> When starting a file, NameNode should generate EDEK in a separate thread
> ------------------------------------------------------------------------
>
>                 Key: HDFS-9405
>                 URL: https://issues.apache.org/jira/browse/HDFS-9405
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: encryption, namenode
>    Affects Versions: 2.7.1
>            Reporter: Zhe Zhang
>
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation 
> to the key provider, which could be slow or cause a timeout. It should be 
> done in a separate thread so as to return a proper error message to the RPC 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
