[ 
https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033212#comment-16033212
 ] 

Rushabh S Shah commented on HDFS-11804:
---------------------------------------

Thanks [~xiaochen] for reviewing.
bq. Should we also treat AuthenticationException as non-retry?
This means the user doesn't have access to the key. Even if we retry, it's going 
to fail anyway unless some servers are misconfigured.
I am OK with either approach.

bq. Do we want retry policy's maxRetries also be configurable? Any reason 
hard-code that to 0?
IIUC we should increment the retry counter only if we encounter a retriable 
exception. Since the KMS server doesn't throw a retriable exception yet, it 
didn't make sense to have a retry count.
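
To make the distinction above concrete, here is a minimal sketch of the decision I have in mind. It is not the code in the patch: the class names (RetryDecisionSketch, AuthFailure, RetriableFailure) and the decide() helper are hypothetical stand-ins, and it only assumes that retries and failovers are counted separately.

```java
import java.io.IOException;

class RetryDecisionSketch {
    enum Action { FAIL, FAILOVER, RETRY }

    /** Hypothetical stand-in for an authentication failure (user cannot access the key). */
    static class AuthFailure extends IOException {
        AuthFailure(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a server-signalled "try again later" exception. */
    static class RetriableFailure extends IOException {
        RetriableFailure(String msg) { super(msg); }
    }

    /**
     * Decides what to do after a failed call.
     * retries and failovers are the counts consumed so far; only a
     * RetriableFailure is checked against the retry budget.
     */
    static Action decide(Exception e, int retries, int failovers,
                         int maxRetries, int maxFailovers) {
        if (e instanceof AuthFailure) {
            // Retrying will not grant access to the key, so give up immediately.
            return Action.FAIL;
        }
        if (e instanceof RetriableFailure) {
            // Same server, try again, but only while the retry budget lasts.
            return retries < maxRetries ? Action.RETRY : Action.FAIL;
        }
        if (e instanceof IOException) {
            // Treated here as a connectivity problem: move on to the next provider.
            return failovers < maxFailovers ? Action.FAILOVER : Action.FAIL;
        }
        // Anything else is not considered safe to repeat.
        return Action.FAIL;
    }

    public static void main(String[] args) {
        int maxRetries = 0;   // mirrors the hard-coded 0 discussed above
        int maxFailovers = 2; // assumed: one failover per additional provider
        System.out.println(decide(new AuthFailure("no access to key"), 0, 0, maxRetries, maxFailovers));
        System.out.println(decide(new IOException("connection refused"), 0, 0, maxRetries, maxFailovers));
        System.out.println(decide(new RetriableFailure("server busy"), 0, 0, maxRetries, maxFailovers));
    }
}
```

With maxRetries set to 0 the retriable branch always gives up, which is why a configurable retry count would have no effect until the server starts throwing such an exception.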

bq. generateEncryptedKey, reencryptEncryptedKey I think are idempotent.
Thanks. I will incorporate the change in the next patch.
{quote}
retry: m2c is we retry immediately to give each server a chance. If multiplier 
> 1 (say, numFailovers > providers.length), then we retry with delays. 
{quote}
I didn't understand this comment.
Can you elaborate on the context?



> KMS client needs retry logic
> ----------------------------
>
>                 Key: HDFS-11804
>                 URL: https://issues.apache.org/jira/browse/HDFS-11804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk.patch
>
>
> The kms client appears to have no retry logic – at all.  It's completely 
> decoupled from the ipc retry logic.  This has major impacts if the KMS is 
> unreachable for any reason, including but not limited to network connection 
> issues, timeouts, the +restart during an upgrade+.
> This has some major ramifications:
> # Jobs may fail to submit, although oozie resubmit logic should mask it
> # Non-oozie launchers may experience higher failure rates if they do not 
> already have retry logic.
> # Tasks reading EZ files will fail, though this will probably be masked by 
> framework reattempts
> # EZ file creation fails after creating a 0-length file – client receives 
> EDEK in the create response, then fails when decrypting the EDEK
> # Bulk hadoop fs copies, and maybe distcp, will prematurely fail


