[ 
https://issues.apache.org/jira/browse/HDFS-15588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sr2020 updated HDFS-15588:
--------------------------
    Attachment: HDFS-15588-001.patch
        Status: Patch Available  (was: Open)

> Arbitrarily low values for `dfs.block.access.token.lifetime` aren't safe and 
> can cause a healthy datanode to be excluded
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15588
>                 URL: https://issues.apache.org/jira/browse/HDFS-15588
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client, security
>            Reporter: sr2020
>            Priority: Major
>         Attachments: HDFS-15588-001.patch
>
>
> *Description*:
> Setting `dfs.block.access.token.lifetime` to an arbitrarily low value (such 
> as 1) makes the lifetime of a block token very short. As a result, some 
> healthy datanodes can be wrongly excluded by the client because of an 
> `InvalidBlockTokenException`.
> More specifically, in `nextBlockOutputStream`, the client gets the 
> `accessToken` from the namenode and uses it to talk to the datanode. The 
> lifetime of the `accessToken` can be set to a very small value (such as 
> 1 minute) through `dfs.block.access.token.lifetime`. Under some extreme 
> conditions (such as a VM migration, a temporary network issue, or a 
> stop-the-world GC), the `accessToken` can expire before the client uses it 
> to talk to the datanode. If the token has expired, `createBlockOutputStream` 
> returns false (and masks the `InvalidBlockTokenException`), so the client 
> treats the datanode as unhealthy, marks it as "excluded", and never reads 
> from or writes to it again.
> *Proposed solution*:
> A simple retry on the same datanode after catching 
> `InvalidBlockTokenException` can solve this problem (assuming the extreme 
> conditions do not happen often). Since `dfs.block.access.token.lifetime` 
> currently accepts even values like 0, we could also prevent users from 
> setting it to a small value (e.g., by enforcing a minimum of 5 minutes for 
> this parameter).
> We are submitting a patch that retries after catching 
> `InvalidBlockTokenException` in `nextBlockOutputStream`. We can also provide 
> a patch that enforces a larger minimum value for 
> `dfs.block.access.token.lifetime` if that is a better way to handle this.
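
The retry idea in the description can be sketched roughly as follows. This is not the attached patch and does not use the real HDFS client classes: `BlockTokenRetrySketch`, `ExpiredTokenException` (a stand-in for `InvalidBlockTokenException`), `PipelineOpener`, and `maxTokenRetries` are all hypothetical names introduced for illustration. It only shows the intended control flow, assuming an expired token can be told apart from a real I/O failure: retry the same datanode on an expired token, and exclude the datanode only on other failures or once the retries run out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class BlockTokenRetrySketch {
    /** Hypothetical stand-in for HDFS's InvalidBlockTokenException. */
    static class ExpiredTokenException extends Exception {
        ExpiredTokenException(String msg) { super(msg); }
    }

    /** Hypothetical hook: fetch a fresh token and open a stream to one datanode. */
    interface PipelineOpener {
        void open(String datanode) throws ExpiredTokenException, IOException;
    }

    /**
     * Tries to connect to a datanode, retrying on the SAME node when only the
     * token has expired. The node is added to 'excluded' only for other I/O
     * failures, or after maxTokenRetries extra attempts also fail (assumed
     * fallback to the pre-existing behaviour).
     */
    static boolean connectWithTokenRetry(String datanode, PipelineOpener opener,
                                         int maxTokenRetries, List<String> excluded) {
        for (int attempt = 0; attempt <= maxTokenRetries; attempt++) {
            try {
                opener.open(datanode);
                return true;
            } catch (ExpiredTokenException e) {
                // An expired token says nothing about datanode health: try again.
            } catch (IOException e) {
                excluded.add(datanode); // genuine failure: exclude the node
                return false;
            }
        }
        excluded.add(datanode); // token retries exhausted
        return false;
    }

    public static void main(String[] args) {
        List<String> excluded = new ArrayList<>();
        int[] calls = {0};
        // First attempt fails with an expired token, second attempt succeeds.
        boolean ok = connectWithTokenRetry("dn1", dn -> {
            if (calls[0]++ == 0) throw new ExpiredTokenException("token expired");
        }, 1, excluded);
        System.out.println("connected=" + ok + " excluded=" + excluded);
        // prints: connected=true excluded=[]
    }
}
```

Bounding the number of token retries keeps the client from looping forever when the configured lifetime is so small (for example 0) that every token is already expired on arrival; enforcing a minimum lifetime would address that case directly.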



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
