[jira] [Updated] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

Rakesh R (JIRA) Sun, 12 Aug 2018 22:40:53 -0700


     [ https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rakesh R updated HDFS-13166:
----------------------------
    Fix Version/s: 3.2.0

> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly 
> getLiveDatanodeStorageReport() calls
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13166
>                 URL: https://issues.apache.org/jira/browse/HDFS-13166
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>             Fix For: HDFS-10285, 3.2.0
>
>         Attachments: HDFS-13166-HDFS-10285-00.patch, 
> HDFS-13166-HDFS-10285-01.patch, HDFS-13166-HDFS-10285-02.patch, 
> HDFS-13166-HDFS-10285-03.patch
>
>
> Presently, {{#getLiveDatanodeStorageReport()}} is called for every file, and 
> the computation is repeated each time. This sub-task is to discuss and 
> implement a caching mechanism that reduces the number of these calls. It 
> could also define a configurable refresh interval and periodically refresh 
> the DN cache by fetching the latest {{#getLiveDatanodeStorageReport}} at 
> that interval.
> The following comments are taken from HDFS-10285, Comment-7 
> ([here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]):
> {quote}Adding getDatanodeStorageReport is concerning. 
> getDatanodeListForReport is already a very bad method that should be avoided 
> for anything but jmx – even then it’s a concern. I eliminated calls to it 
> years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn 
> lock for an excessive length of time. Beyond that, the response is going to 
> be pretty large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling: does it really need the namesystem 
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its 
> storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As 
> mentioned earlier, this is NOT cheap. The SPS should be able to operate on a 
> fuzzy/cached state of the world. Then it gets another datanode report to 
> determine the number of live nodes to decide if it should sleep before 
> processing the next path. The number of nodes from the prior cached view of 
> the world should suffice.
> {quote}
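The caching idea in the description can be illustrated with a minimal Java sketch. It assumes the costly call is wrapped in a `Supplier` (standing in for `getLiveDatanodeStorageReport()`) and that a clock is injected so staleness can be tested. The class name `LiveDatanodeCache`, its methods, and its parameters are hypothetical and are not the code committed for this issue. The sketch re-fetches the report only once the configurable refresh interval has elapsed, and it answers "how many live nodes?" from the cached view instead of issuing a second report call, which is the "fuzzy/cached state of the world" the quoted comment asks for.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Hadoop's API): caches an expensive live-datanode
 * report and refreshes it only after a configurable interval has elapsed.
 * R stands in for whatever per-datanode report type the caller uses.
 */
class LiveDatanodeCache<R> {
  private final Supplier<List<R>> fetcher;  // the costly call, e.g. a wrapper around getLiveDatanodeStorageReport()
  private final long refreshIntervalNanos;  // configurable refresh interval
  private final LongSupplier clock;         // System::nanoTime in production; injectable for tests

  private List<R> cached;                   // last fetched snapshot; null until the first fetch
  private long lastRefreshNanos;

  LiveDatanodeCache(Supplier<List<R>> fetcher, long refreshInterval,
                    TimeUnit unit, LongSupplier clock) {
    this.fetcher = fetcher;
    this.refreshIntervalNanos = unit.toNanos(refreshInterval);
    this.clock = clock;
  }

  /** Returns the cached report, re-fetching only if it is missing or stale. */
  public synchronized List<R> getLiveDatanodes() {
    long now = clock.getAsLong();
    // Overflow-safe elapsed-time comparison for nanoTime-style clocks.
    if (cached == null || now - lastRefreshNanos >= refreshIntervalNanos) {
      cached = List.copyOf(fetcher.get());  // immutable snapshot; may be up to one interval stale
      lastRefreshNanos = now;
    }
    return cached;
  }

  /** Live-node count derived from the cached view, with no extra report call. */
  public int liveNodeCount() {
    return getLiveDatanodes().size();
  }
}
```

For example, `new LiveDatanodeCache<>(reportSupplier, 5, TimeUnit.MINUTES, System::nanoTime)` would call the supplier at most once every five minutes, regardless of how many files are processed in between. The cache guards itself with its own monitor rather than any namesystem-wide lock, so a slow fetch only delays other readers of this cache; the trade-off is that callers may act on data that is up to one refresh interval old.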


