[jira] [Commented] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

Rakesh R (JIRA) Fri, 23 Feb 2018 19:25:20 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375276#comment-16375276
 ]


Rakesh R commented on HDFS-13166:
---------------------------------

oops, I forgot to add new file. Attached another patch including it.

> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly 
> getLiveDatanodeStorageReport() calls
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13166
>                 URL: https://issues.apache.org/jira/browse/HDFS-13166
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>         Attachments: HDFS-13166-HDFS-10285-00.patch, 
> HDFS-13166-HDFS-10285-01.patch
>
>
> Presently {{#getLiveDatanodeStorageReport()}} is fetched for every file and 
> does the computation. This Jira sub-task is to discuss and implement a cache 
> mechanism which in turn reduces the number of function calls. Also, could 
> define a configurable refresh interval and periodically refresh the DN cache 
> by fetching latest {{#getLiveDatanodeStorageReport}} on this interval.
>  Following comments taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
>  Comment-7)
> {quote}Adding getDatanodeStorageReport is concerning. 
> getDatanodeListForReport is already a very bad method that should be avoided 
> for anything but jmx – even then it’s a concern. I eliminated calls to it 
> years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn 
> lock for an excessive length of time. Beyond that, the response is going to 
> be pretty large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem 
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its 
> storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As 
> mentioned earlier, this is NOT cheap. The SPS should be able to operate on a 
> fuzzy/cached state of the world. Then it gets another datanode report to 
> determine the number of live nodes to decide if it should sleep before 
> processing the next path. The number of nodes from the prior cached view of 
> the world should suffice.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

Reply via email to