[
https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rakesh R updated HDFS-13166:
----------------------------
Fix Version/s: 3.2.0
> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly
> getLiveDatanodeStorageReport() calls
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-13166
> URL: https://issues.apache.org/jira/browse/HDFS-13166
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Rakesh R
> Assignee: Rakesh R
> Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-13166-HDFS-10285-00.patch,
> HDFS-13166-HDFS-10285-01.patch, HDFS-13166-HDFS-10285-02.patch,
> HDFS-13166-HDFS-10285-03.patch
>
>
> Presently {{#getLiveDatanodeStorageReport()}} is called for every file, and
> the computation is repeated each time. This Jira sub-task is to discuss and
> implement a caching mechanism that reduces the number of these calls. We
> could also define a configurable refresh interval and periodically refresh
> the DN cache by fetching the latest {{#getLiveDatanodeStorageReport()}}
> result at that interval.
> The following comment is taken from HDFS-10285,
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
> Comment-7)
> {quote}Adding getDatanodeStorageReport is concerning.
> getDatanodeListForReport is already a very bad method that should be avoided
> for anything but jmx – even then it’s a concern. I eliminated calls to it
> years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn
> lock for an excessive length of time. Beyond that, the response is going to
> be pretty large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its
> storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As
> mentioned earlier, this is NOT cheap. The SPS should be able to operate on a
> fuzzy/cached state of the world. Then it gets another datanode report to
> determine the number of live nodes to decide if it should sleep before
> processing the next path. The number of nodes from the prior cached view of
> the world should suffice.
> {quote}
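
The caching idea described in the issue could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the attached patches: the class name LiveDatanodeCache, its constructor parameters, and the fetchCount counter are all hypothetical, and the Supplier merely stands in for the expensive getLiveDatanodeStorageReport() call. The clock is injected so the refresh interval can be exercised without sleeping.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a time-based cache of live datanodes.
 * Not the class from the HDFS-13166 patch; all names are assumptions.
 */
final class LiveDatanodeCache<T> {
  // Stands in for the costly getLiveDatanodeStorageReport() call.
  private final Supplier<List<T>> fetcher;
  // Configurable refresh interval, in milliseconds.
  private final long refreshIntervalMs;
  // Injected clock, e.g. System::currentTimeMillis in real use.
  private final LongSupplier clockMs;

  private List<T> cached;      // last fetched snapshot, null until first use
  private long lastRefreshMs;  // clock value at the last fetch
  private int fetchCount;      // number of times the fetcher was invoked

  LiveDatanodeCache(Supplier<List<T>> fetcher, long refreshIntervalMs,
      LongSupplier clockMs) {
    this.fetcher = fetcher;
    this.refreshIntervalMs = refreshIntervalMs;
    this.clockMs = clockMs;
  }

  /** Returns the cached snapshot, refetching only when it is stale. */
  synchronized List<T> getLiveDatanodes() {
    long now = clockMs.getAsLong();
    if (cached == null || now - lastRefreshMs >= refreshIntervalMs) {
      // Copy so callers cannot mutate the shared snapshot.
      cached = Collections.unmodifiableList(new ArrayList<>(fetcher.get()));
      lastRefreshMs = now;
      fetchCount++;
    }
    return cached;
  }

  /** Live node count taken from the cached view, without a new report. */
  synchronized int getLiveDatanodeCount() {
    return getLiveDatanodes().size();
  }

  synchronized int getFetchCount() {
    return fetchCount;
  }
}
```

With a cache of this shape, per-file processing would read the possibly slightly stale snapshot, and the live node count used for the sleep decision could come from the same snapshot rather than from another datanode report, which is what the quoted comment asks for.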
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)