[ https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rakesh R updated HDFS-13166:
----------------------------
    Attachment: HDFS-13166-HDFS-10285-01.patch

> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly
> getLiveDatanodeStorageReport() calls
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-13166
>                 URL: https://issues.apache.org/jira/browse/HDFS-13166
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>            Priority: Major
>         Attachments: HDFS-13166-HDFS-10285-00.patch,
>                      HDFS-13166-HDFS-10285-01.patch
>
>
> Presently {{#getLiveDatanodeStorageReport()}} is fetched for every file and
> does the computation. This Jira sub-task is to discuss and implement a cache
> mechanism which in turn reduces the number of function calls. Also, could
> define a configurable refresh interval and periodically refresh the DN cache
> by fetching latest {{#getLiveDatanodeStorageReport}} on this interval.
> Following comments taken from HDFS-10285,
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
> Comment-7)
> {quote}Adding getDatanodeStorageReport is concerning.
> getDatanodeListForReport is already a very bad method that should be avoided
> for anything but jmx – even then it’s a concern. I eliminated calls to it
> years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn
> lock for an excessive length of time. Beyond that, the response is going to
> be pretty large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its
> storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As
> mentioned earlier, this is NOT cheap.
> The SPS should be able to operate on a fuzzy/cached state of the world. Then
> it gets another datanode report to determine the number of live nodes to
> decide if it should sleep before processing the next path. The number of
> nodes from the prior cached view of the world should suffice.
> {quote}
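
For illustration only, the cache with a configurable refresh interval described in this sub-task could be sketched roughly as below. This is not the code in the attached patch: the class name LiveDatanodeCache, the liveNodeCount() helper, and the injected clock are hypothetical, and the Supplier merely stands in for the costly getLiveDatanodeStorageReport() call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical time-based cache; not the class added by the HDFS-13166 patch. */
final class LiveDatanodeCache<T> {
  private final Supplier<List<T>> fetcher;  // stand-in for the costly report call
  private final long refreshIntervalMs;     // stand-in for a configurable interval
  private final LongSupplier clockMs;       // injectable clock, eases testing
  private List<T> cached;                   // last fetched view; null until first get()
  private long lastRefreshMs;

  LiveDatanodeCache(Supplier<List<T>> fetcher, long refreshIntervalMs,
      LongSupplier clockMs) {
    this.fetcher = fetcher;
    this.refreshIntervalMs = refreshIntervalMs;
    this.clockMs = clockMs;
  }

  /** Returns the cached view, refetching only when the interval has elapsed. */
  synchronized List<T> get() {
    long now = clockMs.getAsLong();
    if (cached == null || now - lastRefreshMs >= refreshIntervalMs) {
      // Copy so that callers never observe later changes to the fetched list.
      cached = Collections.unmodifiableList(new ArrayList<>(fetcher.get()));
      lastRefreshMs = now;
    }
    return cached;
  }

  /** Live node count taken from the cached view rather than a fresh report. */
  synchronized int liveNodeCount() {
    return get().size();
  }
}
```

With something along these lines, per-file processing would read get() and liveNodeCount() from the possibly stale view, and the expensive fetch would run at most once per refresh interval. How the real patch handles locking and staleness may well differ.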