[ https://issues.apache.org/jira/browse/MAPREDUCE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Chen updated MAPREDUCE-1568:
----------------------------------

    Summary: TrackerDistributedCacheManager should clean up cache in a background thread  (was: TrackerDistributedCacheManager should do deleteLocalPath asynchronously)
    Description: 
Right now, TrackerDistributedCacheManager does the cleanup through the following code path:
{code}
TaskRunner.run() ->
TrackerDistributedCacheManager.setup() ->
TrackerDistributedCacheManager.getLocalCache() ->
TrackerDistributedCacheManager.deleteCache()
{code}

Deleting the cache files can take a long time, and it should not be done by a task. We suggest a separate thread that checks the cache and cleans up the cache files.

  was:
TrackerDistributedCacheManager.deleteCache() has been improved:
MAPREDUCE-1302 makes TrackerDistributedCacheManager rename the caches in the main thread and then delete them in the background.
MAPREDUCE-1098 avoids global locking while doing the renaming (renaming lots of directories can also take a long time).

But the deleteLocalCache is still in the main thread of TaskRunner.run(), so it still slows down the task that triggers the deletion (originally this blocked all tasks, but that was fixed by MAPREDUCE-1098). Other tasks do not wait for the deletion. The task that triggers the deletion should not wait for it either.

TrackerDistributedCacheManager should do deleteLocalPath() asynchronously.


I have changed the title and description of this JIRA to fit our current idea.

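To make the proposal concrete, here is a minimal sketch of a background cleanup thread. It is not the code from the attached patches: the class name BackgroundCacheCleaner, its methods, the reference-count bookkeeping, and the "too many entries" trigger are all hypothetical and stand in for whatever TrackerDistributedCacheManager actually tracks. The point it illustrates is that tasks only touch cheap bookkeeping, while the slow recursive delete runs on a separate daemon thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch; names and policy are assumptions, not Hadoop's API.
class BackgroundCacheCleaner {
    // Assumed bookkeeping: local cache directory -> number of tasks using it.
    private final Map<Path, Integer> refCounts = new HashMap<>();
    private final int maxEntries;       // assumed trigger: clean when exceeded
    private final long intervalMillis;  // how often the thread checks
    private final Thread cleaner;

    BackgroundCacheCleaner(int maxEntries, long intervalMillis) {
        this.maxEntries = maxEntries;
        this.intervalMillis = intervalMillis;
        this.cleaner = new Thread(this::runLoop, "cache-cleaner");
        this.cleaner.setDaemon(true);
    }

    void start() { cleaner.start(); }

    void stop() throws InterruptedException {
        cleaner.interrupt();
        cleaner.join();
    }

    // Called by tasks; both methods are cheap and never touch the disk.
    synchronized void acquire(Path dir) { refCounts.merge(dir, 1, Integer::sum); }
    synchronized void release(Path dir) { refCounts.merge(dir, -1, Integer::sum); }

    private void runLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                cleanOnce();
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            // Interrupted by stop(): exit the loop quietly.
        }
    }

    // One cleanup pass: pick victims under the lock, delete outside it.
    void cleanOnce() {
        for (Path dir : pickUnused()) {
            deleteRecursively(dir);
        }
    }

    // Removes unused entries from the bookkeeping and returns them.
    // A real implementation must also stop a task from re-acquiring a
    // directory between this call and the delete (e.g. by renaming first).
    private synchronized List<Path> pickUnused() {
        List<Path> victims = new ArrayList<>();
        if (refCounts.size() <= maxEntries) {
            return victims;
        }
        Iterator<Map.Entry<Path, Integer>> it = refCounts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Path, Integer> e = it.next();
            if (e.getValue() <= 0) {
                victims.add(e.getKey());
                it.remove();
            }
        }
        return victims;
    }

    private static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        // Reverse order visits children before their parent directory.
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException | UncheckedIOException e) {
            System.err.println("Failed to delete " + root + ": " + e);
        }
    }
}
```

In this sketch a task would call acquire() when it starts using a cache directory and release() when it finishes, and start() would be called once when the tracker comes up. Splitting pickUnused() from the delete keeps the lock held only for the map update, so no task waits on disk I/O.
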
> TrackerDistributedCacheManager should clean up cache in a background thread
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1568
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1568
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1568-v2.txt, MAPREDUCE-1568.txt
>
>
> Right now, TrackerDistributedCacheManager does the cleanup through the
> following code path:
> {code}
> TaskRunner.run() ->
> TrackerDistributedCacheManager.setup() ->
> TrackerDistributedCacheManager.getLocalCache() ->
> TrackerDistributedCacheManager.deleteCache()
> {code}
> Deleting the cache files can take a long time, and it should not be
> done by a task. We suggest a separate thread that checks the cache
> and cleans up the cache files.