[
https://issues.apache.org/jira/browse/MAPREDUCE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202496#comment-13202496
]
Robert Joseph Evans commented on MAPREDUCE-3824:
------------------------------------------------
I like the concept of the patch. Volatile is definitely needed here, my bad on
that one. I also like that you are running a du to update the size of the cached
objects when their recorded size is 0. I do have some issues with the patch, though.
The first is that even though the du size update is being done on a separate
thread, it is being done with the cachedArchives lock held. The amount of time
it takes to do a du could be significant. Nothing new can be added to the
cache while the cachedArchives lock is held, so it could block new tasks from
making progress. I would really prefer to see this done in two passes, similar
to how we delete entries. The first pass would go through all entries and
identify those that need to be updated; the second pass would update those
entries without the lock held. Then, once all of the entries are updated, we
can look at cleaning up the distributed cache.
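As a rough illustration of the two-pass idea, here is a minimal Java sketch. The class, the map, the CacheStatus fields, and the du helper are hypothetical stand-ins for illustration only, not the actual TaskTracker/TrackerDistributedCacheManager code.
{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheSizeUpdater {

  static class CacheStatus {
    final File localizedDir;   // where the archive was localized
    volatile long size;        // 0 means "not yet measured"
    CacheStatus(File dir) { this.localizedDir = dir; }
  }

  private final Map<String, CacheStatus> cachedArchives =
      new HashMap<String, CacheStatus>();

  void updateSizes() {
    // Pass 1: under the lock, only collect the entries whose size is unknown.
    List<CacheStatus> toMeasure = new ArrayList<CacheStatus>();
    synchronized (cachedArchives) {
      for (CacheStatus status : cachedArchives.values()) {
        if (status.size == 0) {
          toMeasure.add(status);
        }
      }
    }

    // Pass 2: run the (potentially slow) du outside the lock, so new
    // localizations are not blocked while we walk the directories.
    for (CacheStatus status : toMeasure) {
      status.size = du(status.localizedDir);  // volatile write, visible to the cleanup thread
    }
    // Only after every entry is up to date would the cleanup pass decide what to delete.
  }

  // Stand-in for a real disk-usage computation: just sums file lengths.
  private long du(File dir) {
    File[] children = dir.listFiles();
    if (children == null) {
      return dir.length();
    }
    long total = 0;
    for (File child : children) {
      total += child.isDirectory() ? du(child) : child.length();
    }
    return total;
  }
}
{code}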
The second is that we are updating the size too late. We decide how much space
needs to be deleted to get us back under the desired amount based entirely on
the size reported by BaseDirManager, which in turn gets its data from the
CacheStatus object. The issue is that in the current patch we first calculate
how much needs to be removed, then we update the size of the archives, and then
we delete them. This is not critical, because it just means the entries would
be deleted in the next pass, so it is really very minor, but it would also be
covered by doing the update in two passes.
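To make the ordering concrete, here is a hedged sketch of a cleanup pass that measures first, then computes how much to free, and only then deletes. The Entry class, allowedCacheSize, and the measure/delete placeholders are illustrative names, not the real fields or methods.
{code:java}
import java.util.ArrayList;
import java.util.List;

class CacheCleanupOrder {

  static class Entry {
    long size;
    boolean inUse;
  }

  private final List<Entry> entries = new ArrayList<Entry>();
  private final long allowedCacheSize;

  CacheCleanupOrder(long allowedCacheSize) {
    this.allowedCacheSize = allowedCacheSize;
  }

  void cleanup() {
    // 1. Bring every entry's size up to date *before* doing any accounting,
    //    so the space calculation below sees real numbers, not zeros.
    for (Entry e : entries) {
      if (e.size == 0) {
        e.size = measure(e);
      }
    }

    // 2. Now compute how far over the limit we are.
    long totalSize = 0;
    for (Entry e : entries) {
      totalSize += e.size;
    }
    long toFree = totalSize - allowedCacheSize;

    // 3. Finally pick and delete victims until enough space is reclaimed.
    for (Entry e : entries) {
      if (toFree <= 0) {
        break;
      }
      if (!e.inUse) {
        toFree -= e.size;
        delete(e);
      }
    }
  }

  private long measure(Entry e) { return 1L; }  // placeholder for a du call
  private void delete(Entry e)  { /* placeholder for removing the local dir */ }
}
{code}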
I am not sure exactly in which situations the size is not being set. I would
like to know exactly which situations the current code is missing, because, as
I said previously, the code that computes the used size goes entirely off of
what is reported to BaseDirManager. Unfortunately there are some issues with
BaseDirManager where, if we are too aggressive with setting the size, we might
double count some archives, which would eventually make BaseDirManager think it
is full all the time. That would be very bad.
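To illustrate the double-counting worry, a toy sketch (none of these names come from BaseDirManager itself): if every re-measurement is simply added to the aggregate, the same archive gets counted more than once; applying only the delta from the previously recorded size avoids that.
{code:java}
class BaseDirAccounting {

  private long usedBytes;          // aggregate size the manager believes is used
  private long recordedEntrySize;  // what was previously reported for one entry

  // Naive update: always adding the newly measured size counts the same
  // archive twice once its size has already been reported.
  void naiveUpdate(long measuredSize) {
    usedBytes += measuredSize;
  }

  // Safer update: only apply the difference from what was reported before,
  // so re-measuring an archive cannot inflate the aggregate.
  void deltaUpdate(long measuredSize) {
    usedBytes += measuredSize - recordedEntrySize;
    recordedEntrySize = measuredSize;
  }

  long getUsedBytes() {
    return usedBytes;
  }
}
{code}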
> Distributed caches are not removed properly
> -------------------------------------------
>
> Key: MAPREDUCE-3824
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3824
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: distributed-cache
> Affects Versions: 1.0.0
> Reporter: Allen Wittenauer
> Priority: Critical
> Attachments: MAPREDUCE-3824-branch-1.0.txt
>
>
> Distributed caches are not being properly removed by the TaskTracker when
> they are expected to expire.