[
https://issues.apache.org/jira/browse/MESOS-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598998#comment-16598998
]
James Peach edited comment on MESOS-9172 at 8/31/18 4:46 PM:
-------------------------------------------------------------
| [r/68587|https://reviews.apache.org/r/68587] | Fixed fetcher deadlock with duplicate URIs. |
| [r/68586|https://reviews.apache.org/r/68586] | Add the output file to the hash on CommandInfo::URI. |
was (Author: jamespeach):
| [r/68587|https://reviews.apache.org/*r/68587] | Fixed fetcher deadlock with duplicate URIs. |
| [r/68586|https://reviews.apache.org/*r/68586] | Add the output file to the hash on CommandInfo::URI. |
> Fetcher deadlock with duplicated URIs.
> --------------------------------------
>
> Key: MESOS-9172
> URL: https://issues.apache.org/jira/browse/MESOS-9172
> Project: Mesos
> Issue Type: Bug
> Components: fetcher
> Reporter: James Peach
> Assignee: James Peach
> Priority: Major
>
> If the fetcher cache is empty and you launch a task that contains duplicate
> URIs, the fetcher deadlocks waiting for the futures in
> {{FetcherProcess::_fetch}}.
> When the fetcher sets up the initial batch of cache lookup futures in
> {{FetcherProcess::fetch}}, the duplicate URIs cause cache hits on the
> placeholder cache entries. This code assumes that there is already an
> operation in flight that will populate the cache entry.
> However, the cache is currently empty - the placeholder entry is caused by
> the duplicate in the task's URIs.
> When we await the futures in {{FetcherProcess::_fetch}}, we end up waiting
> for the future that indicates the cache entry has been populated, but that
> won't ever happen because we need to make progress on the current fetching
> batch in order to populate the cache entry. At this point we are deadlocked.
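
The self-dependency described above can be modeled without libprocess. The following is a minimal sketch, not the Mesos code: {{BatchPlan}}, {{planNaive}}, {{selfWaits}} and {{dedupe}} are hypothetical names, and the cache is reduced to a set of placeholder keys. It assumes that a batch starts its downloads only after all of its cache-lookup waits have resolved, as in {{FetcherProcess::_fetch}}. The de-duplication step is one possible mitigation and is not necessarily what the linked reviews implement.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one fetch batch; these are not Mesos types.
struct BatchPlan {
  std::vector<std::string> downloads;  // Entries this batch must populate itself.
  std::vector<std::string> waits;      // Entries expected to be populated elsewhere.
};

// Naive planning: the first lookup of a URI misses and leaves a placeholder;
// any later lookup of the same URI hits the placeholder and becomes a wait.
BatchPlan planNaive(const std::vector<std::string>& uris,
                    std::set<std::string>& placeholders) {
  BatchPlan plan;
  for (const std::string& uri : uris) {
    if (placeholders.insert(uri).second) {
      plan.downloads.push_back(uri);
    } else {
      plan.waits.push_back(uri);
    }
  }
  return plan;
}

// Returns the URIs the batch waits on that only the batch itself can
// populate. Under the assumption that downloads start after all waits
// resolve, a non-empty result means the batch can never make progress.
std::vector<std::string> selfWaits(const BatchPlan& plan) {
  std::set<std::string> own(plan.downloads.begin(), plan.downloads.end());
  std::vector<std::string> result;
  for (const std::string& uri : plan.waits) {
    if (own.count(uri) > 0) {
      result.push_back(uri);
    }
  }
  return result;
}

// One possible mitigation (an assumption, not necessarily the committed fix):
// drop duplicate URIs, preserving order, before doing any cache lookups.
std::vector<std::string> dedupe(const std::vector<std::string>& uris) {
  std::set<std::string> seen;
  std::vector<std::string> unique;
  for (const std::string& uri : uris) {
    if (seen.insert(uri).second) {
      unique.push_back(uri);
    }
  }
  assert(unique.size() <= uris.size());  // De-duplication never adds entries.
  return unique;
}
```

With an empty placeholder set and a URI list that names the same URI twice, {{selfWaits(planNaive(uris, placeholders))}} reports the duplicated URI, which is the wait that never completes. Planning over {{dedupe(uris)}} with a fresh placeholder set yields no waits at all.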