[
https://issues.apache.org/jira/browse/TEZ-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621362#comment-14621362
]
Saikat edited comment on TEZ-2172 at 7/10/15 3:43 PM:
------------------------------------------------------
LinkedHashSet seems to be a good option: it retains the order in which the
items are inserted into the set and provides expected constant-time
performance for add, contains and remove.
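
A minimal sketch of the behaviour described above, not the actual patch. AttemptId
is a hypothetical stand-in for InputAttemptIdentifier (the real class has more
fields), and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class LinkedHashSetSketch {

    // Hypothetical stand-in for InputAttemptIdentifier (assumption, not the Tez class).
    public static final class AttemptId {
        public final int inputIndex;
        public final int attemptNumber;

        public AttemptId(int inputIndex, int attemptNumber) {
            this.inputIndex = inputIndex;
            this.attemptNumber = attemptNumber;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AttemptId)) return false;
            AttemptId other = (AttemptId) o;
            return inputIndex == other.inputIndex && attemptNumber == other.attemptNumber;
        }

        @Override
        public int hashCode() {
            return Objects.hash(inputIndex, attemptNumber);
        }

        @Override
        public String toString() {
            return "AttemptId(" + inputIndex + ", " + attemptNumber + ")";
        }
    }

    // Adds three ids, removes the middle one, and returns what is left in iteration order.
    public static List<AttemptId> remainingAfterRemove() {
        Set<AttemptId> remaining = new LinkedHashSet<>();
        remaining.add(new AttemptId(3, 0));
        remaining.add(new AttemptId(1, 0));
        remaining.add(new AttemptId(2, 0));

        // Hash-based removal: no linear scan over the elements as with List.remove(Object).
        remaining.remove(new AttemptId(1, 0));

        // Iteration follows insertion order of the surviving elements.
        return new ArrayList<>(remaining);
    }

    public static void main(String[] args) {
        System.out.println(remainingAfterRemove());
    }
}
```

The set relies on equals() and hashCode() of the element type, so whether this
fits depends on how InputAttemptIdentifier defines them.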
was (Author: saikatr):
A solution could be to use a LinkedHashMap<InputAttemptIdentifier, Integer>.
(LinkedHashMap has efficient remove, and in our scenario each Fetcher runs in
its own thread context, so the map need not be thread-safe.)
The Integer value could be a dummy field.
We would retrieve the key and work with it.
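
A rough sketch of the dummy-value map idea, for illustration only. The String
keys are hypothetical placeholders for InputAttemptIdentifier, and the class and
method names are assumptions:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DummyValueMapSketch {

    // The value carries no information; only the keys matter.
    private static final Integer DUMMY = 0;

    // Inserts three keys, removes one by key, and returns the remaining keys in order.
    public static List<String> keysAfterRemove() {
        // Not thread-safe, which is acceptable if only one thread touches the map.
        Map<String, Integer> remaining = new LinkedHashMap<>();
        remaining.put("attempt_3", DUMMY);
        remaining.put("attempt_1", DUMMY);
        remaining.put("attempt_2", DUMMY);

        // Removal is by key hash, not by scanning the entries.
        remaining.remove("attempt_1");

        // keySet() iterates in insertion order for a LinkedHashMap.
        return new ArrayList<>(remaining.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysAfterRemove());
    }
}
```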
> FetcherOrderedGrouped using List to store InputAttemptIdentifier can lead to
> some inefficiency during remove() operation
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-2172
> URL: https://issues.apache.org/jira/browse/TEZ-2172
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Saikat
>
> As part of fixing TEZ-2001, FetcherOrderedGrouped stores
> InputAttemptIdentifier instances in a List. This can lead to some
> inefficiency, since the size of this list can be ~30 and remove() calls can
> be expensive.
> Option 1: use the spillId in the hashCode, or a wrapping structure for
> just this. However, spillId cannot be added to the hashCode, as it would
> break ShuffleScheduler shuffleInfoEventsMap.
> Option 2: consider using a Map with an identifier.
> Need to consider other options as well. Creating this jira as a placeholder
> to fix this issue.