[
https://issues.apache.org/jira/browse/ACCUMULO-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014268#comment-15014268
]
Josh Elser commented on ACCUMULO-4062:
--------------------------------------
Oh, that must be new. What I copied above is from 1.7.
That approach looks like it leaves room to miss equivalent mutations
(e.g. 2 mutations with column updates [a, b] and [b, a] would likely have
different hashCodes despite being equivalent on read). If that's the case, I
guess the question is how the constant-time insert into a list (append plus the
amortized cost of growing the list) compares to the average constant-time
insert into Java's HashMap (potentially skewed by load or bad hashing). Would
be an interesting experiment.
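
To make the [a, b] vs. [b, a] concern concrete, here is a minimal sketch. It
does not use Accumulo's Mutation class: FakeMutation is a hypothetical stand-in,
and it assumes equals/hashCode are derived from the updates in insertion order,
which may not match what the real class does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MutationDedupSketch {

    /** Hypothetical stand-in for a mutation; NOT Accumulo's Mutation class. */
    static final class FakeMutation {
        final String row;
        final List<String> updates; // column updates, kept in insertion order

        FakeMutation(String row, String... updates) {
            this.row = row;
            this.updates = Arrays.asList(updates);
        }

        // Assumption: equality is order-sensitive, as List.equals is.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FakeMutation)) {
                return false;
            }
            FakeMutation other = (FakeMutation) o;
            return row.equals(other.row) && updates.equals(other.updates);
        }

        // List.hashCode depends on element order, so [a, b] and [b, a] differ.
        @Override
        public int hashCode() {
            return 31 * row.hashCode() + updates.hashCode();
        }
    }

    /** Number of mutations left after putting them all into a HashSet. */
    static int distinctCount(FakeMutation... mutations) {
        Set<FakeMutation> set = new HashSet<>(Arrays.asList(mutations));
        return set.size();
    }

    public static void main(String[] args) {
        FakeMutation ab = new FakeMutation("row1", "a", "b");
        FakeMutation abAgain = new FakeMutation("row1", "a", "b");
        FakeMutation ba = new FakeMutation("row1", "b", "a");

        // Exact duplicates collapse to one entry.
        System.out.println(distinctCount(ab, abAgain)); // prints 1
        // Same updates in a different order are both kept.
        System.out.println(distinctCount(ab, ba)); // prints 2
    }
}
```

Under that assumption, a HashSet only drops byte-for-byte duplicates; mutations
that differ only in update order would still both be sent to the server.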
> Change MutationSet.mutations to use HashSet
> -------------------------------------------
>
> Key: ACCUMULO-4062
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4062
> Project: Accumulo
> Issue Type: Improvement
> Components: client
> Reporter: Dave Marion
>
> Change TabletServerBatchWriter.MutationSet.mutations from a
> {code}
> HashMap<String,List<Mutation>>
> {code}
> to
> {code}
> HashMap<String,HashSet<Mutation>>
> {code}
> so that duplicate mutations added by a client are not sent to the server.