[ 
https://issues.apache.org/jira/browse/ACCUMULO-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174913#comment-13174913
 ] 

Aaron Cordova commented on ACCUMULO-227:
----------------------------------------

What the client should expect is that Accumulo will only store/process one 
value per unique key: Accumulo is a distributed map. Even if it's only for 
aggregation's sake, allowing Mutations to submit multiple values per unique key 
and processing all those values, rather than arbitrarily choosing one, violates 
the concept of a map, which will cause more confusion on the part of users.

The right thing to do for users who want to submit lots of values to aggregate 
under a sub key is to insist that they make their cells differ by at least one 
element in the key. Again, aggregating multiple values under the same key 
violates the basic tenet that Accumulo is a map. Aggregation is performed 
across different keys sharing a sub key.
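
As a rough illustration of the map argument above, here is a minimal Python
sketch. It is a toy in-memory model, not Accumulo's API: the `table` dict, the
`put` helper, the `aggregate_by_subkey` helper, and the (row, family,
qualifier, timestamp) key layout are all assumptions made for the example. A
plain dict keeps the last value written to a key; which value a real system
would keep is not modeled here.

```python
from collections import defaultdict

# Hypothetical model: a table maps a full key
# (row, family, qualifier, timestamp) to exactly one value.
table = {}

def put(row, family, qualifier, timestamp, value):
    # Writing to an existing key replaces the old value (map semantics).
    table[(row, family, qualifier, timestamp)] = value

def aggregate_by_subkey(tbl):
    # Sum values across all full keys that share the (row, family) sub key.
    totals = defaultdict(int)
    for (row, family, _qualifier, _timestamp), value in tbl.items():
        totals[(row, family)] += value
    return dict(totals)

put("page1", "hits", "", 100, 1)
put("page1", "hits", "", 100, 1)  # identical key: still only one cell
put("page1", "hits", "", 101, 1)  # differs in timestamp: a separate cell
put("page1", "hits", "", 102, 1)

totals = aggregate_by_subkey(table)
```

In this model the two identical writes collapse into one cell, so only cells
whose keys differ in at least one element contribute to the aggregate.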

If having the users generate unique timestamps is a problem, there are several 
strategies for dealing with that. One is to generate random timestamps. If 
aggregation is being done over timestamps, the actual timestamp values should 
never matter or be interpreted. If there are worries about Accumulo doing 
something undesired with random timestamps, one could generate random column 
qualifiers, etc. and aggregate over those.
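
The random-qualifier strategy could be sketched as follows. This is a toy
Python model rather than Accumulo client code: the `unique_cell` helper, the
`table` dict, and the key layout are hypothetical, and `uuid.uuid4()` is just
one assumed way of producing a qualifier that is unique in practice.

```python
import uuid
from collections import defaultdict

def unique_cell(row, family, value, timestamp=0):
    # A random qualifier makes every cell's full key distinct, so the
    # timestamp can stay fixed and never needs to be interpreted.
    qualifier = uuid.uuid4().hex
    return (row, family, qualifier, timestamp), value

# Hypothetical table: full key -> single value.
table = {}
for _ in range(5):
    key, value = unique_cell("page1", "hits", 1)
    table[key] = value

# Aggregate over the random qualifiers by grouping on (row, family).
totals = defaultdict(int)
for (row, family, _qualifier, _timestamp), value in table.items():
    totals[(row, family)] += value
```

Each submitted value lands under its own key, and the aggregate is computed
across the keys sharing the (row, family) sub key.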

To address what Adam said about versioning - aggregating tables should probably 
turn off the iterator that only keeps the latest version. But that has nothing 
to do with the policy for handling multiple identical cells.

Finally, I'm not advocating we do anything to support aggregation on the client 
side, but rather leave it up to the application developer to exploit any 
opportunities for aggregation in their application.

                
> Improve in memory map counts to provide cell level uniqueness for repeated 
> columns in  mutation
> -----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-227
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-227
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: John Vines
>            Assignee: John Vines
>             Fix For: 1.5.0
>
>
> Currently, for isolation, we only isolate mutations. This doesn't allow 
> mutations with identical cells within them. We should increase the mutation 
> counts to account for each individual cell instead of each mutation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
