And just to be clear, since there are several definitions of key flying around - in the following case:

row1,colfam1,colqual1,4 -> valueA
row1,colfam1,colqual1,5 -> valueB

These can coexist peacefully, although the versioning iterator might suppress all but k versions.

In this case:

row1,colfam1,colqual1,4 -> valueA
row1,colfam1,colqual1,4 -> valueB

Accumulo should throw one away arbitrarily. I think what you mentioned, a system iterator that performs this logic, would be a good implementation.

On Dec 22, 2011, at 5:09 PM, Keith Turner wrote:

> On Thu, Dec 22, 2011 at 4:49 PM, Aaron Cordova <[email protected]> wrote:
>> I think it's fine to consider different versions of 'identical keys',
>> meaning row,colfam,colqual, because in that case the implementation still
>> treats two keys that only differ by timestamp as two unique keys. But I
>> don't think we should allow multiple identical _versions_ of identical keys,
>> to use your terminology. I think we should throw all but one away if the
>> user does happen to try to insert them and if the user wants to aggregate
>> across values, he or she must use different version numbers or timestamps or
>> whatever.
>>
>> If generating unique timestamps within mutations that want to perform
>> several updates to the same row,colfam,colqual is a problem, why don't we
>> allow the user to 'put()' multiple updates into a mutation, and on the
>> server then assign slightly different timestamps to the identical
>> row,colfam,colqual triples that are found in a mutation. Would that make
>> everyone happy?
>
> This still does not address the issue of separate mutations inserting
> the exact same key. Also timestamps are only set on the keys in a
> mutation if the user does not set them.
>
> So if a table comes to have multiple keys that are exactly the same,
> what do you propose? That we drop them? Which one will you drop?
> One nice thing about Accumulo is that if you wish to have this
> behavior, you can very easily write an iterator to do it. I think you
> are proposing that we configure an iterator to do this by default?
>
> I think if the user is inserting things with exact same key and
> expecting it to behave like a treemap (honor order of arrival), then
> it never will. Even if we drop duplicate keys, we will not achieve
> the map behavior you described.
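
To make the two cases at the top of this message concrete, here is a rough Python sketch. It is a plain in-memory simulation, not Accumulo's iterator API, and the names drop_exact_duplicates and keep_newest_versions are made up for illustration. It assumes a key is the tuple (row, colfam, colqual, timestamp) and that newer timestamps sort first within a cell.

```python
from itertools import groupby


def sort_key(entry):
    """Order by row, colfam, colqual, then newest timestamp first."""
    (row, fam, qual, ts), _value = entry
    return (row, fam, qual, -ts)


def drop_exact_duplicates(entries):
    """Keep one entry per full key (row, colfam, colqual, timestamp).

    Which duplicate survives is arbitrary from the caller's point of view;
    here it simply falls out of Python's stable sort.
    """
    ordered = sorted(entries, key=sort_key)
    return [next(group) for _key, group in groupby(ordered, key=lambda e: e[0])]


def keep_newest_versions(entries, k):
    """Keep at most k newest timestamps per (row, colfam, colqual)."""
    ordered = sorted(entries, key=sort_key)
    kept = []
    for _cell, group in groupby(ordered, key=lambda e: e[0][:3]):
        kept.extend(list(group)[:k])
    return kept


# Case 1: keys differ only by timestamp, so both entries survive dedup.
distinct = [
    (("row1", "colfam1", "colqual1", 4), "valueA"),
    (("row1", "colfam1", "colqual1", 5), "valueB"),
]

# Case 2: keys are identical down to the timestamp, so only one survives.
same = [
    (("row1", "colfam1", "colqual1", 4), "valueA"),
    (("row1", "colfam1", "colqual1", 4), "valueB"),
]
```

With these inputs, drop_exact_duplicates(distinct) keeps both entries and drop_exact_duplicates(same) keeps exactly one. Nothing here should be read as a promise about which value wins in the second case; that is the "arbitrarily" part.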
