[
https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729379#action_12729379
]
Jonathan Gray commented on HBASE-1485:
--------------------------------------
At least three people have come to me with a use case for this.
I might create a couple of sub-tasks here so we can at least head in the right
direction.
First, we need to make scanners ignore duplicate versions of the same column.
The trickiest part is deciding which one to keep. We always want the version
that comes from the latest storefile, but I believe storefile IDs are still
random rather than timestamps? We might need to change that to fix this. It
would then also require modifying the KVHeap to use the storefile ID as a
tie-breaker, all other things being equal.
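To make the idea concrete, here is a minimal, self-contained sketch of that tie-break. It is not HBase code: Cell, FileCursor, ORDER and scan are hypothetical stand-ins for KeyValue, a storefile scanner, the KVHeap comparator and the scanner loop. It also assumes storefile sequence IDs increase with file age, which is exactly the change discussed above and not how they behave today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class DedupScanSketch {
    /** Hypothetical stand-in for a KeyValue: one version of one column. */
    static final class Cell {
        final String row, column, value;
        final long timestamp;
        Cell(String row, String column, long timestamp, String value) {
            this.row = row; this.column = column;
            this.timestamp = timestamp; this.value = value;
        }
        /** True when both cells are the same row/column/timestamp, i.e. duplicates. */
        boolean sameVersionAs(Cell o) {
            return o != null && row.equals(o.row)
                && column.equals(o.column) && timestamp == o.timestamp;
        }
    }

    /** Hypothetical cursor over one storefile whose cells are already sorted. */
    static final class FileCursor {
        final long sequenceId; // ASSUMPTION: larger means a newer storefile
        final Iterator<Cell> cells;
        Cell current;
        FileCursor(long sequenceId, List<Cell> sorted) {
            this.sequenceId = sequenceId;
            this.cells = sorted.iterator();
            advance();
        }
        boolean advance() {
            current = cells.hasNext() ? cells.next() : null;
            return current != null;
        }
    }

    /** Row, column, newest timestamp first; on a full tie, newest storefile first. */
    static final Comparator<FileCursor> ORDER = (a, b) -> {
        int c = a.current.row.compareTo(b.current.row);
        if (c != 0) return c;
        c = a.current.column.compareTo(b.current.column);
        if (c != 0) return c;
        c = Long.compare(b.current.timestamp, a.current.timestamp);
        if (c != 0) return c;
        return Long.compare(b.sequenceId, a.sequenceId);
    };

    /** Merges the cursors and keeps only the first copy of each duplicate version. */
    static List<Cell> scan(List<FileCursor> files) {
        PriorityQueue<FileCursor> heap = new PriorityQueue<>(ORDER);
        for (FileCursor f : files) {
            if (f.current != null) heap.add(f);
        }
        List<Cell> out = new ArrayList<>();
        Cell last = null;
        while (!heap.isEmpty()) {
            FileCursor top = heap.poll();
            if (!top.current.sameVersionAs(last)) {
                last = top.current;
                out.add(last);
            }
            if (top.advance()) heap.add(top);
        }
        return out;
    }

    public static void main(String[] args) {
        FileCursor older = new FileCursor(1, Arrays.asList(
            new Cell("r1", "cf:a", 100, "old"), new Cell("r1", "cf:b", 100, "b")));
        FileCursor newer = new FileCursor(2, Arrays.asList(
            new Cell("r1", "cf:a", 100, "new")));
        for (Cell c : scan(Arrays.asList(older, newer))) {
            System.out.println(c.row + " " + c.column + " @" + c.timestamp + " = " + c.value);
        }
    }
}
```

Because the comparator puts the copy from the newest storefile first, the dedup step only has to remember the last cell it emitted. With random IDs the same loop would still drop duplicates, but which copy survives would be arbitrary.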
Once we have scanners working, that will mean the proper thing is enforced on
major (and if we want, minor) compactions.
Gets will only work once we re-implement Gets as an optimized scan (taking
advantage of bloom filters, mostly).
I remember now why I punted this to 0.20.1: the tricky part at the beginning is
pretty tough and touches a good bit of core read-path code.
Revisiting it now, so we'll see. Is anyone else interested in this or willing to
work on it?
> Wrong or indeterminate behavior when there are duplicate versions of a column
> -----------------------------------------------------------------------------
>
> Key: HBASE-1485
> URL: https://issues.apache.org/jira/browse/HBASE-1485
> Project: Hadoop HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.20.0
> Reporter: Jonathan Gray
> Fix For: 0.20.1
>
>
> As of now, both gets and scanners will end up returning all duplicate
> versions of a column. The ordering of them is indeterminate.
> We need to decide what the desired/expected behavior should be and make it
> happen.
> Note: It's nearly impossible for this to work with Gets as they are now
> implemented in 1304, so this is really a Scanner issue. To implement this
> correctly with Gets, we would have to undo basically all the optimizations
> that Gets do, making them far slower than a Scanner.