The sequence ID stored in each file tells you which file is newest (largest = newest). If the heap used that as a tie-breaker, we might be sitting pretty.
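
To make the tie-break idea concrete, here is a rough Java sketch. It is not HBase code: Cell, ORDER, and mergeDroppingDuplicates are hypothetical names, and it merges plain cells in one heap rather than a heap of per-file scanners as the real KVHeap does. It only illustrates ordering by row, column, and timestamp, then preferring the larger file sequence ID when everything else is equal, and dropping the losing duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names are real HBase classes.
class SeqIdTieBreakSketch {

    // One cell version, tagged with the sequence id of the file it came from.
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final long fileSequenceId;
        final String value;

        Cell(String row, String column, long timestamp, long fileSequenceId, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.fileSequenceId = fileSequenceId;
            this.value = value;
        }
    }

    // Row, then column, then newest timestamp first. If all of those are equal,
    // the cell from the file with the largest sequence id sorts first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return Long.compare(b.fileSequenceId, a.fileSequenceId);
    };

    // Merge the cells of several files and keep only the first cell seen for
    // each (row, column, timestamp), i.e. the one from the newest file.
    static List<Cell> mergeDroppingDuplicates(List<List<Cell>> files) {
        PriorityQueue<Cell> heap = new PriorityQueue<>(ORDER);
        for (List<Cell> file : files) {
            heap.addAll(file);
        }
        List<Cell> out = new ArrayList<>();
        Cell last = null;
        while (!heap.isEmpty()) {
            Cell next = heap.poll();
            boolean duplicate = last != null
                    && next.row.equals(last.row)
                    && next.column.equals(last.column)
                    && next.timestamp == last.timestamp;
            if (!duplicate) {
                out.add(next);
                last = next;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> olderFile = Arrays.asList(new Cell("r1", "cf:a", 100L, 5L, "old"));
        List<Cell> newerFile = Arrays.asList(new Cell("r1", "cf:a", 100L, 9L, "new"));
        for (Cell c : mergeDroppingDuplicates(Arrays.asList(olderFile, newerFile))) {
            System.out.println(c.row + " " + c.column + " " + c.timestamp + " " + c.value);
        }
    }
}
```

With the sequence ID as the last comparison key, nothing depends on the filename, so the random file IDs could stay as they are.
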
We want to avoid using ts for filename I think, not sure what assumptions might break.

On Jul 9, 2009 11:42 AM, "Jonathan Gray (JIRA)" <[email protected]> wrote:

[ https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729379#action_12729379 ]

Jonathan Gray commented on HBASE-1485:
--------------------------------------

I've had at least three people with a use case for this. Might create a couple sub-tasks here so we can at least head in the right direction.

First, we need to make scanners ignore duplicate versions of the same column. The trickiest part is, how do we determine which to keep? We want to always come from the latest storefile, but I believe their IDs are still random and not timestamps? We might need to make that change to fix this. Would also then require a modification to the KVHeap to take this into account, all other things considered equal.

Once we have scanners working, that will mean the proper thing is enforced on major (and if we want, minor) compactions. Gets will only work once we re-implement Gets as an optimized scan (taking advantage of bloom filters, mostly).

I remember why I punted this to 0.20.1, the tricky part at the beginning is pretty tough and touches a good bit of core read-path code. Revisiting now, we'll see. Anyone else interested in this / want to work on it?

> Wrong or indeterminate behavior when there are duplicate versions of a column
> -----------------...
