[
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lars Hofhansl resolved HBASE-12363.
-----------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Committed to 0.98, branch-1, and master.
Thanks for the reviews.
> Improve how KEEP_DELETED_CELLS works with MIN_VERSIONS
> ------------------------------------------------------
>
> Key: HBASE-12363
> URL: https://issues.apache.org/jira/browse/HBASE-12363
> Project: HBase
> Issue Type: Sub-task
> Components: regionserver
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Labels: Phoenix
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12363-0.98.txt, 12363-1.0.txt, 12363-master.txt,
> 12363-test.txt, 12363-v2.txt, 12363-v3.txt
>
>
> Brainstorming...
> This morning on the train (of all places) I realized a fundamental issue in
> how KEEP_DELETED_CELLS is implemented.
> The problem is knowing when it is safe to remove a delete marker (we cannot
> remove the marker unless all cells affected by it are removed as well).
> This is particularly hard for family markers, since they sort before all
> cells of a row: scanning forward through an HFile you cannot know whether a
> family marker is still needed until at least the entire row has been
> scanned.
> My solution was to keep the TS of the oldest put in any given HFile, and only
> remove delete markers older than that TS.
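> In code form that retention check is a single comparison; a minimal sketch
> (all names here are illustrative, not actual HBase internals):
> {code:java}
> // Sketch: can a delete marker be dropped at major compaction, given the
> // oldest put timestamp recorded in the HFile's metadata? A marker can only
> // shadow puts with a timestamp <= its own, so if every put in the file is
> // newer than the marker, the marker affects nothing and is safe to drop.
> final class DeleteMarkerExpiry {
>   static boolean canDropDeleteMarker(long markerTs, long earliestPutTs) {
>     return markerTs < earliestPutTs;
>   }
>
>   public static void main(String[] args) {
>     // Ordinary case: the file's oldest put (TS 200) is newer than a stale
>     // marker (TS 100), so the marker can go.
>     System.out.println(canDropDeleteMarker(100L, 200L)); // true
>
>     // Failure case described next: ROW 1 was written at TS 100 and never
>     // updated, pinning earliestPutTs at 100. Markers for a billion later
>     // deletes (TS >= 5000) then compare as "not droppable" forever.
>     System.out.println(canDropDeleteMarker(5000L, 100L)); // false
>   }
> }
> {code}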
> That sounds good on the face of it... But now imagine you write a version of
> ROW 1 and then never update it again. Later you write a billion other rows
> and delete them all. Since the TS of the cells in ROW 1 is older than all
> the delete markers for the other billion rows, those markers will never be
> collected in the region that hosts ROW 1, not even by a major compaction.
> Note, in a sense that is what HBase is supposed to do when keeping deleted
> cells: Keep them until they would be removed by some other means (for example
> TTL, or MAX_VERSIONS when new versions are inserted).
> The specific problem here is that even when all KVs affected by a delete
> marker have expired this way, the marker would not be removed if there is
> just one older KV in the HStore.
> I don't see a good way out of this. In the parent issue I outlined these
> four options:
> # Only allow the new flag to be set on CFs that also have a TTL set.
> MIN_VERSIONS would not apply to deleted rows or delete-marker rows (we
> wouldn't know how long to keep family deletes in that case). (MAX)VERSIONS
> would still be enforced on all row types except for family delete markers.
> (See the configuration sketch after this list.)
> # Translate family delete markers to column delete markers at (major)
> compaction time.
> # Change HFileWriterV* to keep track of the earliest put TS in a store and
> write it to the file metadata. Use that to expire delete markers that are
> older and hence can't affect any puts in the file.
> # Have Store.java keep track of the earliest put in internalFlushCache and
> compactStore and then append it to the file metadata. That way HFileWriterV*
> would not need to know about KVs.
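> To make option 1 concrete, here is roughly what such a column family would
> look like with the 0.98-era client API (a sketch only; nothing in the
> current API enforces the proposed "TTL required" restriction):
> {code:java}
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.TableName;
>
> public class KeepDeletedCellsWithTtl {
>   public static void main(String[] args) {
>     HColumnDescriptor cf = new HColumnDescriptor("d");
>     cf.setKeepDeletedCells(true);       // retain deleted cells and markers
>     cf.setTimeToLive(7 * 24 * 60 * 60); // bound retention: 7 days, in seconds
>     cf.setMinVersions(1);               // keep at least one version per cell
>     cf.setMaxVersions(5);               // cap live versions
>
>     HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t"));
>     table.addFamily(cf);
>   }
> }
> {code}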
> And I implemented #4.
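> The bookkeeping for #4 is tiny; a minimal sketch of the idea with
> hypothetical names (the real change hangs this off Store.java, as described
> above):
> {code:java}
> // Sketch: track the smallest timestamp among Put cells while a flush or
> // compaction writes out a new store file, then attach that value to the
> // file's metadata so delete markers older than it can be expired later.
> // HFileWriterV* never has to inspect individual KVs this way.
> final class EarliestPutTsTracker {
>   private long earliestPutTs = Long.MAX_VALUE;
>
>   /** Called for every cell written by internalFlushCache/compactStore. */
>   void observe(boolean isPut, long ts) {
>     if (isPut && ts < earliestPutTs) {
>       earliestPutTs = ts;
>     }
>   }
>
>   /**
>    * Appended to the store file metadata when the writer is closed.
>    * Long.MAX_VALUE means the file contains no puts at all.
>    */
>   long get() {
>     return earliestPutTs;
>   }
> }
> {code}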
> I'd love to get input on these ideas.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)