[ https://issues.apache.org/jira/browse/HBASE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691109#comment-13691109 ]
Lars George commented on HBASE-8784:
------------------------------------
I did think of the meta blocks in the files. The issue is of course that these
"super" deletes may span multiple regions and even servers. Your suggestion of
.META. and/or a special delete table therefore makes sense as a way to
synchronize this across all of them.
Adding an extra table raises a lot of questions, I presume, such as how to
handle it efficiently and consistently (namespaces etc.). Which of the two do
you think is more manageable/less intrusive?
As for your approach, that is indeed a user-land implementation of a similar
feature, and the semantics of when data is freed are the same as for any other
delete. Dropping deleted data faster (maybe even securely, as discussed on the
mailing list right now) is a whole different story.
> Wildcard/Range/Partition Delete Support
> ---------------------------------------
>
> Key: HBASE-8784
> URL: https://issues.apache.org/jira/browse/HBASE-8784
> Project: HBase
> Issue Type: New Feature
> Components: Client, Deletes, regionserver
> Reporter: Lars George
>
> We often see use cases where users, for example with timeseries data, would
> like to delete large ranges of data, much like dropping a partition as
> supported by RDBMSs. We should support regular expressions or range
> expressions for the matches (supporting binary keys, obviously).
> The idea is to store the deletes not with the data, but with the metadata.
> When we read files, we read the larger deletes first, and then the inline
> ones. Of course, this should be reserved for few but very data-intensive
> deletes. This reduces the number of deletes to write to one, instead of many
> (often thousands, if not millions). This is different from the
> BulkDeleteEndpoint introduced in HBASE-6942. The range delete should support
> similar Scan-based selectivity.
> The new range deletes will mask out all the matching data and are otherwise
> handled like other deletes, for example being dropped during major
> compactions once all masked data has been dropped too.
> Still to be discussed is how and where we store the delete entry in practice,
> since the metadata might not be wanted as the location. But it seems like a
> reasonable choice. The DeleteTracker can handle these deletes the same way,
> with additional checks for wildcards/ranges. If the range deletes are not
> used, no critical path is affected, so they cause no additional latency or
> other regressions.
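
To make the masking idea in the description concrete, here is a minimal sketch
in plain Java. It is only an illustration under stated assumptions:
RangeDeleteMarker and RangeDeleteCheck are hypothetical names, not HBase
classes, and the sketch assumes a range delete covers a half-open row range
[startRow, stopRow) and, like a regular delete, masks cells whose timestamp is
at or before the marker's timestamp. Wildcard/regex matching and the question
of where the marker is stored are left out.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical range delete marker for illustration; not an HBase class. */
final class RangeDeleteMarker {
    private final byte[] startRow;  // inclusive
    private final byte[] stopRow;   // exclusive
    private final long timestamp;   // assumed: masks cells with ts <= this value

    RangeDeleteMarker(byte[] startRow, byte[] stopRow, long timestamp) {
        this.startRow = startRow.clone();
        this.stopRow = stopRow.clone();
        this.timestamp = timestamp;
    }

    /** True if the cell at (row, cellTimestamp) is hidden by this marker. */
    boolean masks(byte[] row, long cellTimestamp) {
        // Unsigned lexicographic comparison, so binary row keys are ordered
        // byte by byte rather than by Java's signed byte values.
        boolean inRange = Arrays.compareUnsigned(row, startRow) >= 0
                && Arrays.compareUnsigned(row, stopRow) < 0;
        return inRange && cellTimestamp <= timestamp;
    }
}

/** Hypothetical check that consults the range markers before inline deletes. */
final class RangeDeleteCheck {
    private final List<RangeDeleteMarker> markers;

    RangeDeleteCheck(List<RangeDeleteMarker> markers) {
        this.markers = List.copyOf(markers);
    }

    boolean isDeleted(byte[] row, long cellTimestamp) {
        // With no markers registered the loop body never runs, which matches
        // the goal of leaving the normal read path unaffected.
        for (RangeDeleteMarker m : markers) {
            if (m.masks(row, cellTimestamp)) {
                return true;
            }
        }
        return false;
    }
}
```

A real DeleteTracker would run a check like this first and then fall back to
its existing per-cell delete handling. The linear scan over markers is
acceptable only because the feature is meant for a few large deletes.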