I'm having an issue with table design, specifically how to delete old/obsolete data.
My row keys are not time-sorted: the id comes first, followed by a timestamp, since the main objective is running big scans over a specific id from time x to time y.
However, this data builds up at a respectable rate and I need a method to delete old records en masse. I considered using the TTL parameter on the column families, but the current plan is to selectively keep data longer for specific ids, which a blanket per-family TTL can't express.
Are there any plans to link a delete operation to a scanner (so delete over a range x-y, or, if you supply a filter, delete wherever conditions p and q are met)?
If not, what would be the recommended method to handle this kind of batch delete? The current JIRA for MultiDelete ( http://issues.apache.org/jira/browse/HBASE-1845 ) simply implements deleting on a List<Delete>, which still seems limited.
Is the only way to do this to run a scan and then build a List<Delete> from the results to use with the multi call discussed in HBASE-1845? This feels very inefficient, but please correct me if I'm mistaken. My current activity estimate is about 10 million rows a day, generating about 300 million cells, which would need to be deleted on a regular basis (so 300 million cells every day, or 2.1 billion once a week).
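For concreteness, here is roughly what I mean by the scan-then-batch-delete approach, sketched against the HTable/Scan client API. The table name, row-key bounds, and batch size are made-up placeholders, not tested recommendations:

```java
// Hedged sketch: scan a key range for one id, accumulate Deletes,
// and flush them in chunks via the HBASE-1845-style multi-delete.
// "events", startRow/stopRow, and the batch size 1000 are all
// illustrative assumptions.
HTable table = new HTable(conf, "events");          // conf = HBase client config
Scan scan = new Scan(startRow, stopRow);            // rows for one id, time x..y
ResultScanner scanner = table.getScanner(scan);
List<Delete> batch = new ArrayList<Delete>();
try {
    for (Result r : scanner) {
        batch.add(new Delete(r.getRow()));
        if (batch.size() >= 1000) {                 // flush in chunks to bound memory
            table.delete(batch);
            batch.clear();
        }
    }
    if (!batch.isEmpty()) {
        table.delete(batch);                        // flush the remainder
    }
} finally {
    scanner.close();
}
```

Since only the row keys are needed here, raising the scanner's caching (scan.setCaching(...)) would at least cut RPC round-trips, but every key still makes a round-trip to the client and back, which is exactly the inefficiency I'm asking about.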