[ https://issues.apache.org/jira/browse/HBASE-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-2999:
-------------------------
Issue Type: New Feature (was: Bug)
Component/s: regionserver (was: master)
> hbase TTL can be suboptimal and leave small regions after compaction
> --------------------------------------------------------------------
>
> Key: HBASE-2999
> URL: https://issues.apache.org/jira/browse/HBASE-2999
> Project: HBase
> Issue Type: New Feature
> Components: regionserver
> Affects Versions: 0.89.20100621
> Environment: All
> Reporter: Jimmy Hu
>
> Yes, the current compaction-based TTL works as advertised if the keys
> distribute the incoming data randomly among all regions. However, if the
> key is designed in chronological order, the TTL doesn't really work,
> because no compaction will happen for data already written. So we can't
> say that the current TTL really works as advertised, since it depends on
> the key structure.
> This is a pity, because a major use case for HBase is storing history or
> log data. Normally people only want to retain the data for a fixed period.
> For example, the US government default data retention policy is 7 years.
> Such data is saved in chronological order. The current TTL implementation
> doesn't work at all for this kind of use case.
> For that use case to really work, HBase needs an active thread that
> periodically runs and checks whether there is data older than the TTL,
> deletes that data if necessary, and compacts small regions older than a
> certain time period into larger ones to save system resources. It can
> optimize the deletion by deleting the whole region if it detects that the
> last timestamp for the region is older than the TTL. There should be a few
> parameters to configure for HBase:
> 1. Whether to disable/enable the TTL thread.
> 2. The interval at which the TTL thread will run. Maybe we can use a
> special value like 0 to indicate that we don't run the TTL thread, thus
> saving one configuration parameter.
> For the default TTL, it should probably be set to 1 day.
> 3. How small regions must be before they are merged. It should be a
> percentage of the store size. For example, if 2 consecutive regions are
> only 10% of the store size (default is 256M), we can initiate a region
> merge. We probably need a parameter to reduce the merging too. For example,
> we only merge regions whose largest timestamp is older than half of the
> TTL.
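
The parameters listed above could be sketched roughly as follows. This is only an illustration of the proposal, not existing HBase code: the class TtlThreadSettings and all of its fields and methods are hypothetical names. It also assumes that the "10% of the store size" rule applies to the combined size of the two adjacent regions, which the description leaves open.

```java
/** Hypothetical settings for the proposed TTL thread; not an HBase API. */
final class TtlThreadSettings {
    final long intervalMs;        // how often the thread runs; 0 means disabled
    final double mergeFraction;   // e.g. 0.10 for "10% of the store size"
    final long maxStoreSizeBytes; // e.g. 256 MB
    final long ttlMs;             // the configured TTL

    TtlThreadSettings(long intervalMs, double mergeFraction,
                      long maxStoreSizeBytes, long ttlMs) {
        if (intervalMs < 0 || ttlMs <= 0 || maxStoreSizeBytes <= 0
                || mergeFraction <= 0.0 || mergeFraction > 1.0) {
            throw new IllegalArgumentException("invalid TTL thread settings");
        }
        this.intervalMs = intervalMs;
        this.mergeFraction = mergeFraction;
        this.maxStoreSizeBytes = maxStoreSizeBytes;
        this.ttlMs = ttlMs;
    }

    /** An interval of 0 doubles as the "off" switch, saving one parameter. */
    boolean isEnabled() {
        return intervalMs > 0;
    }

    /** A region can be dropped whole once its newest timestamp is past the TTL. */
    boolean isRegionExpired(long regionMaxTimestampMs, long nowMs) {
        return nowMs - regionMaxTimestampMs > ttlMs;
    }

    /**
     * Two adjacent regions are merge candidates when their combined size is
     * at most mergeFraction of the store size (an assumption, see above) and
     * the newest timestamp of each is older than half of the TTL.
     */
    boolean shouldMerge(long sizeA, long maxTimestampA,
                        long sizeB, long maxTimestampB, long nowMs) {
        long threshold = (long) (mergeFraction * maxStoreSizeBytes);
        boolean small = sizeA + sizeB <= threshold;
        boolean cold = nowMs - maxTimestampA > ttlMs / 2
                && nowMs - maxTimestampB > ttlMs / 2;
        return small && cold;
    }
}
```

With a 30-day TTL and a 256 MB store size, two adjacent 10 MB regions whose newest data is 20 and 25 days old would qualify for a merge, while a region written 10 days ago would not.
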
> We are currently tracking min/max timestamps in storefiles, so it's
> possible that we could expire some files of a region as well, even if the
> region was not completely expired. So at minimum, we should be able to
> implement dropping the stores that are older than the TTL. If all stores
> for a region are dropped, we should drop the whole region and update the
> key range of the adjacent region, so that no key hole is left.
>
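
As a rough sketch of the storefile-level expiry described above: given the tracked max timestamp of each file, a file can be dropped once its newest cell is older than the TTL, and the region itself becomes droppable when every file qualifies. The names StoreFileExpirySketch and FileInfo are hypothetical stand-ins, not HBase classes, and the sketch ignores any data still held in memory, which a real implementation would have to account for.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating file-level TTL expiry; not an HBase API. */
final class StoreFileExpirySketch {

    /** Minimal stand-in for the min/max timestamps tracked per storefile. */
    static final class FileInfo {
        final String name;
        final long minTimestampMs;
        final long maxTimestampMs;

        FileInfo(String name, long minTimestampMs, long maxTimestampMs) {
            this.name = name;
            this.minTimestampMs = minTimestampMs;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** Returns the files whose newest cell is already older than the TTL. */
    static List<FileInfo> expiredFiles(List<FileInfo> files, long ttlMs, long nowMs) {
        List<FileInfo> expired = new ArrayList<>();
        for (FileInfo f : files) {
            if (nowMs - f.maxTimestampMs > ttlMs) {
                expired.add(f);
            }
        }
        return expired;
    }

    /**
     * True when every file of the region is expired, so the whole region
     * could be dropped. A region with no files is treated as not expired.
     */
    static boolean regionFullyExpired(List<FileInfo> files, long ttlMs, long nowMs) {
        return !files.isEmpty()
                && expiredFiles(files, ttlMs, nowMs).size() == files.size();
    }
}
```

Dropping the region and closing the resulting gap in the key range of the adjacent region is left out here, since it depends on how region boundaries are managed.
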