[ 
https://issues.apache.org/jira/browse/HADOOP-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565267#action_12565267
 ] 

Billy Pearson commented on HADOOP-2222:
---------------------------------------

+1 on the first idea: we already store the timestamps, so we do not need to 
store more data to support this option. TTL would be an option set at table 
creation, so anyone using TTL would likely know not to write old, stale data, 
since it will be removed. But basically, how you outlined it would be ideal.
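
To make the timestamp point concrete, here is a rough sketch of the check, not actual hbase code. The function names and the cell layout (row, column, timestamp, value) are made up for illustration, and it assumes timestamps in milliseconds and a TTL in seconds:

```python
import time


def is_expired(cell_timestamp_ms, ttl_seconds, now_ms=None):
    """Return True if a cell is older than the TTL.

    Uses only the timestamp already stored with the cell, so no extra
    per-cell data is needed. (Hypothetical helper, not an hbase API.)
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


def drop_expired(cells, ttl_seconds, now_ms):
    """Keep only the cells that are still within the TTL.

    Each cell is assumed to be a (row, column, timestamp_ms, value) tuple.
    This stands in for whatever cleanup pass would discard old data.
    """
    return [c for c in cells if not is_expired(c[2], ttl_seconds, now_ms)]
```

A writer that inserts a cell with an old timestamp would see it dropped on the next cleanup pass, which is why anyone using TTL should avoid writing stale data.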


> option to set TTL for columns in hbase
> --------------------------------------
>
>                 Key: HADOOP-2222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2222
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.17.0
>
>
> I would like to see the option to set a TTL on columns in hbase. This 
> feature could be helpful in removing stale data from large datasets 
> without having to do a full scan of the dataset and then issue deletes.
> Examples: 
> Say I am crawling pages and only refreshing pages based on a set score. If 
> some pages do not get updated within X days, the old version of the page 
> gets removed from the data set. 
> Say I am stripping links out of HTML and storing them. If a link is removed 
> from a page, I would need to issue a delete statement to remove that 
> link from the data set. With a TTL, the link data would remove itself if 
> not updated within X seconds. These are just examples based on crawling, as 
> in Nutch, but I can foresee many apps using this option. 
> This is a feature in Bigtable that is handled when Bigtable does 
> garbage collection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
