[ https://issues.apache.org/jira/browse/HADOOP-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565214#action_12565214 ]

Bryan Duxbury commented on HADOOP-2222:
---------------------------------------

There are two ways I could see this feature working. One would be that the
TTL is measured from the cell's timestamp. The other would be that it is
measured from the time the cell was stored, regardless of its timestamp.
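To make the difference between the two interpretations concrete, here is a minimal sketch. Cell and TtlMode are hypothetical names (not HBase classes), and millisecond units are an assumption; the only thing that changes between the two modes is which time the cell's age is measured from.

```java
// Hypothetical sketch; not an HBase API. Times are in milliseconds (assumed).
final class Cell {
    final long timestampMs; // version timestamp supplied by the client
    final long storedAtMs;  // server write time; only the second approach needs it

    Cell(long timestampMs, long storedAtMs) {
        this.timestampMs = timestampMs;
        this.storedAtMs = storedAtMs;
    }
}

enum TtlMode {
    // Approach 1: age is measured from the cell's own timestamp.
    FROM_CELL_TIMESTAMP {
        @Override
        long ageMs(Cell cell, long nowMs) {
            return nowMs - cell.timestampMs;
        }
    },
    // Approach 2: age is measured from when the cell was actually stored.
    FROM_STORE_TIME {
        @Override
        long ageMs(Cell cell, long nowMs) {
            return nowMs - cell.storedAtMs;
        }
    };

    abstract long ageMs(Cell cell, long nowMs);

    // A cell is past its TTL once its age exceeds the TTL.
    boolean isExpired(Cell cell, long ttlMs, long nowMs) {
        return ageMs(cell, nowMs) > ttlMs;
    }
}
```

For a cell written just now but carrying a timestamp ten days in the past, with a seven-day TTL, FROM_CELL_TIMESTAMP reports it as expired while FROM_STORE_TIME does not.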

The first is fairly easy to do, I think. We'd need to:

 - Add TTL get and set methods to HColumnDescriptor
 - Change the shell's CREATE TABLE statement so it takes a TTL parameter for
   column families
 - Update HStore methods (get, put, etc.) to check each HStoreKey's timestamp
   against the TTL value on every operation
 - Have the compactor screen out cells that are past their TTL
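The HColumnDescriptor, HStore and compactor steps above can be sketched in memory as follows, assuming TTLs in milliseconds. FamilyTtl stands in for the proposed HColumnDescriptor getter/setter, and TtlStore stands in for the HStore read path plus the compactor; both classes and all their methods are hypothetical, not HBase code, and the shell change is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the TTL getter/setter proposed for HColumnDescriptor.
final class FamilyTtl {
    static final long FOREVER = Long.MAX_VALUE; // default: cells never expire
    private long ttlMs = FOREVER;

    long getTtlMs() { return ttlMs; }

    void setTtlMs(long ttlMs) {
        if (ttlMs <= 0) throw new IllegalArgumentException("TTL must be positive");
        this.ttlMs = ttlMs;
    }
}

// Hypothetical toy store: versions of one column keyed by timestamp, newest first.
final class TtlStore {
    private final FamilyTtl family;
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    TtlStore(FamilyTtl family) { this.family = family; }

    // Timestamp-based rule: a version is expired once its age exceeds the TTL.
    private boolean expired(long timestampMs, long nowMs) {
        return nowMs - timestampMs > family.getTtlMs();
    }

    void put(long timestampMs, String value) { versions.put(timestampMs, value); }

    // Read path: hide expired versions even before compaction removes them.
    List<String> get(long nowMs) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (!expired(e.getKey(), nowMs)) live.add(e.getValue());
        }
        return live;
    }

    // Compaction: physically drop expired versions; returns how many were removed.
    int compact(long nowMs) {
        int before = versions.size();
        versions.keySet().removeIf(ts -> expired(ts, nowMs));
        return before - versions.size();
    }

    int storedVersions() { return versions.size(); }
}
```

Checking on reads as well as during compaction means an expired cell disappears from results immediately, while the disk space is reclaimed later when the compactor runs.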

The only thing to keep in mind about the first approach is that it would be
possible to put a cell whose timestamp is already older than the TTL. In that
case, the data would simply be lost without any warning. However, if you are
interested in TTLs, your use case probably doesn't require you to write records
with timestamps in the past.
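A worked illustration of that caveat, under the timestamp-based rule and assuming millisecond TTLs. BackdatedPut and its methods are hypothetical names used only for this example.

```java
// Hypothetical helper: under timestamp-based TTL, a cell's remaining life
// depends only on its timestamp, not on when it was written.
final class BackdatedPut {
    private BackdatedPut() {}

    // Readable while the cell's age (measured from its timestamp) is within the TTL.
    static boolean visible(long cellTimestampMs, long ttlMs, long nowMs) {
        return nowMs - cellTimestampMs <= ttlMs;
    }

    // Time left until the cell passes its TTL; 0 once the TTL is reached or exceeded.
    static long remainingLifeMs(long cellTimestampMs, long ttlMs, long nowMs) {
        return Math.max(0L, ttlMs - (nowMs - cellTimestampMs));
    }
}
```

With a seven-day TTL, a put stamped ten days ago has no remaining life and is never readable, while one stamped two days ago stays visible for five more days.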

The second approach would be much harder. We'd have to start storing an extra
stored-at timestamp with every cell. As such, I'd say -1 on that one.
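A rough sketch of what "storing extra stored-at timestamps everywhere" would cost. ToyKey is a hypothetical, simplified key layout (length-prefixed row and column plus an 8-byte timestamp), not HStoreKey's real serialization; the point is only that the store-time TTL check needs a second 8-byte field on every cell.

```java
import java.nio.charset.StandardCharsets;

// Toy key layout for illustration only; it is NOT HStoreKey's real serialization.
final class ToyKey {
    final String row;
    final String column;
    final long timestampMs; // client-supplied version timestamp
    final long storedAtMs;  // the extra field the second approach would need

    ToyKey(String row, String column, long timestampMs, long storedAtMs) {
        this.row = row;
        this.column = column;
        this.timestampMs = timestampMs;
        this.storedAtMs = storedAtMs;
    }

    // Length-prefixed row and column bytes plus one 8-byte timestamp.
    int sizeWithoutStoredAt() {
        return Integer.BYTES + row.getBytes(StandardCharsets.UTF_8).length
                + Integer.BYTES + column.getBytes(StandardCharsets.UTF_8).length
                + Long.BYTES;
    }

    // The same key carrying the stored-at time: 8 more bytes on every cell.
    int sizeWithStoredAt() {
        return sizeWithoutStoredAt() + Long.BYTES;
    }

    // Store-time TTL check; only possible because storedAtMs is kept per cell.
    boolean expiredByStoreTime(long ttlMs, long nowMs) {
        return nowMs - storedAtMs > ttlMs;
    }
}
```

Beyond the extra bytes, every write, read and compaction path would have to carry the new field through, which is what makes this approach much more invasive than the first.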

> option to set TTL for columns in hbase
> --------------------------------------
>
>                 Key: HADOOP-2222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2222
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.17.0
>
>
> I would like to see the option to have a TTL on columns in HBase. This
> feature could be helpful in removing stale data from large datasets without
> having to do a full scan of the dataset and then issue deletes.
> Examples:
> Say I am crawling pages and only refreshing pages based on a set score. If
> some pages are not updated for X days, the old version of the page would get
> removed from the data set.
> Say I am stripping out links from HTML and storing them. If a link is removed
> from a page, I would need to issue a delete statement to remove that link
> from the data set; with a TTL, the link data would remove itself if not
> updated in X seconds. These are just examples based on crawling, like Nutch,
> but I can foresee many apps using this option.
> This is a feature in Bigtable that is handled when Bigtable does
> garbage collection.

-- 
This message is automatically generated by JIRA.