[ 
https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183355#comment-14183355
 ] 

Nick Dimiduk commented on HBASE-12324:
--------------------------------------

I believe OpenTSDB is commonly used as a metrics archival tool as well, so 
retaining data for months or years will quickly accumulate small HFiles under 
this scheme. I believe its data is otherwise consistent with your assumptions. 
You need to be very careful with your flush sizes to avoid a small-file 
problem. As Sean says, I'd prefer to see less operational overhead pushed onto 
users, not more. It would be interesting to see an 
"ImmutableRealtimeTimeSeriesCompactionPolicy" that compacts small files 
when some threshold is exceeded but otherwise defers to simply expiring files, 
as you do here.
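
To make the idea concrete, here is a minimal sketch of the selection logic such 
a policy might use. It is standalone Java, not HBase code: FileInfo, Selection, 
select, and the threshold parameters are hypothetical names chosen for 
illustration, and a real policy would have to be written against HBase's own 
compaction policy classes. It assumes a file can be expired once its newest 
cell is older than the TTL, and that "small" is a plain size cutoff.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpireOrCompactSketch {

    /** Hypothetical stand-in for a store file; not an HBase class. */
    static final class FileInfo {
        final String name;
        final long sizeBytes;
        final long maxTimestampMs; // newest cell timestamp in the file

        FileInfo(String name, long sizeBytes, long maxTimestampMs) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** Outcome of one selection pass. */
    static final class Selection {
        final List<String> toExpire = new ArrayList<>();
        final List<String> toCompact = new ArrayList<>();
    }

    /**
     * Expire every file whose newest cell is past the TTL. Among the
     * remaining files, compact the small ones only if there are more of
     * them than maxSmallFiles; otherwise leave them untouched.
     */
    static Selection select(List<FileInfo> files, long nowMs, long ttlMs,
                            long smallFileBytes, int maxSmallFiles) {
        Selection result = new Selection();
        List<String> small = new ArrayList<>();
        for (FileInfo f : files) {
            if (nowMs - f.maxTimestampMs > ttlMs) {
                result.toExpire.add(f.name);   // whole file is older than TTL
            } else if (f.sizeBytes < smallFileBytes) {
                small.add(f.name);             // live, but a compaction candidate
            }
        }
        if (small.size() > maxSmallFiles) {
            result.toCompact.addAll(small);    // threshold exceeded: merge them
        }
        return result;
    }
}
```

With a short TTL the small-file list stays short and the policy only ever 
expires files; with a long retention period the threshold is what keeps the 
file count bounded.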

Another question: in this schema, does the rowkey contain the data's timestamp? 
Or are you just using the HBase cell version to store your temporal attribute? 
StripeCompactionPolicy explicitly addresses the former case (because stripe 
boundaries are identified by rowkey ranges).

> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>
>                 Key: HBASE-12324
>                 URL: https://issues.apache.org/jira/browse/HBASE-12324
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Sheetal Dolas
>         Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
>
>
> We have seen multiple cases where HBase is used to store immutable data and 
> the data lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and 
> slow down ingestion rates.
> In all such use cases (immutable data, high write rate, moderate read 
> rate, and short TTL), avoiding compactions entirely and just deleting old 
> data brings a lot of performance benefits.
> We should have a compaction policy that only deletes/archives files older 
> than the TTL and does not compact any files.
> Also attaching a patch that can do so.


