[ 
https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183520#comment-14183520
 ] 

Enis Soztutar commented on HBASE-12324:
---------------------------------------

This compaction policy makes sense with HBASE-10141, I think. Given the use 
case, it effectively disables compactions while still letting TTL do its job. 
The problem with disabling compactions via regular configuration is that only 
compactions get rid of HFiles, so disabling them means no files ever expire. 
With this compaction policy, compactions are still triggered, but the 
compaction selection will not select any files. 
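To illustrate the idea (this is a simplified, HBase-free sketch, not the attached OnlyDeleteExpiredFilesCompactionPolicy.java; the class and field names here are invented for illustration): the selection step returns only files whose newest cell is already past the TTL, so a triggered compaction can archive fully expired files but never rewrites live ones.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an HFile's metadata: name plus the newest cell
// timestamp the file contains.
class HFileInfo {
    final String name;
    final long maxTimestampMs;

    HFileInfo(String name, long maxTimestampMs) {
        this.name = name;
        this.maxTimestampMs = maxTimestampMs;
    }
}

class ExpiredFilesOnlySelection {
    // Select only files whose every cell is older than the TTL. An empty
    // result makes the compaction a no-op; a non-empty result lets the
    // normal compaction/archival machinery drop the expired files.
    static List<HFileInfo> select(List<HFileInfo> candidates,
                                  long ttlMs, long nowMs) {
        List<HFileInfo> expired = new ArrayList<>();
        for (HFileInfo f : candidates) {
            if (nowMs - f.maxTimestampMs > ttlMs) {
                expired.add(f);
            }
        }
        return expired;
    }
}
```

Because live files are never selected, no data is rewritten; expiry happens purely by dropping whole files once their newest cell ages past the TTL.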
bq. Run periodically utility which purge/archive the oldest HFiles
BTW, you cannot delete a file under the region with an external tool while the 
region is being served (table enabled, HBase cluster running).
bq. It's actually worse than that, because the clock could adjust and we could 
have a file timestamp that is older than the cell timestamps within it. That 
would result in deleting data that isn't yet expired. (presuming the timestamp 
will be set based on when the server calls close())
That is how TTLs work in HBase: the RS compares the max TS of the file / cell 
with the current timestamp. 
bq. You will never read this stale data back unless you have MIN_VERSIONS > 0 
for that CF.
I think HBASE-10141 and MIN_VERSIONS > 0 are incompatible. We may need to 
address / document that. 


> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>
>                 Key: HBASE-12324
>                 URL: https://issues.apache.org/jira/browse/HBASE-12324
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Sheetal Dolas
>         Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
>
>
> We have seen multiple cases where HBase is used to store immutable data and 
> the data lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and slow 
> down ingestion rates.
> In all such use cases (immutable data, a high write rate, moderate read 
> rates, and a short TTL), avoiding compactions altogether and simply deleting 
> old data brings large performance benefits.
> We should have a compaction policy that only deletes/archives files older 
> than the TTL and never compacts any files.
> Also attaching a patch that does so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
