[ https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183520#comment-14183520 ]
Enis Soztutar commented on HBASE-12324:
---------------------------------------
I think this compaction policy makes sense together with HBASE-10141. For the
given use case, it effectively disables compactions but still lets TTL do its
job. The problem with disabling compactions through regular configuration is
that only a compaction gets rid of HFiles, so with compactions disabled no
files would ever expire. With this compaction policy, compactions are still
triggered, but the compaction selection does not select any files.
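
To make the selection behaviour above concrete, here is a minimal sketch. It is not the attached OnlyDeleteExpiredFilesCompactionPolicy and does not use HBase's real classes: FileInfo, selectExpired and selectForMerge are hypothetical names, and each store file is assumed to be reduced to a name plus the largest cell timestamp it contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model for illustration only; none of these types exist in HBase.
final class ExpiredOnlySelectionSketch {

    // A store file reduced to its name and the newest cell timestamp inside it.
    static final class FileInfo {
        final String name;
        final long maxCellTimestampMs;

        FileInfo(String name, long maxCellTimestampMs) {
            this.name = name;
            this.maxCellTimestampMs = maxCellTimestampMs;
        }
    }

    // Files whose newest cell is older than the TTL can be dropped as a whole.
    static List<String> selectExpired(List<FileInfo> files, long ttlMs, long nowMs) {
        List<String> expired = new ArrayList<>();
        for (FileInfo f : files) {
            if (nowMs - f.maxCellTimestampMs > ttlMs) {
                expired.add(f.name);
            }
        }
        return expired;
    }

    // The sketched policy never picks files to merge, so no data is rewritten.
    static List<String> selectForMerge(List<FileInfo> files) {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
            new FileInfo("hfile-old", 1_000L),
            new FileInfo("hfile-new", 9_000L));
        System.out.println("expired: " + selectExpired(files, 5_000L, 10_000L));
        System.out.println("to merge: " + selectForMerge(files));
    }
}
```

The point of the split is that the compaction still runs, which gives expired files a chance to be removed, while the merge selection stays empty.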
bq. Run periodically utility which purge/archive the oldest HFiles
BTW, you cannot delete a file under the region using an external tool if the
region is being served (table enabled, hbase cluster running).
bq. It's actually worse than that, because the clock could adjust and we could
have a file timestamp that is older than the cell timestamps within it. That
would result in deleting data that isn't yet expired. (presuming the timestamp
will be set based on when the server calls close())
That is how TTLs work in HBase. The region server (RS) compares the max
timestamp of the file / cell with the current timestamp.
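
As a rough illustration of that comparison, the sketch below treats a file as expired only once its newest cell timestamp is older than the TTL. The class and method names are hypothetical, and a file is modelled simply as an array of cell timestamps; this is not the region server's actual code.

```java
// Hypothetical sketch of a file-level TTL check; not HBase code.
final class FileTtlSketch {

    // Largest cell timestamp in the file; an empty file is rejected.
    static long maxTimestamp(long[] cellTimestampsMs) {
        if (cellTimestampsMs.length == 0) {
            throw new IllegalArgumentException("file has no cells");
        }
        long max = cellTimestampsMs[0];
        for (long ts : cellTimestampsMs) {
            max = Math.max(max, ts);
        }
        return max;
    }

    // The whole file is expired only when even its newest cell is past the TTL.
    static boolean fileExpired(long[] cellTimestampsMs, long ttlMs, long nowMs) {
        return nowMs - maxTimestamp(cellTimestampsMs) > ttlMs;
    }

    public static void main(String[] args) {
        long[] cells = {100L, 900L};
        long ttlMs = 500L;
        // At now=700 the oldest cell is past its TTL, but the newest one is not.
        System.out.println(fileExpired(cells, ttlMs, 700L));
        // At now=1401 the newest cell is past its TTL as well.
        System.out.println(fileExpired(cells, ttlMs, 1401L));
    }
}
```

Because the check uses cell timestamps rather than the time the file was written, the result depends on the timestamps stored in the data and on the server's current clock.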
bq. You will never read this stale data back unless you have MIN_VERSIONS > 0
for that CF.
I think HBASE-10141 and MIN_VERSIONS > 0 are incompatible. We may need to
address / document that.
> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>
> Key: HBASE-12324
> URL: https://issues.apache.org/jira/browse/HBASE-12324
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Affects Versions: 0.98.0, 0.96.0
> Reporter: Sheetal Dolas
> Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
>
>
> We have seen multiple cases where HBase is used to store immutable data and
> the data lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and
> slow down ingestion rates.
> In all such use cases (immutable data, high write rate, moderate read
> rates, and short TTL), avoiding any compactions and just deleting old data
> brings a lot of performance benefits.
> We should have a compaction policy that only deletes/archives files older
> than the TTL and does not compact any files.
> Also attaching a patch that does so.