[
https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188518#comment-14188518
]
Sheetal Dolas commented on HBASE-12324:
---------------------------------------
So to consolidate all inputs and comments, here is what I see:
* All compactions (including striped compactions) already have logic to delete
TTL expired files when 'hbase.store.delete.expired.storefile' is set to 'true'
(Reference: HBASE-5199 and HBASE-10141 )
* Archiving old files is based on a comparison between the file timestamp and
the server timestamp. HBASE-5199 can be further improved to store the latest
TTL of any field in the file trailer and use that for the comparison instead of
the file timestamp. This could be an independent improvement from this thread.
* The proposed OnlyDeleteExpiredFilesCompactionPolicy has its own use case,
where the user does not want to compact at all and just wants to delete old
data.
* Cases where some compaction is needed (to avoid too many HFiles) can be
addressed by the striped compaction policy, as it already has smarter logic for
deciding which files to compact and already deletes TTL-expired files before
compaction.
* Declaring a table/CF "immutable" to make smarter decisions: this probably
needs more exploration.
* The question that now arises is whether there should be multiple compaction
policies (only delete expired files, striped, immutable, etc.) or whether it
should all be consolidated under striped compaction, with configuration
parameters to enable/disable certain behavior.
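For reference, the existing expired-file cleanup mentioned in the first bullet is switched on via site configuration. A minimal hbase-site.xml fragment (property name from HBASE-5199; the default value may differ across versions):

```xml
<property>
  <!-- Let compactions drop store files whose entire contents are past the
       column family TTL, instead of rewriting them. -->
  <name>hbase.store.delete.expired.storefile</name>
  <value>true</value>
</property>
```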
> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>
> Key: HBASE-12324
> URL: https://issues.apache.org/jira/browse/HBASE-12324
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Affects Versions: 0.98.0, 0.96.0
> Reporter: Sheetal Dolas
> Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
>
>
> We have seen multiple cases where HBase is used to store immutable data and
> the data lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and
> slow down ingestion rates.
> In all such use cases (immutable data, high write rate, moderate read
> rates and short TTL), avoiding any compactions and just deleting old data
> brings a lot of performance benefits.
> We should have a compaction policy that can only delete/archive files older
> than TTL and not compact any files.
> Also attaching a patch that can do so.
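The core selection logic of such a policy can be sketched standalone: pick every store file whose newest cell timestamp is already past the TTL and mark it for archiving, and never select anything for rewriting. This is a minimal illustration only; `StoreFileInfo` here is a simplified stand-in, not the HBase class, and the real patch is in the attached OnlyDeleteExpiredFilesCompactionPolicy.java.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of "only delete/archive expired files, compact nothing".
// A file is safe to archive when even its newest cell is older than the TTL.
public class OnlyDeleteExpiredSketch {
    static class StoreFileInfo {
        final String name;
        final long maxTimestampMs; // newest cell timestamp in the file
        StoreFileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** Returns the files whose entire contents are past the TTL. */
    static List<StoreFileInfo> selectExpired(List<StoreFileInfo> files,
                                             long ttlMs, long nowMs) {
        List<StoreFileInfo> expired = new ArrayList<>();
        for (StoreFileInfo f : files) {
            if (f.maxTimestampMs < nowMs - ttlMs) {
                expired.add(f); // every cell in this file is expired
            }
        }
        return expired; // nothing else is ever selected for compaction
    }

    public static void main(String[] args) {
        long now = 1_000_000_000L;
        long ttl = 3L * 24 * 3600 * 1000; // e.g. a 3-day TTL
        List<StoreFileInfo> files = new ArrayList<>();
        files.add(new StoreFileInfo("old.hfile", now - ttl - 1));
        files.add(new StoreFileInfo("fresh.hfile", now - 1000));
        for (StoreFileInfo f : selectExpired(files, ttl, now)) {
            System.out.println("archive: " + f.name);
        }
    }
}
```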
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)