[ 
https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188518#comment-14188518
 ] 

Sheetal Dolas commented on HBASE-12324:
---------------------------------------

To consolidate all inputs and comments, here is what I see:

* All compactions (including striped compactions) already have logic to delete 
TTL-expired files when 'hbase.store.delete.expired.storefile' is set to 'true' 
(Reference: HBASE-5199 and HBASE-10141).
* Archiving old files is based on a comparison between the file timestamp and 
the server timestamp. HBASE-5199 could be further improved to store and check 
the latest TTL of any field in the file trailer and use that for the comparison 
instead of the file timestamp. This could be an improvement independent of this 
thread.
* The proposed OnlyDeleteExpiredFilesCompactionPolicy has its own use case, 
where the user does not want to compact at all and just wants to delete old 
data.
* Cases where some compaction is needed (to avoid too many HFiles) can be 
addressed by the striped compaction policy (it already has smarter logic for 
deciding which files to compact, and it already deletes TTL-expired files 
before compaction).
* Declaring table/cf "immutable" - to make smarter decisions: This probably 
needs more exploration.
* The new question that arises is whether there should be multiple compaction 
policies (only delete expired files, striped, immutable, etc.) or whether it 
should all be consolidated under striped compaction, with configuration 
parameters to enable/disable certain behavior.
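
To make the "only delete expired files" idea concrete, here is a minimal 
sketch of the selection step. It is not the attached patch and does not use 
HBase classes: ExpiredFileSelector, FileInfo, and selectExpired are 
hypothetical names, and the sketch assumes each file exposes the timestamp of 
its newest cell, so a file counts as expired only when all of its data is 
older than the TTL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained illustration; not HBase API and not the attached patch.
class ExpiredFileSelector {

    /** Hypothetical stand-in for a store file: a name plus its newest cell timestamp. */
    static final class FileInfo {
        final String name;
        final long maxTimestampMs;

        FileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /**
     * Returns the files whose newest cell is older than (nowMs - ttlMs).
     * Such files can be deleted or archived whole, with no rewrite of data.
     * Files that still hold any live data are left untouched.
     */
    static List<FileInfo> selectExpired(List<FileInfo> files, long ttlMs, long nowMs) {
        long cutoffMs = nowMs - ttlMs;
        List<FileInfo> expired = new ArrayList<>();
        for (FileInfo file : files) {
            if (file.maxTimestampMs < cutoffMs) {
                expired.add(file);
            }
        }
        return expired;
    }
}
```

Because nothing is rewritten, the cost of this step is proportional to the 
number of files, not to the volume of data they contain.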

> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>
>                 Key: HBASE-12324
>                 URL: https://issues.apache.org/jira/browse/HBASE-12324
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Sheetal Dolas
>         Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
>
>
> We have seen multiple cases where HBase is used to store immutable data and 
> the data lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and 
> slow down ingestion rates.
> In all such use cases (immutable data, high write rate, moderate read 
> rates, and a short TTL), avoiding any compactions and just deleting old data 
> brings a lot of performance benefits.
> We should have a compaction policy that only deletes/archives files older 
> than the TTL and does not compact any files.
> Also attaching a patch that can do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
