[jira] [Commented] (HBASE-3969) Outdated data can not be cleaned in time

zhoushuaifeng (JIRA) Thu, 16 Jun 2011 18:48:01 -0700

    [ https://issues.apache.org/jira/browse/HBASE-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050850#comment-13050850 ]


zhoushuaifeng commented on HBASE-3969:
--------------------------------------

Hi st, I don't do any deletes; I only set the TTL of the table to a few days.
When the data's timestamps are older than the TTL, the data should be cleaned
up by a major compaction. But if a region has had no new data inserted for a
while, there will be only 1 or 2 files in it, so its compaction priority is
very low. If throughput is high, the major compaction will be delayed.
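
The starvation described above can be sketched with a toy priority queue. This is only an illustrative model, not HBase code: the class name, the method name, and the priority numbers are hypothetical, and it assumes that a lower priority value is served first and that one request is served per round.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Toy model of a compaction queue; illustrative only, not HBase code. */
final class CompactionQueueSketch {

    /** A queued compaction request. Assumption: a lower priority value is served first. */
    static final class Request {
        final String region;
        final int priority;
        final long seq; // arrival order, used to break ties

        Request(String region, int priority, long seq) {
            this.region = region;
            this.priority = priority;
            this.seq = seq;
        }
    }

    /**
     * Queues one request for an idle region, then in every round enqueues one
     * request for a busy region and serves exactly one request.
     * Returns the round in which the idle region is served, or -1 if it is
     * still waiting after the given number of rounds.
     */
    static int roundsUntilIdleServed(int idlePriority, int busyPriority, int rounds) {
        PriorityQueue<Request> queue = new PriorityQueue<>(
                Comparator.comparingInt((Request r) -> r.priority)
                          .thenComparingLong(r -> r.seq));
        long seq = 0;
        queue.add(new Request("idle-region", idlePriority, seq++));
        for (int round = 1; round <= rounds; round++) {
            queue.add(new Request("busy-region-" + round, busyPriority, seq++));
            Request served = queue.poll();
            if (served.region.equals("idle-region")) {
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Idle region with few files (priority 6) vs. busy regions (priority 2).
        System.out.println(roundsUntilIdleServed(6, 2, 1000)); // prints -1
        // If the idle region had the more urgent priority, it would go first.
        System.out.println(roundsUntilIdleServed(2, 6, 1000)); // prints 1
    }
}
```

As long as more urgent requests keep arriving at least as fast as they are served, the idle region's request never reaches the head of the queue, which matches the delay reported in this issue.
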
About YCSB: we have made some changes to it so that it can load data as we
want (the key, value, and speed are all customized). The scan is random, so it
has a chance of hitting regions where lots of data is outdated but hasn't been
cleaned in time.
I think there is no need to check for hbase.hstore.blockingStoreFiles > 
hbase.hstore.compactionThreshold, because the priority can be negative. When 
there are more files in the store than blockingStoreFiles, the flush operation 
will trigger a compaction, and the priority will be negative. A negative
priority means that there are too many files in the store, and the flushing of
this store may be blocked for at most 90 seconds. We have seen this in the
logs before.
Also, the user must not set hbase.hstore.blockingStoreFiles <= 
hbase.hstore.compactionThreshold; if so, blocking will always happen.
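
The negative priority mentioned above can be illustrated with a small sketch. It assumes the priority is computed as blockingStoreFiles minus the number of store files, which is consistent with the comment (negative once the store holds more files than blockingStoreFiles) but is not copied from the HBase source. The class name, method names, and the sample value 7 are hypothetical; the 90-second wait is the figure quoted in the comment.

```java
/** Illustrative arithmetic only; an assumption about the priority, not HBase code. */
final class StorePrioritySketch {

    /**
     * Assumed priority formula: the remaining headroom before the store
     * reaches blockingStoreFiles. Lower (or negative) means more urgent.
     */
    static int compactPriority(int blockingStoreFiles, int storeFileCount) {
        return blockingStoreFiles - storeFileCount;
    }

    /** A flush may be delayed once the store holds more files than blockingStoreFiles. */
    static boolean flushMayBlock(int blockingStoreFiles, int storeFileCount) {
        return compactPriority(blockingStoreFiles, storeFileCount) < 0;
    }

    /** Upper bound on the flush delay: the blocking wait if blocked, otherwise zero. */
    static long maxFlushDelayMillis(int blockingStoreFiles, int storeFileCount,
                                    long blockingWaitMillis) {
        return flushMayBlock(blockingStoreFiles, storeFileCount) ? blockingWaitMillis : 0L;
    }

    public static void main(String[] args) {
        int blockingStoreFiles = 7; // hypothetical setting
        System.out.println(compactPriority(blockingStoreFiles, 2));              // prints 5
        System.out.println(compactPriority(blockingStoreFiles, 9));              // prints -2
        System.out.println(maxFlushDelayMillis(blockingStoreFiles, 9, 90_000L)); // prints 90000
    }
}
```

Under this assumption, a region with only 1 or 2 files always has a large positive priority value, so it sorts behind any store that is close to or past the blocking limit.
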



> Outdated data can not be cleaned in time
> ----------------------------------------
>
>                 Key: HBASE-3969
>                 URL: https://issues.apache.org/jira/browse/HBASE-3969
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.90.1, 0.90.2, 0.90.3
>            Reporter: zhoushuaifeng
>             Fix For: 0.90.4
>
>         Attachments: HBASE-3969-solution1-for-branch.patch, 
> HBASE-3969-solution1.patch
>
>
> The compaction checker sends regions to the compaction queue to be
> compacted. But the priority of these regions is too low if they have only a
> few storefiles. When throughput is high, the compaction queue will always
> have some regions with higher priority. This may cause the major compaction
> to be delayed for a long time (even a few days), and the cleaning of
> outdated data will also be delayed.
> In our test case, we found some regions sent to the queue by the major
> compaction checker hanging in the queue for more than 2 days! Some scanners
> on these regions could not get available data for a long time, and their
> leases expired.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
