[ 
https://issues.apache.org/jira/browse/HBASE-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948201#comment-15948201
 ] 

huaxiang sun edited comment on HBASE-17215 at 3/30/17 1:57 AM:
---------------------------------------------------------------

Thanks [~carp84] for the patch! The patch looks great. I went through it 
and left a couple of comments there. Maybe as a follow-up, can these two 
threads share a LinkedBlockingDeque? Large files would be added to the head 
of the deque and small files to the tail; the large-file thread polls from 
the head and the small-file thread polls from the tail. That way, no thread 
sits idle while there is work to do. What do you think? 
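
To make the shared-deque idea concrete, here is a rough sketch. All names in it (SharedDeleteQueue, PendingFile, the size threshold) are made up for illustration and are not taken from the patch; a real cleaner would also do the actual filesystem delete and probably block on an empty deque instead of returning null.

```java
import java.util.concurrent.LinkedBlockingDeque;

// Hypothetical illustration only; none of these names come from the HBASE-17215 patch.
class SharedDeleteQueue {
    /** A file awaiting deletion: a path and its size in bytes. */
    static final class PendingFile {
        final String path;
        final long sizeBytes;

        PendingFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    private final LinkedBlockingDeque<PendingFile> deque = new LinkedBlockingDeque<>();
    private final long largeThresholdBytes; // assumed cutoff between "small" and "large"

    SharedDeleteQueue(long largeThresholdBytes) {
        this.largeThresholdBytes = largeThresholdBytes;
    }

    /** Large files go to the head of the deque, small files to the tail. */
    void add(PendingFile f) {
        if (f.sizeBytes >= largeThresholdBytes) {
            deque.addFirst(f);
        } else {
            deque.addLast(f);
        }
    }

    /** Called by the large-file thread; returns null if the deque is empty. */
    PendingFile pollForLargeThread() {
        return deque.pollFirst();
    }

    /** Called by the small-file thread; returns null if the deque is empty. */
    PendingFile pollForSmallThread() {
        return deque.pollLast();
    }

    public static void main(String[] args) {
        SharedDeleteQueue q = new SharedDeleteQueue(64L * 1024 * 1024);
        q.add(new PendingFile("small-1", 1024));
        q.add(new PendingFile("large-1", 128L * 1024 * 1024));
        q.add(new PendingFile("small-2", 2048));

        System.out.println(q.pollForLargeThread().path); // large-1
        System.out.println(q.pollForSmallThread().path); // small-2
        // No large files remain, so the large-file thread picks up a small one.
        System.out.println(q.pollForLargeThread().path); // small-1
    }
}
```

Note that in this sketch the head behaves like a stack for large files (the most recently added large file is deleted first), which may or may not be what we want.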

Or we could sort filesToBeDeleted by file size and push the files into the 
queue largest first. Both threads then poll files from the same queue, so 
together they delete files from largest to smallest.
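
A minimal sketch of this second option, again with hypothetical names (SizeOrderedDeleteQueue, PendingFile, largestFirst, drain) that are not from the patch. It assumes the whole batch of files is known before sorting, and the "delete" is only a counter here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from the HBASE-17215 patch.
class SizeOrderedDeleteQueue {
    /** A file awaiting deletion: a path and its size in bytes. */
    static final class PendingFile {
        final String path;
        final long sizeBytes;

        PendingFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Copies the batch, sorts it by size in descending order, and queues it in that order. */
    static BlockingQueue<PendingFile> largestFirst(List<PendingFile> filesToBeDeleted) {
        List<PendingFile> sorted = new ArrayList<>(filesToBeDeleted);
        sorted.sort(Comparator.comparingLong((PendingFile f) -> f.sizeBytes).reversed());
        return new LinkedBlockingQueue<>(sorted);
    }

    /** Starts the given number of workers, lets them empty the queue, and returns the delete count. */
    static int drain(BlockingQueue<PendingFile> queue, int workers) {
        AtomicInteger deleted = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                PendingFile f;
                while ((f = queue.poll()) != null) {
                    // A real cleaner would delete f.path from the filesystem here.
                    deleted.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return deleted.get();
    }
}
```

With two workers, the files leave the queue largest first, although the exact interleaving between the two threads is not deterministic.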


> Separate small/large file delete threads in HFileCleaner to accelerate 
> archived hfile cleanup speed
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17215
>                 URL: https://issues.apache.org/jira/browse/HBASE-17215
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-17215.patch
>
>
> With PCIe SSDs, flushes are very fast. Although we have per-CF flush, the 
> {{hbase.regionserver.optionalcacheflushinterval}} setting and some other 
> mechanisms that keep data from staying in memory too long still cause small 
> hfiles to be flushed. In our online environment we found that the single 
> cleaner thread kept cleaning the earlier-flushed small files while large 
> files got no chance, which filled the disks and then caused many other 
> problems.
> Deleting hfiles in parallel with too many threads would also increase the 
> workload of the NameNode, so here we propose to separate the large and small 
> hfile cleaner threads, just as we do for compaction; this has turned out to 
> work well in our cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
