[ 
https://issues.apache.org/jira/browse/HBASE-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018745#comment-16018745
 ] 

Ted Yu edited comment on HBASE-18084 at 5/21/17 8:20 AM:
---------------------------------------------------------

bq. the fs.getContentSummary call is time consuming if there're many files in 
the directory

In the patch, we obtain the size of every directory before starting to clean 
from the sorted directory list.
Have you thought about having two threads do the sorting and the cleaning in 
parallel:

thread 1 does the sorting and publishes a sorted directory list every N 
directories (in batches).
thread 2 does the cleaning and updates its list whenever thread 1 provides a 
new one (minus the directories it has already cleaned).

The rationale behind the above design is to start cleaning without waiting for 
the complete directory list. It is fine for thread 2 to clean a small directory 
at the beginning, because no time is wasted waiting for the complete list to 
come out.
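
To make the idea concrete, here is a minimal sketch, not code from the patch. 
It assumes the directory sizes are already available in a map (a stand-in for 
the per-directory fs.getContentSummary calls) and that "cleaning" just records 
the path. The names BatchedCleanerSketch, SizedDir and clean are hypothetical. 
A priority queue ordered by size plays the role of the sorted list that 
thread 1 keeps extending.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.PriorityBlockingQueue;

class BatchedCleanerSketch {
    /** A directory path plus its measured size (hypothetical helper type). */
    static final class SizedDir {
        final String path;
        final long size;
        SizedDir(String path, long size) { this.path = path; this.size = size; }
    }

    // End-of-input marker. Its size of -1 sorts below every real directory,
    // so it only reaches the head of the queue once everything else is taken.
    private static final SizedDir END = new SizedDir("", -1L);

    // Largest directory first.
    private static final Comparator<SizedDir> LARGEST_FIRST =
        (a, b) -> Long.compare(b.size, a.size);

    /**
     * Cleans every directory in sizes and returns the paths in the order
     * they were cleaned. Sizes are assumed to be non-negative.
     */
    static List<String> clean(Map<String, Long> sizes, int batchSize) {
        PriorityBlockingQueue<SizedDir> queue =
            new PriorityBlockingQueue<>(16, LARGEST_FIRST);
        List<String> cleaned = Collections.synchronizedList(new ArrayList<>());

        // Thread 1: measures directories and publishes them in batches of N.
        Thread sizer = new Thread(() -> {
            List<SizedDir> batch = new ArrayList<>();
            for (Map.Entry<String, Long> e : sizes.entrySet()) {
                // A real chore would call fs.getContentSummary here.
                batch.add(new SizedDir(e.getKey(), e.getValue()));
                if (batch.size() == batchSize) {
                    queue.addAll(batch);
                    batch.clear();
                }
            }
            queue.addAll(batch); // flush the last, possibly partial, batch
            queue.add(END);
        });

        // Thread 2: always cleans the largest directory known so far.
        Thread cleaner = new Thread(() -> {
            try {
                while (true) {
                    SizedDir next = queue.take();
                    if (next == END) {
                        break;
                    }
                    cleaned.add(next.path); // stand-in for the real cleanup
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });

        sizer.start();
        cleaner.start();
        try {
            sizer.join();
            cleaner.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(cleaned);
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("/archive/a", 10L);
        sizes.put("/archive/b", 500L);
        sizes.put("/archive/c", 70L);
        // The order depends on thread timing, so it can differ between runs.
        System.out.println(clean(sizes, 2));
    }
}
```

The cleaning order here is only "largest among the directories seen so far", 
not a global sort, which is the trade-off described above: thread 2 may clean 
a small directory early, but it never sits idle waiting for the full list.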

What do you think?


> Improve CleanerChore to clean from directory which consumes more disk space
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-18084
>                 URL: https://issues.apache.org/jira/browse/HBASE-18084
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-18084.patch, HBASE-18084.v2.patch
>
>
> Currently CleanerChore cleans directories in dictionary order, rather than 
> starting from the directory with the largest space usage. When data 
> abnormally accumulates to a huge volume in the archive directory, the 
> cleaning speed might not be enough.
> This proposal is another improvement that works together with HBASE-18083 to 
> resolve our online issue (the archive dir consumed more than 1.8PB of SSD 
> space)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
