[
https://issues.apache.org/jira/browse/HBASE-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018801#comment-16018801
]
Yu Li edited comment on HBASE-18084 at 5/21/17 12:41 PM:
---------------------------------------------------------
bq. if the initial batch contains large directory
But what if it does not, sir?
Let me say more about my case. The current cleaning logic uses a depth-first
algorithm, while the archive directory hierarchy looks like:
{noformat}
/hbase/archive/data
  - namespace
    - table
      - region
        - CF
          - files
{noformat}
And while we reach one leaf directory, get the file list in it and clean it,
flushes are still ongoing, so the new files will only be picked up when we
iterate the other directories later.
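To make it concrete, here is a minimal sketch of such a depth-first,
dictionary-order pass (my own illustration, not the actual CleanerChore code;
the class and method names are made up):
{code:java}
// A minimal sketch, not the actual CleanerChore code: a depth-first pass that
// relies on FileSystem#listStatus returning children in lexicographic order,
// so tables are visited in dictionary order regardless of their size.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DepthFirstCleanSketch {
  private final FileSystem fs;

  public DepthFirstCleanSketch(FileSystem fs) {
    this.fs = fs;
  }

  /** Recurse namespace -> table -> region -> CF and delete files at the leaves. */
  public void clean(Path dir) throws IOException {
    for (FileStatus child : fs.listStatus(dir)) { // lexicographic order
      if (child.isDirectory()) {
        clean(child.getPath());                   // go deeper first
      } else {
        fs.delete(child.getPath(), false);        // leaf file (no TTL check in this sketch)
      }
    }
  }
}
{code}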
In our case the output of "hadoop fs -count" (columns: directory count, file
count, content size in bytes, path), ordered by space usage (descending), looks
like:
{noformat}
 2043   686999  770527133663895  /hbase/archive/data/default/pora_6_feature_queue
 2049  3430815  470358930247550  /hbase/archive/data/default/pora_6_feature
17101   704476  100740814980772  /hbase/archive/data/default/mainv3_ic
14251   495293   79161730247206  /hbase/archive/data/default/mainv3_main_result_b
14251   893144   71121202187220  /hbase/archive/data/default/mainv3_main_result_a
 2045    79223   51098022268522  /hbase/archive/data/default/pora_log_wireless_search_item_pv_queue
 2001   123332   49075201291122  /hbase/archive/data/default/mainv3_main_askr_queue_a
 2001    65030   45649351359151  /hbase/archive/data/default/mainv3_main_askr_queue_b
{noformat}
And we have many small directories like:
{noformat}
13   6     173403  /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_IdleFishPool_askr
 3   1     253497  /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_buyoffer_searcher_askr
17  17   15635421  /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_cloud_wukuang_askr
13   6   56062313  /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_common_search_askr
 5   2    1165298  /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_company_askr
11   9    1196774  /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_content_search_askr
{noformat}
So the largest 3 directories take 1.3PB while the whole archive directory takes
1.8PB, and the largest directory's name starts with "p", so dictionary order
reaches it late. If we use the greedy algorithm, we may choose
{{mainv3_main_askr_queue_a}}, which has 123k files, to clean, while
{{pora_6_feature_queue}} is still flushing at speed. In the worst case we
cannot reach the largest dir for a long time.
And I agree this depends on the real case, but in our case the simple method in
the current patch can work well, while I'm not sure whether the suggested new
approach will do (smile).
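For comparison, here is a minimal sketch of the idea of cleaning from the
largest space consumer first (again my own illustration, not the patch itself;
measuring each directory with {{getContentSummary}} is an assumption):
{code:java}
// A minimal sketch, not the actual patch: visit the table directories from the
// largest space consumer down. Measuring each directory with getContentSummary
// is an assumption for illustration only.
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SpaceOrderedCleanSketch {
  private final FileSystem fs;

  public SpaceOrderedCleanSketch(FileSystem fs) {
    this.fs = fs;
  }

  /** Clean children of the given dir starting from the one consuming the most space. */
  public void cleanLargestFirst(Path dir) throws IOException {
    List<FileStatus> children = Arrays.asList(fs.listStatus(dir));
    Map<Path, Long> consumed = new HashMap<>();
    for (FileStatus child : children) {
      consumed.put(child.getPath(),
          fs.getContentSummary(child.getPath()).getSpaceConsumed());
    }
    // Sort descending by consumed space so the pora_6_feature_queue-like dirs come first.
    children.sort((a, b) -> Long.compare(consumed.get(b.getPath()), consumed.get(a.getPath())));
    for (FileStatus child : children) {
      cleanDir(child.getPath()); // delegate to the per-directory cleaning shown above
    }
  }

  private void cleanDir(Path dir) {
    // placeholder for the existing recursive cleaning logic
  }
}
{code}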
Since the patch here is already applied online, how about letting it in and
opening another JIRA to implement and verify the new approach with the greedy
algorithm?
[~tedyu]
> Improve CleanerChore to clean from directory which consumes more disk space
> ---------------------------------------------------------------------------
>
> Key: HBASE-18084
> URL: https://issues.apache.org/jira/browse/HBASE-18084
> Project: HBase
> Issue Type: Bug
> Reporter: Yu Li
> Assignee: Yu Li
> Attachments: HBASE-18084.patch, HBASE-18084.v2.patch
>
>
> Currently CleanerChore cleans directories in dictionary order rather than
> starting from the one with the largest space usage, so when data abnormally
> accumulates to a huge volume in the archive directory, the cleaning speed
> might not be enough.
> This proposal is another improvement, working together with HBASE-18083, to
> resolve our online issue (the archive dir consumed more than 1.8PB of SSD space).