[
https://issues.apache.org/jira/browse/KUDU-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shenxingwuying reassigned KUDU-3422:
------------------------------------
Assignee: shenxingwuying
> provide compact CLI tools for kudu administrators
> -------------------------------------------------
>
> Key: KUDU-3422
> URL: https://issues.apache.org/jira/browse/KUDU-3422
> Project: Kudu
> Issue Type: New Feature
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major
>
> h1. Motivation
> In Kudu, compaction jobs can be a pain point in some scenarios, for example:
> # MRS and DMS flushes are not timely enough. The patch for this:
> [https://gerrit.cloudera.org/c/17743/]
> # Disk space amplification is too severe and all rowsets need to be
> compacted, but no jobs run, even when no maintenance job is running and the
> workload is very low.
> # Some kinds of GC jobs should have been launched but none run, even
> when no maintenance job is running and the workload is very low.
> We can solve each of these problems case by case. The reasons compaction
> jobs don't work well may be complex: bugs may exist, or strategies may not be
> good enough and should be improved. A new optimization scheme may not achieve
> the effect we expected. A new optimization also reaches production only
> through a Kudu upgrade. An upgrade has to take into account other aspects of
> the production environment and users' concerns, and the operation itself may
> run into another pain point: bootstrap is very slow.
> In short, this is very complex. Every problem takes some time to analyze.
> When a problem happens in a production environment, administrators have to
> change some gflags parameters and restart Kudu in the hope that some
> compaction jobs get scheduled. Restarting Kudu may take too much time, and
> restarting the cluster may cost availability.
> I want to provide a quick way to resolve these cases without a restart. It is
> a troubleshooting measure for the cases above, not a root-cause fix.
> Here I look at the problem from another angle to ease some of these
> difficulties. The solution should be acceptable to SREs.
> h1. Solution
> We can deal with the problem in a flexible way: Kudu administrators can
> launch certain kinds of compaction jobs based on their own judgment.
> To support this idea, the Kudu CLI tool should add a command like this:
>
> {{kudu compact <master_list> --tables=<tables> --tablet_ids=<tablet_ids>
> --servers=<host:port> --compact_type=<compact_rowsets,deleted_rowset_gc,...>}}
> The kudu-tserver network service should add an API: when it receives the
> command, it launches the corresponding compaction job. The job should run on
> the ThreadPool 'thread_pool_' in class 'MaintenanceManager'. The compaction
> job is triggered by administrators and should skip the best-score
> computation, so it is a method for abnormal cases.
> The compaction job should run on a separate thread, not the service thread,
> because it may be a long-running job.
> So we should provide a method to check the job's status.
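
The submit-then-poll pattern described above can be sketched as follows. This is only an illustration in Python, not Kudu code: the class `CompactionJobRunner`, its methods, and the status strings are hypothetical names, and a standard-library thread pool stands in for 'thread_pool_' in 'MaintenanceManager'. The request handler returns a job id immediately, the job runs on a pool thread, and a separate call reports its status.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class CompactionJobRunner:
    """Hypothetical sketch: run admin-triggered jobs off the service thread."""

    def __init__(self, max_workers=2):
        # Stand-in for the maintenance thread pool (an assumption, not Kudu's API).
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._futures = {}

    def submit(self, tablet_id, compact_type, job_fn):
        """Queue job_fn(tablet_id, compact_type) and return a job id at once."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(job_fn, tablet_id, compact_type)
        with self._lock:
            self._futures[job_id] = future
        return job_id

    def status(self, job_id):
        """Report the job state without blocking the caller."""
        with self._lock:
            future = self._futures.get(job_id)
        if future is None:
            return "UNKNOWN"
        if not future.done():
            return "RUNNING" if future.running() else "PENDING"
        return "FAILED" if future.exception() is not None else "FINISHED"

    def shutdown(self):
        # Wait for queued jobs to finish before releasing the pool threads.
        self._pool.shutdown(wait=True)
```

A caller would keep the id returned by `submit` and poll `status` until it reports `FINISHED` or `FAILED`. No scoring step is involved here, which mirrors the idea that an administrator-triggered job bypasses the best-score computation.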