shenxingwuying created KUDU-3422:
------------------------------------
Summary: provide compact CLI tools for kudu administrators
Key: KUDU-3422
URL: https://issues.apache.org/jira/browse/KUDU-3422
Project: Kudu
Issue Type: New Feature
Reporter: shenxingwuying
h1. Motivation
In Kudu, compaction jobs can be a pain point in some scenarios, for example:
# MRS and DMS flushes are not timely enough. The patch for this:
[https://gerrit.cloudera.org/c/17743/]
# Disk space amplification is severe and all rowsets need to be compacted, but
no compaction jobs run, even when no maintenance job is running and the
workload is very low.
# Some kinds of GC jobs should have been launched, but none run, even when
no maintenance job is running and the workload is very low.
We could solve each of these problems case by case. The reasons compaction
jobs do not work well may be complex: bugs may exist, or the scheduling
strategies may not be good enough and need improvement. A new optimization may
also fail to achieve the effect we expect. In addition, a new optimization only
reaches production through a Kudu upgrade, and an upgrade has to take into
account other aspects of the production environment and users' concerns. The
upgrade itself may run into another pain point: bootstrap is very slow.
In short, this is complex, and every problem takes time to analyze. When such
a problem occurs in a production environment, administrators have to change
some gflags and restart Kudu in the hope that compaction jobs get scheduled.
Restarting Kudu may take a long time, and restarting the cluster may cause a
loss of availability.
I want to provide a quick way to address these cases without a restart. It is
a troubleshooting aid for the cases above, not a root-cause fix.
This approaches the difficulties from another angle, and the solution is one
that SREs can accept.
h1. Solution
We can deal with the problem in a flexible way: Kudu administrators can launch
specific kinds of compaction jobs based on their own judgment.
To support this, the Kudu CLI tool should add a command like this:
{{kudu compact <master_list> --tables=<tables> --tablet_ids=<tablet_ids>
--servers=<host:port> --compact_type=<compact_rowsets,deleted_rowset_gc,...>}}
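As a rough illustration of how the proposed flags could be turned into a request, here is a minimal Python sketch. The {{CompactRequest}} type and the {{parse_compact_args}} and {{split_csv}} helpers are hypothetical and not part of Kudu; only the flag names and the two compaction types come from the command above, and the remaining types are left open as in the proposal.

```python
from dataclasses import dataclass, field
from typing import List

# Only the two types named in the proposal; the rest of the list is left open there.
KNOWN_COMPACT_TYPES = {"compact_rowsets", "deleted_rowset_gc"}


def split_csv(value: str) -> List[str]:
    """Split a comma-separated flag value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class CompactRequest:
    """Hypothetical request shape built from the proposed CLI flags."""
    tables: List[str] = field(default_factory=list)
    tablet_ids: List[str] = field(default_factory=list)
    servers: List[str] = field(default_factory=list)
    compact_types: List[str] = field(default_factory=list)


def parse_compact_args(tables: str = "", tablet_ids: str = "",
                       servers: str = "", compact_type: str = "") -> CompactRequest:
    """Validate the flag values and build a request (illustrative only)."""
    types = split_csv(compact_type)
    if not types:
        raise ValueError("at least one compact type is required")
    unknown = [t for t in types if t not in KNOWN_COMPACT_TYPES]
    if unknown:
        raise ValueError("unknown compact type(s): " + ", ".join(unknown))
    return CompactRequest(split_csv(tables), split_csv(tablet_ids),
                          split_csv(servers), types)
```

Rejecting unknown types on the client side is one possible design choice; the server would still need to validate the request itself.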
The kudu-tserver's network service should add an API: when it receives the
command, it launches the corresponding compaction job. The job should run on
the ThreadPool 'thread_pool_' in class 'MaintenanceManager'. Because the
compaction job is triggered by an administrator, it should skip the best-score
computation; it is a method intended for abnormal cases.
The compaction job should run on another thread, not the service thread,
because it may be long-running.
We should therefore also provide a way to check the job's status.
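The submit-then-poll pattern described above could look roughly like the following Python sketch. It is not Kudu code: {{ManualCompactionRunner}}, {{JobState}} and their methods are hypothetical names, and a standard-library executor stands in for the maintenance thread pool. The request handler would call {{submit()}} and return the job id at once, and a separate status call would look the job up later.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class ManualCompactionRunner:
    """Runs admin-triggered jobs on a worker pool and tracks their status."""

    def __init__(self, max_workers=2):
        # Stand-in for the maintenance thread pool; not Kudu's actual class.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._states = {}

    def submit(self, job):
        """Queue `job` (a zero-argument callable) and return its id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._states[job_id] = JobState.PENDING
        # No scoring step: the administrator already chose what to run.
        self._pool.submit(self._run, job_id, job)
        return job_id

    def _run(self, job_id, job):
        """Execute the job on a worker thread and record the outcome."""
        self._set(job_id, JobState.RUNNING)
        try:
            job()
            self._set(job_id, JobState.DONE)
        except Exception:
            self._set(job_id, JobState.FAILED)

    def _set(self, job_id, state):
        with self._lock:
            self._states[job_id] = state

    def status(self, job_id):
        """Return the job's current state, or None for an unknown id."""
        with self._lock:
            return self._states.get(job_id)

    def shutdown(self):
        """Wait for queued and running jobs to finish, then stop the pool."""
        self._pool.shutdown(wait=True)
```

Keeping the state table behind a lock lets the status call run on a service thread while the job itself runs on a worker thread.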
--
This message was sent by Atlassian Jira
(v8.20.10#820010)