shenxingwuying created KUDU-3422:
------------------------------------

             Summary: provide compact CLI tools for kudu administrators
                 Key: KUDU-3422
                 URL: https://issues.apache.org/jira/browse/KUDU-3422
             Project: Kudu
          Issue Type: New Feature
            Reporter: shenxingwuying


h1. Motivation

In Kudu, compaction jobs can cause trouble in some scenarios, for example:
 # MemRowSet (MRS) and DeltaMemStore (DMS) flushes are not triggered promptly enough. The patch for this: [https://gerrit.cloudera.org/c/17743/]
 # Disk space amplification becomes severe and all rowsets need to be compacted, but no compaction job runs, even when no other maintenance job is running and the workload is very low.
 # Some kinds of GC jobs should have been launched but never run, even when no other maintenance job is running and the workload is very low.

We can solve each of these problems case by case. However, the reasons compaction jobs don't work well can be complex: bugs may exist, or the scheduling strategies may not be good enough and need improvement. A new optimization scheme may not have the effect we expect. Moreover, to bring a new optimization online we have to upgrade Kudu, and an upgrade must take into account other conditions in the production environment as well as users' concerns. The upgrade itself may also run into another problem: bootstrap can be very, very slow.

In short, this is very complex, and every problem takes time to analyze. When such a problem happens in a production environment, administrators have to change some gflags and restart Kudu in the hope that the relevant compaction jobs get scheduled. But restarting Kudu may take too much time, and restarting the cluster may reduce availability.

I want to provide a quick way to deal with these problems without a restart. It is a troubleshooting tool for the cases above, not a root-cause fix. It approaches the problems from another angle, and it is a solution that SREs can accept.

h1. Solution

We can handle these problems flexibly: Kudu administrators can launch specific kinds of compaction jobs based on their own judgment.

To support this, the Kudu CLI tool should add a command like this:


{{kudu compact <master_list> --tables=<tables> --tablet_ids=<tablet_ids> 
--servers=<host:port> --compact_type=<compact_rowsets,deleted_rowset_gc,...>}}
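As an illustration of how the {{--compact_type}} value might be interpreted, here is a minimal C++ sketch (C++ because Kudu is written in C++). {{CompactType}} and {{ParseCompactTypes}} are hypothetical names, not part of the Kudu codebase. Only the two types named in the command above are recognized, since the proposal leaves the rest of the list open.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical enum covering only the two compact types named in the proposal.
enum class CompactType { kCompactRowsets, kDeletedRowsetGc };

// Hypothetical parser for a value such as "compact_rowsets,deleted_rowset_gc".
// Returns std::nullopt if any entry is unknown or if nothing was given.
// Duplicate entries are dropped while keeping first-seen order.
std::optional<std::vector<CompactType>> ParseCompactTypes(const std::string& flag) {
  std::vector<CompactType> out;
  std::set<CompactType> seen;
  std::stringstream ss(flag);
  std::string item;
  while (std::getline(ss, item, ',')) {
    CompactType type;
    if (item == "compact_rowsets") {
      type = CompactType::kCompactRowsets;
    } else if (item == "deleted_rowset_gc") {
      type = CompactType::kDeletedRowsetGc;
    } else {
      return std::nullopt;  // unknown entry (including an empty one)
    }
    if (seen.insert(type).second) {
      out.push_back(type);
    }
  }
  if (out.empty()) {
    return std::nullopt;
  }
  return out;
}
```

Rejecting the whole flag on any unknown entry, rather than silently skipping it, seems the safer choice for an administrator-facing command that triggers heavy work.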

The kudu-tserver's network service should add an API; when it receives the command, it launches the corresponding compaction job. The job should run on the ThreadPool 'thread_pool_' in class 'MaintenanceManager'. Because the compaction job is triggered by an administrator, it should skip the best-score computation; it is a method for abnormal cases.

The compaction job should run on a thread other than the service thread, because it may be a long-running job.

So we should also provide a way to check the job's status.
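The flow described above (the RPC handler hands the job off to a worker and returns immediately, and a separate call reports the job's status) could look roughly like the following self-contained C++ sketch. It is an illustration under stated assumptions, not Kudu code: {{ManualCompactionRegistry}}, {{JobState}}, and their methods are hypothetical, and for simplicity the sketch starts one {{std::thread}} per job instead of submitting work to MaintenanceManager's {{thread_pool_}}.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job states for an admin-triggered compaction (not Kudu's types).
enum class JobState { kPending, kRunning, kSucceeded, kFailed };

// Hypothetical registry: runs each manually triggered job off the caller's
// (service) thread and lets a later call query its status by ID.
class ManualCompactionRegistry {
 public:
  // The job body; returns true on success. A real implementation would run
  // the requested compaction here, without any best-score computation.
  using Work = std::function<bool()>;

  ~ManualCompactionRegistry() {
    // Wait for all outstanding jobs before the shared state is destroyed.
    for (std::thread& t : workers_) t.join();
  }

  // Registers the job and returns its ID right away; the work itself runs
  // on a separate worker thread so the service thread is not blocked.
  int Submit(Work work) {
    std::lock_guard<std::mutex> lock(mu_);
    int id = next_id_++;
    states_[id] = JobState::kPending;
    workers_.emplace_back([this, id, work = std::move(work)] {
      SetState(id, JobState::kRunning);
      SetState(id, work() ? JobState::kSucceeded : JobState::kFailed);
    });
    return id;
  }

  // Non-blocking status check, e.g. backing a "check job status" call.
  JobState Status(int id) {
    std::lock_guard<std::mutex> lock(mu_);
    return states_.at(id);  // throws std::out_of_range for unknown IDs
  }

  // Blocks until the job reaches a terminal state and returns that state.
  JobState WaitForCompletion(int id) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return IsTerminal(states_.at(id)); });
    return states_.at(id);
  }

 private:
  static bool IsTerminal(JobState s) {
    return s == JobState::kSucceeded || s == JobState::kFailed;
  }

  void SetState(int id, JobState s) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      states_[id] = s;
    }
    cv_.notify_all();
  }

  std::mutex mu_;
  std::condition_variable cv_;
  int next_id_ = 0;
  std::map<int, JobState> states_;
  std::vector<std::thread> workers_;
};
```

In this sketch the CLI would receive the ID returned by {{Submit}} right away and later poll {{Status}} (or block in {{WaitForCompletion}}). Keeping the work off the service thread is what prevents a long compaction from tying up request handling.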



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
