shenxingwuying created KUDU-3364:
------------------------------------

             Summary: ThreadPool Timer to execute some Periodic tasks and multi 
control send tasks
                 Key: KUDU-3364
                 URL: https://issues.apache.org/jira/browse/KUDU-3364
             Project: Kudu
          Issue Type: New Feature
            Reporter: shenxingwuying
            Assignee: shenxingwuying


# Scenarios
In general, I am talking about a category of problems.

There are some periodic tasks and automatically triggered scheduling tasks in 
Kudu.

For example, automatic rebalancing of cluster data, some GC tasks, and 
compaction tasks.

They are implemented with Kudu's Thread, std::thread, or ThreadPool, and the 
actual task is either scheduled periodically or triggered by an internal 
strategy.

They are all internal, so we cannot control them from outside.

In fact, we need a method under our own control to trigger these kinds of 
actions.

Some scenarios are particularly significant.

Below are some examples:
## data rebalance
There are two ways to rebalance:

1. enable auto rebalance
2. use the rebalance tool (the approach in 1.14 and before).

The two ways may conflict when their operations race, because the rebalance 
tool's logic is somewhat complex and runs in the tool, while auto rebalance 
runs on the master.

In the future, auto rebalance on the master will become stable and become the 
main way to rebalance data. At the same time, administrators need an external 
way to trigger a rebalance, just like auto rebalance does.

But auto rebalance currently runs in a thread on a fixed time period.
Although we could add an API to MasterService, that API would be synchronous 
and would cost a lot, so we need an asynchronous method to trigger the 
rebalance.
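
One way to picture such an asynchronous trigger is a periodic worker that also wakes up on request. The sketch below is only an illustration in standard C++, not Kudu code: `PeriodicWorker`, `TriggerNow`, and `RunTriggerDemo` are hypothetical names, and the work function stands in for a rebalance pass.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical worker: runs `work` every `period`, or sooner when triggered.
class PeriodicWorker {
 public:
  PeriodicWorker(std::chrono::milliseconds period, std::function<void()> work)
      : period_(period), work_(std::move(work)), thread_([this] { RunLoop(); }) {}

  ~PeriodicWorker() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Asynchronous trigger: sets a flag, wakes the worker, returns immediately.
  void TriggerNow() {
    {
      std::lock_guard<std::mutex> l(mu_);
      triggered_ = true;
    }
    cv_.notify_all();
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      // Wake on the period timeout, an external trigger, or shutdown.
      cv_.wait_for(l, period_, [this] { return stop_ || triggered_; });
      if (stop_) break;
      triggered_ = false;
      l.unlock();
      work_();  // run outside the lock so TriggerNow() never waits on the work
      l.lock();
    }
  }

  const std::chrono::milliseconds period_;
  const std::function<void()> work_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  bool triggered_ = false;
  std::thread thread_;  // declared last so it starts after the other members
};

// Usage: with a one-hour period, only the explicit trigger causes a run.
int RunTriggerDemo() {
  std::atomic<int> runs{0};
  {
    PeriodicWorker worker(std::chrono::hours(1), [&runs] { ++runs; });
    worker.TriggerNow();
    while (runs.load() == 0) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }  // destructor stops and joins the worker thread
  return runs.load();
}
```

In this sketch the caller of `TriggerNow()` never waits for the work itself, which is the property a synchronous MasterService call would lack. Triggers that arrive while a pass is running collapse into a single follow-up pass.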

## auto compaction

Another example is auto compaction.
I have found that the compaction strategy is not always effective, so we may 
need a method controlled by admin users to trigger compaction.

If we can trigger a RowSetInCompaction ourselves, we do not need to restart 
the Kudu cluster.


# My Solution
Add a timer to ThreadPool. The timer is a worker thread that submits tasks to 
the specified thread when their scheduled time arrives.

We can restrict this so that only a SERIAL ThreadPoolToken can enable 
TimerThread.
The following pseudocode expresses my intention:
```
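class TimerThread {
    struct Task {
        ThreadPoolToken* token;    // token the task is submitted to when due
        std::function<void()> f;
    };

    // Register a task to run after delay_ms milliseconds.
    void Schedule(Task task, int delay_ms) {
        tasks_.insert(...);
    }

    // Poll periodically and hand each due task to its token.
    void RunLoop() {
        while (...) {
            SleepFor(100ms);

            due_tasks = FindTasks();    // tasks whose scheduled time has passed
            for (auto task : due_tasks) {
                token = task.token;
                token->Submit(task.f);
                tasks_.erase...
            }
        }
    }

    scoped_refptr<Thread> thread_;
    std::multimap<MonoTime, Task> tasks_;    // keyed by scheduled time
};

class ThreadPool {
    ...
    TimerThread* timer_;    // nullptr unless the timer is enabled
    ...
};

class ThreadPoolToken {
    // Entry point for scheduling a task on this token through the timer.
    void Scheduler();
};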

```
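
To make the pseudocode concrete, here is a minimal, self-contained sketch in standard C++. It is not based on Kudu's actual classes: `InlineToken` is a hypothetical stand-in for a SERIAL ThreadPoolToken that simply runs the task inline, `std::chrono::steady_clock` replaces MonoTime, and the loop waits on a condition variable until the earliest deadline instead of polling every 100ms.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SERIAL ThreadPoolToken: Submit() runs the task
// inline on the calling (timer) thread; a real token would queue it on a pool.
class InlineToken {
 public:
  void Submit(const std::function<void()>& f) { f(); }
};

class TimerThread {
 public:
  using Clock = std::chrono::steady_clock;

  TimerThread() : thread_([this] { RunLoop(); }) {}

  // Stops the timer; tasks that are not yet due are dropped.
  ~TimerThread() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  void Schedule(InlineToken* token, std::function<void()> f,
                std::chrono::milliseconds delay) {
    assert(token != nullptr && f);
    {
      std::lock_guard<std::mutex> l(mu_);
      tasks_.emplace(Clock::now() + delay, Task{token, std::move(f)});
    }
    cv_.notify_all();  // the new task may now be the earliest one
  }

 private:
  struct Task {
    InlineToken* token;
    std::function<void()> f;
  };

  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      if (tasks_.empty()) {
        cv_.wait(l);
        continue;
      }
      auto first = tasks_.begin();
      const Clock::time_point due = first->first;
      if (due > Clock::now()) {
        cv_.wait_until(l, due);  // sleep until due, or until woken earlier
        continue;
      }
      Task task = std::move(first->second);
      tasks_.erase(first);
      l.unlock();
      task.token->Submit(task.f);  // hand off without holding the timer lock
      l.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::multimap<Clock::time_point, Task> tasks_;  // keyed by due time
  std::thread thread_;  // declared last so it starts after the other members
};

// Usage: schedule three tasks out of order and return the order they ran in.
std::vector<int> RunTimerDemo() {
  InlineToken token;
  std::mutex mu;
  std::condition_variable cv;
  std::vector<int> order;
  {
    TimerThread timer;
    auto record = [&](int id) {
      return [&, id] {
        std::lock_guard<std::mutex> l(mu);
        order.push_back(id);
        cv.notify_all();
      };
    };
    timer.Schedule(&token, record(3), std::chrono::milliseconds(80));
    timer.Schedule(&token, record(1), std::chrono::milliseconds(20));
    timer.Schedule(&token, record(2), std::chrono::milliseconds(40));
    std::unique_lock<std::mutex> l(mu);
    cv.wait(l, [&] { return order.size() == 3; });
  }
  return order;
}
```

Keying the multimap by due time means the earliest task is always at `begin()`, so the loop only ever inspects one entry per wakeup. Whether the real timer should poll, as in the pseudocode, or wait on a deadline like this is a design choice left open here.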

This scheme is compatible with the existing ThreadPool, because the timer is 
nullptr by default.

For periodic tasks, we can use a control ThreadPool with a timer to refactor 
some code and make it clearer, avoiding the past problem of too many 
single-purpose threads.
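
As an illustration of how several periodic tasks could share one timer queue instead of each owning a thread, the sketch below re-arms a task after every run. It is a simplified model under stated assumptions: `SimTimer` is a hypothetical name, it uses a virtual clock in milliseconds so the behavior is deterministic, and it runs tasks inline rather than submitting them to a ThreadPoolToken.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-queue scheduler driven by a virtual clock (milliseconds).
class SimTimer {
 public:
  // Run `f` every `period_ms`, starting one period from the current time.
  void SchedulePeriodic(std::int64_t period_ms, std::function<void()> f) {
    assert(period_ms > 0 && f);
    queue_.emplace(now_ms_ + period_ms, Entry{period_ms, std::move(f)});
  }

  // Advance virtual time to `t_ms`, running every due task in time order.
  void AdvanceTo(std::int64_t t_ms) {
    while (!queue_.empty() && queue_.begin()->first <= t_ms) {
      auto it = queue_.begin();
      const std::int64_t due = it->first;
      Entry e = std::move(it->second);
      queue_.erase(it);
      now_ms_ = due;
      e.f();
      // Re-arm: a periodic task is a one-shot task that schedules itself again.
      const std::int64_t next = due + e.period_ms;
      queue_.emplace(next, std::move(e));
    }
    now_ms_ = t_ms;
  }

 private:
  struct Entry {
    std::int64_t period_ms;
    std::function<void()> f;
  };

  std::int64_t now_ms_ = 0;
  std::multimap<std::int64_t, Entry> queue_;  // keyed by next due time
};

// Usage: two periodic tasks share one queue; no task owns a thread.
std::vector<std::string> RunPeriodicDemo() {
  std::vector<std::string> log;
  SimTimer timer;
  timer.SchedulePeriodic(200, [&log] { log.push_back("gc"); });
  timer.SchedulePeriodic(300, [&log] { log.push_back("rebalance"); });
  timer.AdvanceTo(500);  // gc is due at 200 and 400, rebalance at 300
  return log;
}
```

The task names "gc" and "rebalance" are placeholders for the periodic jobs mentioned earlier, not real Kudu entry points.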



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
