[ https://issues.apache.org/jira/browse/KUDU-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shenxingwuying updated KUDU-3364:
---------------------------------
    Description: 
h1. Scenarios

This issue is about a general category of problem rather than a single case.

Kudu has a number of periodic tasks, as well as tasks that are scheduled
automatically when an internal condition triggers them. Examples include
automatic rebalancing of cluster data, some garbage-collection (GC) tasks, and
compaction tasks.

These are implemented with a Kudu Thread, a std::thread, or a ThreadPool, and
the actual work is either scheduled periodically or triggered by an internal
policy. Because all of this is internal, operators have no way to control when
the work runs.

We need a way to trigger these kinds of actions on demand, under our own
control. Some scenarios where this matters are described below.

h2. Data rebalance

There are two ways to rebalance data:

1. Enable auto-rebalancing, which runs in the master.
2. Use the rebalance tool, as was done before 1.14.

The two approaches may conflict when their operations race with each other,
because the rebalance tool runs its fairly complex logic in the tool itself,
while auto-rebalancing runs inside the master.

In the future, auto-rebalancing in the master should become stable enough to
be the main way to rebalance data. At the same time, administrators need an
external way to trigger a rebalance that behaves just like the automatic one.

Currently, however, auto-rebalancing runs in its own thread on a fixed period.
We could add an API to MasterService, but that API would be synchronous and
could be very costly, since the call would block for as long as the rebalance
takes. We need an asynchronous way to trigger the rebalance.
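A minimal sketch of such an asynchronous trigger, in standard C++ and under stated assumptions: TriggerableLoop and TriggerAsync() are hypothetical names, not existing Kudu or MasterService APIs. The background thread runs its work once per period, as auto-rebalancing does today, and TriggerAsync() only sets a flag and wakes that thread, so a hypothetical RPC handler calling it could return immediately instead of waiting for the rebalance to finish.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch: a periodic background task that can also be
// triggered on demand without blocking the caller.
class TriggerableLoop {
 public:
  TriggerableLoop(std::chrono::milliseconds period, std::function<void()> work)
      : period_(period), work_(std::move(work)), thread_([this] { Run(); }) {}

  ~TriggerableLoop() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();  // waits for an in-progress run to finish
  }

  // Returns immediately; the work runs later on the background thread.
  void TriggerAsync() {
    {
      std::lock_guard<std::mutex> l(mu_);
      triggered_ = true;
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      // Wake up when the period elapses, a trigger arrives, or on shutdown.
      cv_.wait_for(l, period_, [this] { return stop_ || triggered_; });
      if (stop_) break;
      triggered_ = false;  // consume the pending trigger, if any
      l.unlock();
      work_();  // e.g. one round of rebalancing
      l.lock();
    }
  }

  const std::chrono::milliseconds period_;
  const std::function<void()> work_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool triggered_ = false;
  bool stop_ = false;
  std::thread thread_;  // declared last so it starts after the other members
};
```

Triggers that arrive before the background thread picks up a pending one are coalesced into a single run, and the period restarts after every run. A handler would call loop.TriggerAsync() and respond right away.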
h2. Auto compaction

Another example is automatic compaction. I have found that the compaction
policy is not always effective, so we may need a method that lets
administrators trigger compaction.

If we could trigger a RowSetInCompaction on demand, we would not need to
restart the Kudu cluster.
h1. My Solution

Add a timer to ThreadPool. The timer is a dedicated thread that keeps tasks
ordered by time and, when a task becomes due, submits it to the
ThreadPoolToken specified for it.

We can restrict the TimerThread so that only SERIAL ThreadPoolTokens may use
it. The following pseudocode expresses the intent:
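{code:cpp}
// Pseudocode. ThreadPoolToken, Thread, MonoTime, MonoDelta and SleepFor are
// Kudu's existing utility types and functions.
class TimerThread {
 public:
  struct Task {
    ThreadPoolToken* token;   // SERIAL token that will run f; not owned
    std::function<void()> f;
  };

  // Queues `task` to be submitted to its token after `delay_ms`.
  void Schedule(Task task, int delay_ms) {
    std::lock_guard<std::mutex> l(lock_);
    tasks_.emplace(MonoTime::Now() + MonoDelta::FromMilliseconds(delay_ms),
                   std::move(task));
  }

 private:
  // Body of thread_: wakes up every 100 ms and hands due tasks to their tokens.
  void RunLoop() {
    while (!shutdown_) {
      SleepFor(MonoDelta::FromMilliseconds(100));
      std::lock_guard<std::mutex> l(lock_);
      const MonoTime now = MonoTime::Now();
      auto it = tasks_.begin();
      while (it != tasks_.end() && it->first <= now) {
        // Real code should check the returned Status.
        it->second.token->Submit(std::move(it->second.f));
        it = tasks_.erase(it);
      }
    }
  }

  scoped_refptr<Thread> thread_;
  std::mutex lock_;                       // protects tasks_
  std::multimap<MonoTime, Task> tasks_;   // ordered by deadline
  std::atomic<bool> shutdown_{false};
};

class ThreadPool {
  ...
  // nullptr by default, so existing pools are unaffected.
  std::unique_ptr<TimerThread> timer_;
  ...
};

class ThreadPoolToken {
 public:
  // Runs f on this token after delay_ms via the pool's timer_.
  // Only allowed for SERIAL tokens.
  void Schedule(std::function<void()> f, int delay_ms);
};{code}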
This scheme is backward compatible with the existing ThreadPool: timer_ is
nullptr by default, so pools that do not use a timer are unaffected.
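The pseudocode above depends on Kudu's internal classes. As a self-contained illustration of the same idea, here is a minimal standard-C++ sketch, assuming std::thread and std::condition_variable in place of Kudu's Thread and SleepFor. SerialExecutor is a hypothetical stand-in for a SERIAL ThreadPoolToken, and this TimerThread waits until the earliest deadline instead of polling every 100 ms. It is a sketch under these assumptions, not Kudu's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a SERIAL ThreadPoolToken: runs submitted
// functions one at a time, in submission order, on a single worker thread.
class SerialExecutor {
 public:
  SerialExecutor() : worker_([this] { Run(); }) {}
  ~SerialExecutor() {
    { std::lock_guard<std::mutex> l(mu_); stop_ = true; }
    cv_.notify_all();
    worker_.join();  // queued functions are drained before the worker exits
  }
  void Submit(std::function<void()> f) {
    { std::lock_guard<std::mutex> l(mu_); queue_.push_back(std::move(f)); }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> l(mu_);
    while (true) {
      cv_.wait(l, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stop_ is set and nothing is left to run
      std::function<void()> f = std::move(queue_.front());
      queue_.pop_front();
      l.unlock();
      f();
      l.lock();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};

// Sketch of the proposed TimerThread: keeps tasks ordered by deadline and, when
// one is due, submits it to its executor instead of running it itself.
class TimerThread {
 public:
  TimerThread() : thread_([this] { RunLoop(); }) {}
  ~TimerThread() {
    { std::lock_guard<std::mutex> l(mu_); stop_ = true; }
    cv_.notify_all();
    thread_.join();  // tasks that are not yet due are dropped
  }
  void Schedule(SerialExecutor* executor, std::function<void()> f,
                std::chrono::milliseconds delay) {
    {
      std::lock_guard<std::mutex> l(mu_);
      tasks_.emplace(Clock::now() + delay, Task{executor, std::move(f)});
    }
    cv_.notify_one();  // the new task may be due before the current earliest
  }

 private:
  struct Task {
    SerialExecutor* executor;  // not owned
    std::function<void()> f;
  };
  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      if (tasks_.empty()) {
        cv_.wait(l);
      } else {
        const Clock::time_point next = tasks_.begin()->first;
        cv_.wait_until(l, next);  // wakes early if Schedule() or stop notifies
      }
      if (stop_) break;
      const Clock::time_point now = Clock::now();
      while (!tasks_.empty() && tasks_.begin()->first <= now) {
        Task t = std::move(tasks_.begin()->second);
        tasks_.erase(tasks_.begin());
        t.executor->Submit(std::move(t.f));  // earliest deadline first
      }
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::multimap<Clock::time_point, Task> tasks_;
  bool stop_ = false;
  std::thread thread_;  // declared last so it starts after the other members
};
```

For example, timer.Schedule(&token, f, std::chrono::milliseconds(40)) runs f on the token's worker thread roughly 40 ms later, and tasks scheduled with delays of 60, 20 and 40 ms run in deadline order. The timer must be destroyed before the executors it submits to. Waiting until the earliest deadline is a choice of this sketch; the 100 ms polling loop in the pseudocode also works, at the cost of up to 100 ms of extra latency.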

For periodic tasks, we can use a control ThreadPool with a timer to refactor
some existing code, making it clearer and avoiding the current problem of too
many single-purpose threads.
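To illustrate how several periodic jobs can share one timer instead of each owning a thread that loops on SleepFor(), here is a minimal single-threaded sketch. PeriodicScheduler, AddPeriodic() and AdvanceTo() are hypothetical names, and the sketch uses simulated time in milliseconds so its behavior is deterministic. Each job is re-armed at its previous due time plus its period; in the proposed design the due job would be submitted to a SERIAL ThreadPoolToken rather than run inline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch with simulated time: several periodic jobs share one
// deadline-ordered queue instead of each owning a thread that sleeps in a loop.
class PeriodicScheduler {
 public:
  // Registers `f` to run every `period_ms` (must be > 0), first at
  // now + period_ms.
  void AddPeriodic(std::int64_t period_ms, std::function<void()> f) {
    jobs_.push_back(Job{period_ms, std::move(f)});
    queue_.emplace(now_ms_ + period_ms, jobs_.size() - 1);
  }

  // Advances simulated time to `t_ms`, running each due job in deadline order
  // and re-arming it at its previous due time plus its period.
  void AdvanceTo(std::int64_t t_ms) {
    while (!queue_.empty() && queue_.begin()->first <= t_ms) {
      const std::int64_t due = queue_.begin()->first;
      const std::size_t idx = queue_.begin()->second;
      queue_.erase(queue_.begin());
      now_ms_ = due;
      // In the proposal, this would be token->Submit(f) on a SERIAL token.
      jobs_[idx].f();
      queue_.emplace(due + jobs_[idx].period_ms, idx);
    }
    now_ms_ = t_ms;
  }

 private:
  struct Job {
    std::int64_t period_ms;
    std::function<void()> f;
  };
  std::int64_t now_ms_ = 0;
  std::deque<Job> jobs_;  // deque: push_back does not invalidate references
  std::multimap<std::int64_t, std::size_t> queue_;  // next due time -> job
};
```

With periods of 100 ms and 250 ms, advancing to 500 ms runs the first job five times and the second twice, all from one queue. When two jobs fall due at the same instant, the one re-armed first runs first, because std::multimap keeps equal keys in insertion order.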



> ThreadPool Timer to execute some Periodic tasks and multi control send tasks
> ----------------------------------------------------------------------------
>
>                 Key: KUDU-3364
>                 URL: https://issues.apache.org/jira/browse/KUDU-3364
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
