Dear ShardingSphere Community,

I am glad to discuss a new design for the Sharding-Scaling job.
The same topic is tracked on GitHub as
[Issue#5106](https://github.com/apache/incubator-shardingsphere/issues/5106).



## Current Job Design




Sharding-Scaling extracts the `datasources` and `actualDataNodes` information 
from ShardingSphere's sharding configuration to generate Sharding-Scaling jobs. 
Each complete ShardingSphere configuration generates one complete 
Sharding-Scaling job.
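
For illustration, here is a minimal Java sketch of the information that gets 
pulled out of the sharding configuration. The names (`SourceTable`, 
`ScalingJobDraft`, the URLs and table names) are purely hypothetical, not the 
actual Sharding-Scaling classes.

```java
// Hypothetical sketch only: the two pieces of configuration that job
// generation relies on are the source datasources and the expanded
// actualDataNodes.
import java.util.List;
import java.util.Map;

record SourceTable(String dataSourceName, String actualTableName) { }

record ScalingJobDraft(Map<String, String> sourceDataSources,   // from `datasources`
                       List<SourceTable> actualDataNodes) { }   // from `actualDataNodes`

class JobDraftExample {

    public static void main(String[] args) {
        // e.g. actualDataNodes "ds_${0..1}.t_order_${0..1}" expands to four data nodes
        ScalingJobDraft draft = new ScalingJobDraft(
                Map.of("ds_0", "jdbc:mysql://127.0.0.1:3306/ds_0",
                       "ds_1", "jdbc:mysql://127.0.0.1:3306/ds_1"),
                List.of(new SourceTable("ds_0", "t_order_0"), new SourceTable("ds_0", "t_order_1"),
                        new SourceTable("ds_1", "t_order_0"), new SourceTable("ds_1", "t_order_1")));
        System.out.println(draft);
    }
}
```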




After checking the datasources, the Sharding-Scaling job performs the first 
round of job splitting by datasource: each datasource becomes one sub-task of 
the Sharding-Scaling job.




Each datasource sub-task is further split into an inventory task (also called 
a history task) and an incremental task (also called a real-time task). 
Moreover, the inventory task is split further by table, and possibly again by 
primary key.
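
To make these splitting steps concrete, here is a hedged sketch of the old, 
structure-driven split. The names (`OldJobSplitter`, `DatasourceSubTask`, 
`shardsPerLargeTable`) are illustrative assumptions, not the actual 
Sharding-Scaling code.

```java
// Illustrative sketch of the old split:
// datasource -> (inventory + incremental) -> table -> optional primary-key shards.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class OldJobSplitter {

    // One sub-task per datasource; each holds an inventory part and an incremental part.
    record DatasourceSubTask(String dataSource, List<String> inventoryTasks, String incrementalTask) { }

    static List<DatasourceSubTask> split(Map<String, List<String>> tablesByDatasource, int shardsPerLargeTable) {
        List<DatasourceSubTask> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : tablesByDatasource.entrySet()) {
            List<String> inventory = new ArrayList<>();
            for (String table : entry.getValue()) {
                if (shardsPerLargeTable > 1) {
                    // split further by primary key range -> "table task shard"
                    for (int shard = 0; shard < shardsPerLargeTable; shard++) {
                        inventory.add(table + " shard #" + shard);
                    }
                } else {
                    inventory.add(table);   // "table task without sharding"
                }
            }
            result.add(new DatasourceSubTask(entry.getKey(), inventory, "incremental for " + entry.getKey()));
        }
        return result;
    }
}
```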




The job structure looks like the following:




```
Sharding-Scaling Job
|--- Datasource1 Sub-task
|    |--- Inventory Task
|    |    |--- TableA Task
|    |    |    |--- TableA Task shard #1
|    |    |    |--- TableA Task shard #2
|    |    |    .....
|    |    |--- TableB Task
|    |    .....
|    |--- Incremental Task
|--- Datasource2 Sub-task
.....
```




For each table task without sharding, each table task shard, and each 
incremental task, there are at least two threads executing the task: a 
`Reader`, which queries data or subscribes to change data, and a `Writer`, 
which writes the data into the target ShardingSphere.
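
As a rough sketch: the `Reader` and `Writer` names do exist in 
Sharding-Scaling, but the signatures and the queue-based channel below are 
simplified assumptions for illustration only.

```java
// Simplified sketch of one leaf task: a Reader thread fills a channel, a
// Writer thread drains it into the target ShardingSphere.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

interface ScalingRecord { }             // one inventory row or one change event

interface Reader extends Runnable { }   // SELECT or binlog subscription -> channel
interface Writer extends Runnable { }   // channel -> INSERT/UPDATE/DELETE on target

final class LeafTaskSketch {

    public static void main(String[] args) {
        BlockingQueue<ScalingRecord> channel = new LinkedBlockingQueue<>(1024);
        Reader reader = () -> channel.offer(new ScalingRecord() { });            // placeholder read
        Writer writer = () -> System.out.println("buffered records: " + channel.size());
        new Thread(reader, "scaling-reader").start();   // at least two threads per leaf task
        new Thread(writer, "scaling-writer").start();
    }
}
```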




This design is not friendly to distributed execution:




- Leaf tasks may appear at the second layer (incremental task), the third 
layer (table task without sharding), or the fourth layer (table task shard).

- Concurrency and thread pool size are hard to control.

- A limited thread pool may cause problems such as 
[Issue#5092](https://github.com/apache/incubator-shardingsphere/issues/5092).




## New Design




In the new design, the Sharding-Scaling job is no longer split according to 
the database structure, but according to the logic of distributed jobs.




```
Sharding-Scaling Job
|--- Inventory Task
|    |--- Inventory Task #1
|    |    |--- Inner Inventory Task For ds1.table1
|    |    |--- Inner Inventory Task For ds1.table2 #1
|    |--- Inventory Task #2
|    |    |--- Inner Inventory Task For ds1.table2 #2
|    |    |--- Inner Inventory Task For ds2.table3
|    ......
|--- Incremental Task
|    |--- Incremental Task #1 For ds1
|    |--- Incremental Task #2 For ds2
|    ......
```




In the new design, Sharding-Scaling first splits the job into inventory and 
incremental parts, and then splits the inventory part according to the 
`job concurrency` configured by users. The leaf tasks are `Inventory Task #n` 
and `Incremental Task #n`, which can be evenly distributed across different 
Sharding-Scaling nodes.
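
A minimal sketch of this split, assuming a generic list of inner inventory 
tasks and a simple round-robin assignment (the real grouping strategy may 
differ):

```java
// Group all per-table (or per-shard) inner tasks into `jobConcurrency`
// top-level Inventory Tasks; each group can then be scheduled on any node.
import java.util.ArrayList;
import java.util.List;

final class InventorySplitter {

    static <T> List<List<T>> split(List<T> innerTasks, int jobConcurrency) {
        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < jobConcurrency; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < innerTasks.size(); i++) {
            groups.get(i % jobConcurrency).add(innerTasks.get(i));   // round-robin assignment
        }
        return groups;   // groups.get(n) holds the contents of "Inventory Task #n"
    }
}
```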




The inventory table tasks and inventory table task shards of the old design 
become inner tasks of `Inventory Task #n` and execute one by one.
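
For example, one `Inventory Task #n` could simply run its inner tasks 
sequentially, roughly like this (`InventoryTaskGroup` is an assumed name, not 
the actual class):

```java
// One top-level inventory task drives its inner tasks one by one.
import java.util.List;

final class InventoryTaskGroup implements Runnable {

    private final List<Runnable> innerTasks;   // e.g. one per table or per key range

    InventoryTaskGroup(List<Runnable> innerTasks) {
        this.innerTasks = innerTasks;
    }

    @Override
    public void run() {
        for (Runnable inner : innerTasks) {
            inner.run();   // sequential execution; no nested thread pool needed
        }
    }
}
```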




`Reader` and `Writer` remain similar to the old design, but the thread pool 
size and concurrency become easier to understand and configure: users only 
need to understand the `job concurrency` property.
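
As an illustration only (the sizing rule below is an assumption, not the 
actual Sharding-Scaling configuration), the worker pool for the top-level 
tasks can be derived directly from that single value:

```java
// Assumed sizing rule: one worker slot per Inventory Task #n plus one per
// Incremental Task #n; Reader/Writer threads live inside each task.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ScalingExecutorFactory {

    static ExecutorService create(int jobConcurrency, int datasourceCount) {
        return Executors.newFixedThreadPool(jobConcurrency + datasourceCount);
    }
}
```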




Todo list:




- [ ] Rename `Reader` and `Writer` 
[Issue#4595](https://github.com/apache/incubator-shardingsphere/issues/4595)

- [ ] Rename `HistoryTask` and `RealtimeTask`

- [ ] Refactor `ExecuteEngine` in Sharding-Scaling

- [ ] Refactor `Inventory Task`

- [ ] Refactor `TaskController`

- [ ] Refactor `JobController` 




I think the new design is better and more suitable for the further development 
of Sharding-Scaling.




Any suggestions? 







--

Yi Yang(Sion)
Apache ShardingSphere
