# Refactor Alter Job 

Alter Job mainly includes Schema Change and Rollup.


## Current problems


1. Job execution is slow


After all tasks are sent to BE, they are queued in a dedicated thread pool. 
Before a task can run, it must wait for the transactions (txn) currently 
running on that BE to complete. These per-task waits add up and prolong the 
completion time of the whole Job.


2. The state of FINISHING is uncontrollable


Alter Job passes through a FINISHING state before it is finally completed. In 
this state, the Job can enter the FINISHED state only after the transactions 
running in the current database have completed, and the Job cannot be 
cancelled. In some corner cases, this can leave the Job stuck in the FINISHING 
state indefinitely.


3. Relationship chain between tablets on BE


At present, when a tablet on BE performs an alter task, it maintains a Schema 
Change Status that links the source tablet to the destination tablet. This 
status tells a load task that, when it receives data, it must also convert that 
data for the destination tablet. Maintaining this relationship chain adds state 
to BE, and the chain has to be cleaned up by clear tasks sent by the Job in the 
FINISHING state. The maintenance complexity is high.


4. Schema Change and Rollup have different processing logic


Rollup produces a new batch of tablets, while Schema Change produces a new 
schema hash for each tablet. Although most of the logic of the two jobs is the 
same, this difference means that many processes have to be handled separately, 
so the cost of code maintenance is high.




We hope to solve the above problems by refactoring, and to provide a unified 
and relatively loosely coupled process for BE. This approach should also make 
the subsequent design for separating storage and compute easier.


## Design scheme


We use FE to manage, in one place, the relationship between the Alter Job and 
the currently executing transactions.


1. Find a time point X after the Alter Job starts. With X as the dividing line, 
transactions before X may load into only the old tables, or into both the old 
and new tables. Transactions after X always load into both the old and new 
tables. Alter tasks are sent to BE after X.


2. When an Alter task is executed on BE, it is only responsible for converting 
the historical data of the specified version, and does not need to care about 
other transactions being executed. Because of the first step, the historical 
version that the task needs to convert must be valid and smaller than the 
version of any transaction currently being executed, so the two do not 
interfere with each other.


3. The tablet on BE no longer needs to record the relationship chain, because 
BE no longer has to convert data from the source tablet to the destination 
tablet within a load transaction. A load transaction generates data either for 
a single tablet, or for both the old and new tablets at the same time.


4. New tablets are also generated for Schema Change jobs, so the Alter task 
logic on BE is the same for both kinds of job.
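
The four steps above can be illustrated with a small toy model. This is only a 
sketch under stated assumptions, not the actual FE or BE code: the names 
`Tablet`, `load`, `run_alter_task`, `convert` and `watershed_txn_id` are 
hypothetical, the time point X is modelled as a transaction id, and 
transactions before X are simplified to write only the old tablet (the 
proposal also allows them to write both).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Tablet:
    """Toy tablet: maps a data version to the rows loaded in that version."""
    name: str
    versions: Dict[int, List[str]] = field(default_factory=dict)


def convert(row: str) -> str:
    """Stand-in for whatever per-row conversion the new schema requires."""
    return row.upper()


def load(txn_id: int, version: int, rows: List[str],
         watershed_txn_id: int, old: Tablet, new: Tablet) -> None:
    """Load routing decided up front (assumed rule for this sketch).

    Every transaction writes the old tablet. Transactions at or after the
    watershed X also write the new tablet directly, so BE needs no
    source-to-destination relationship chain.
    """
    old.versions[version] = list(rows)
    if txn_id >= watershed_txn_id:
        new.versions[version] = [convert(r) for r in rows]


def run_alter_task(old: Tablet, new: Tablet, alter_version: int) -> None:
    """Alter task: convert only historical versions <= alter_version."""
    for version, rows in old.versions.items():
        if version <= alter_version:
            new.versions[version] = [convert(r) for r in rows]


def demo() -> Tuple[Tablet, Tablet]:
    old, new = Tablet("old"), Tablet("new")
    watershed_txn_id = 100  # hypothetical value for the time point X

    # Transactions before X: only the old tablet receives data here.
    load(98, 1, ["a"], watershed_txn_id, old, new)
    load(99, 2, ["b"], watershed_txn_id, old, new)

    # Historical version the alter task is responsible for.
    alter_version = max(old.versions)

    # A transaction after X writes both tablets while the job is running.
    load(100, 3, ["c"], watershed_txn_id, old, new)

    # The alter task touches only versions 1..alter_version, so it does not
    # interfere with version 3 written by the concurrent load.
    run_alter_task(old, new, alter_version)
    return old, new


if __name__ == "__main__":
    old_tablet, new_tablet = demo()
    print(sorted(new_tablet.versions.items()))
```

In this model the new tablet ends up with every version: the historical ones 
come from the alter task and the later ones from the dual-write loads, and the 
two version ranges never overlap. The same `run_alter_task` would serve both 
Rollup and Schema Change, since both are modelled as producing a new tablet.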


--
Best Regards
陈明雨 Mingyu Chen
