killxdcj opened a new issue #5830:
URL: https://github.com/apache/incubator-doris/issues/5830


   ## Backgroud
   
   For high-frequency import and high-frequency Compaction scenarios, the 
number of versions of the Tablet will be large(maybe thousands), and the 
version graph will be more complex. In this case, the current implementation of 
VersionGraph.capture_consistent_versions will have relatively large performance 
problems.
   
   For example, the flowing is a flame diagram of one of our online scenes
   
   
![image](https://user-images.githubusercontent.com/8370710/118458746-a388e080-b72d-11eb-863d-2aee6f282f66.png)
   
   After analyzing the relevant code, find the current version capture 
algorithm is BFS, and there are two main reasons for the poor performance
   1. Space complexity is O (n), which is related to the versions number of 
Tablet
   2. In most scenarios, the version retrieving needs to traverse all versions.
   
   Suppose StreamLoad is performed once in 1 second and Compaction is performed 
once in 2 seconds, the version graph of the tablet may be as the following 
figure (the side represents rowset)
   
   
![image](https://user-images.githubusercontent.com/8370710/118459916-ca93e200-b72e-11eb-882b-d70ce72350bc.png)
   
   If we want to retrieve the version (0 -> 8), the retrieval process is as 
follows (we need to traverse all nodes and edges)
   
   `0 -> (1, 2, 4, 6) -> (3, 5, 7) -> (8)`
   
   ## Solution
   
   One solution is to change the retrieval algorithm from BFS to greedy 
strategy. when retrieving, we only find the version with the largest step size 
in each time, so that the space complexity can be reduced to O (1), and the 
nodes that need to be traversed when retrieving are also greatly reduced.
   
   For example, for the above scenario, the retrieval process is as follows:
   
   `0 -> 6 -> 7 -> 8`
   
   The following is the flame diagram of BE in the same scene after switching 
to greedy strategy. The proportion of capture_consistent_versions has almost 
disappeared.
   
   
![image](https://user-images.githubusercontent.com/8370710/118462684-89e99800-b731-11eb-9f94-454a07c8824b.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to