killxdcj opened a new issue #5830: URL: https://github.com/apache/incubator-doris/issues/5830
## Backgroud For high-frequency import and high-frequency Compaction scenarios, the number of versions of the Tablet will be large(maybe thousands), and the version graph will be more complex. In this case, the current implementation of VersionGraph.capture_consistent_versions will have relatively large performance problems. For example, the flowing is a flame diagram of one of our online scenes  After analyzing the relevant code, find the current version capture algorithm is BFS, and there are two main reasons for the poor performance 1. Space complexity is O (n), which is related to the versions number of Tablet 2. In most scenarios, the version retrieving needs to traverse all versions. Suppose StreamLoad is performed once in 1 second and Compaction is performed once in 2 seconds, the version graph of the tablet may be as the following figure (the side represents rowset)  If we want to retrieve the version (0 -> 8), the retrieval process is as follows (we need to traverse all nodes and edges) `0 -> (1, 2, 4, 6) -> (3, 5, 7) -> (8)` ## Solution One solution is to change the retrieval algorithm from BFS to greedy strategy. when retrieving, we only find the version with the largest step size in each time, so that the space complexity can be reduced to O (1), and the nodes that need to be traversed when retrieving are also greatly reduced. For example, for the above scenario, the retrieval process is as follows: `0 -> 6 -> 7 -> 8` The following is the flame diagram of BE in the same scene after switching to greedy strategy. The proportion of capture_consistent_versions has almost disappeared.  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
