Xiaoccer opened a new pull request, #16031:
URL: https://github.com/apache/doris/pull/16031
# Proposed changes
Issue Number: close #xxx
## Problem summary
Use bthread to separate IO from computation when executing the
OlapScanner, which speeds up queries whose data is already cached.
* In OlapScanner's execution chain, replace
std::mutex/std::condition_variable with bthread::Mutex/bthread::ConditionVariable
* Add an AsyncIO class to hand tasks off from bthread to pthread
* Change how the reader's/filesystem's interfaces are used when running in a bthread
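
The hand-off from bthread to pthread listed above can be sketched roughly as follows. This is a minimal illustration, not the AsyncIO class from this PR: `BlockingTaskOffloader`, `run`, and `worker_loop` are hypothetical names. The sketch defaults to the std primitives so it builds without brpc; the assumption is that `bthread::Mutex` and `bthread::ConditionVariable` can be passed as the template arguments, so that a waiting caller suspends only its bthread instead of blocking the underlying worker pthread.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; not the AsyncIO class in this PR.
// Mutex/CondVar default to the std types so the sketch compiles standalone.
// Assumption: bthread::Mutex / bthread::ConditionVariable can be substituted.
template <typename Mutex = std::mutex, typename CondVar = std::condition_variable>
class BlockingTaskOffloader {
public:
    explicit BlockingTaskOffloader(std::size_t num_workers) {
        for (std::size_t i = 0; i < num_workers; ++i) {
            _workers.emplace_back([this] { worker_loop(); });
        }
    }

    ~BlockingTaskOffloader() {
        {
            std::lock_guard<std::mutex> guard(_queue_lock);
            _stopped = true;
        }
        _queue_cv.notify_all();
        for (auto& worker : _workers) worker.join();
    }

    // Runs `task` (e.g. a blocking read) on a dedicated worker thread and
    // waits until it has finished. The wait uses Mutex/CondVar, so with the
    // bthread types only the calling bthread would be suspended.
    void run(std::function<void()> task) {
        Mutex done_lock;
        CondVar done_cv;
        bool done = false;
        {
            // The queue lock is a plain std::mutex that is held only briefly.
            std::lock_guard<std::mutex> guard(_queue_lock);
            _queue.emplace_back([&] {
                task();
                // Notify while holding the lock so the waiter cannot destroy
                // done_cv before notify_one() has returned.
                std::unique_lock<Mutex> lock(done_lock);
                done = true;
                done_cv.notify_one();
            });
        }
        _queue_cv.notify_one();

        std::unique_lock<Mutex> lock(done_lock);
        while (!done) done_cv.wait(lock);
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(_queue_lock);
                _queue_cv.wait(lock, [this] { return _stopped || !_queue.empty(); });
                if (_queue.empty()) return;  // stopped and fully drained
                job = std::move(_queue.front());
                _queue.pop_front();
            }
            job();  // the blocking work runs outside the queue lock
        }
    }

    std::mutex _queue_lock;
    std::condition_variable _queue_cv;
    std::deque<std::function<void()>> _queue;
    bool _stopped = false;
    std::vector<std::thread> _workers;
};
```

A caller would construct one offloader, for example `BlockingTaskOffloader<> io(4);`, and wrap each blocking call as `io.run([&] { /* blocking read */ });`. The completion wait is written as an explicit `while` loop rather than a predicate `wait` so that it relies only on the basic `wait(lock)` and `notify_one()` operations.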
## Performance test
* 1 BE + 1 FE
* base: commit_id-bd2280b4ce702e24cc31ca1d379aeaf6f00ce69c
### ssbf100 benchmark
| sql | base(s) | bthread(s) |
| ---- | ------- | ---------- |
| q1.1 | 0.92 | 0.904 |
| q1.2 | 1.122 | 1.126 |
| q1.3 | 0.052 | 0.052 |
| q2.1 | 12.041 | 12.072 |
| q2.2 | 0.741 | 0.705 |
| q2.3 | 0.635 | 0.671 |
| q3.1 | 3.492 | 3.422 |
| q3.2 | 0.46 | 0.47 |
| q3.3 | 0.649 | 0.679 |
| q3.4 | 0.089 | 0.087 |
| q4.1 | 4.358 | 4.685 |
| q4.2 | 0.519 | 0.572 |
| q4.3 | 0.457 | 0.463 |
Performance with bthread is almost the same as with pthread.
### cached read
Description: first execute q1.1.sql to cache its data, then execute
q1.1.sql, q2.1.sql, and q4.1.sql concurrently.
| sql | base(s) | bthread(s) |
| ------------------------- | ------- | ---------- |
| first time q1.1 | 2.673 | 2.721 |
| second time (cached) q1.1 | 13.441 | **0.206** |
| first time q2.1 | 53.846 | 53.793 |
| first time q4.1 | 53.864 | 53.903 |
With bthread, if a query's data is already cached, its result can be
returned quickly without waiting for a free thread in the thread pool.
## Checklist(Required)
1. Does it affect the original behavior:
- [x] Yes
- [ ] No
- [ ] I don't know
2. Has unit tests been added:
- [ ] Yes
- [x] No
- [ ] No Need
3. Has document been added or modified:
- [ ] Yes
- [x] No
- [ ] No Need
4. Does it need to update dependencies:
- [ ] Yes
- [x] No
5. Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [x] No
## Further comments
If this is a relatively large or complex change, kick off the discussion at
[[email protected]](mailto:[email protected]) by explaining why you
chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]