wangbo opened a new issue #7771:
URL: https://github.com/apache/incubator-doris/issues/7771


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   After the SegmentIterator vectorization PR was merged, some TODOs remain for 
it.
   This issue addresses some of its performance problems.
   
   ### Solution
   
   Test SQL
   ```
   SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
   FROM lineorder_flat
   WHERE LO_ORDERDATE >= 19930101 and LO_ORDERDATE <= 19931231 AND LO_DISCOUNT 
BETWEEN 1 AND 3 AND LO_QUANTITY < 25;
   ```
   
   Initial performance test:
   
   ```
   code version:SegmentIterator row version
   - BlockLoadTime: 3s687ms
   - VectorPredEvalTime: 778.640ms
   - BlockSeekCount: 5.36M
   
   
   code version: SegmentIterator vectorization
   - BlockLoadTime: 4s140ms
   - VectorPredEvalTime: 256.926ms
   - BlockSeekCount: 5.36M
   
   
   ```
   Analysis
   1 After ```SegIter``` is vectorized, ```BlockLoadTime``` gets worse 
(3s687ms -> 4s140ms).
   2 Predicate evaluation is indeed faster (```VectorPredEvalTime```: 778.640ms 
-> 256.926ms), but it is a small share of the total, so the overall impact is 
small.
   3 ```BlockSeekCount``` (5.36M) is too high and can be optimized.
   
   Optimization 1:  remove timer ```BlockSeekTime```
   - BlockLoadTime: 3s512ms
   
   Optimization 2(based on opt1):  insert into the column vector in batches in 
```BitShufflePageDecoder.next_batch```
   - BlockLoadTime: 3s105ms
   
   Optimization 3(based on opt1, opt2): eliminate lazy materialization
   - BlockLoadTime: 2s641ms
   - BlockSeekCount: 175.02K
   ```BlockSeekCount``` drops sharply, from 5.36M to 175.02K (roughly 30x fewer 
seeks).
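
   One possible reading of the seek numbers, stated as an assumption rather 
than a confirmed cause: with lazy materialization, non-predicate columns are 
read only for rows that passed the predicates, so each contiguous run of 
selected row ids may cost one seek per column. The hypothetical helper below 
counts those runs for a sorted row id list.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the contiguous runs in a sorted list of selected row ids.
// Under the assumption above, each run costs one seek on every lazily read
// column, so scattered matches mean many seeks.
inline std::size_t count_row_ranges(const std::vector<uint32_t>& sorted_rowids) {
    if (sorted_rowids.empty()) return 0;
    std::size_t ranges = 1;
    for (std::size_t i = 1; i < sorted_rowids.size(); ++i) {
        // A gap between neighbouring row ids starts a new range.
        if (sorted_rowids[i] != sorted_rowids[i - 1] + 1) ++ranges;
    }
    return ranges;
}
```

   For example, row ids 0, 1, 2, 5, 6, 9 form three runs, while reading the 
same span sequentially needs a single starting seek.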
   
   Optimization 4(based on opt1, opt2, opt3): set 
doris_scanner_thread_pool_thread_num = 1
   - BlockLoadTime: 1s665ms
   BlockLoadTime improves further, but with a single scanner thread the whole 
SQL may take longer.
   This made me wonder whether the original version has the same problem.
   
   Original version test: doris_scanner_thread_pool_thread_num = 1 vs the 
default value
   
   ```
   set doris_scanner_thread_pool_thread_num = default value
   - BlockLoadTime: 3s571ms
   
   set doris_scanner_thread_pool_thread_num = 1
   - BlockLoadTime: 2s232ms
   ```
   The original version has the same problem. This may be related to memory 
allocation under multithreading and needs further research.
   
   I will submit a PR for opt1, opt2, opt3
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


