wangbo opened a new issue #6281:
URL: https://github.com/apache/incubator-doris/issues/6281


   Recently I ran some performance tests on the storage layer and found there is still room for optimization.
   First, I tried removing the bitshuffle encoding from Doris.
   The reason is that when a page is written to or read from disk, it is already compressed/decompressed by ```PageIO```.
   On top of that, the same page also goes through ```BitshufflePageBuilder```/```BitShufflePageDecoder```, which compress/decompress it a second time.
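The double pass can be sketched as follows. This is a simplified, hypothetical Python model, not Doris code. The real bitshuffle library transposes bits within fixed-size blocks and is usually paired with an LZ4 pass, and ```PageIO``` uses whatever page codec is configured. Here zlib stands in for both compressors, and `write_page`/`read_page` are illustrative names.

```python
import zlib

def bitshuffle(values, width=32):
    """Simplified bitshuffle: emit one packed bit-plane per bit position.

    Plane b holds bit b of every value, so high bits that rarely change
    turn into long runs of zero bytes that a general codec compresses well.
    """
    n = len(values)
    plane_len = (n + 7) // 8
    out = bytearray(plane_len * width)
    for bit in range(width):
        base = bit * plane_len
        for i, v in enumerate(values):
            if (v >> bit) & 1:
                out[base + i // 8] |= 1 << (i % 8)
    return bytes(out)

def bitunshuffle(data, n, width=32):
    """Inverse of bitshuffle(): rebuild n values from their bit-planes."""
    plane_len = (n + 7) // 8
    values = [0] * n
    for bit in range(width):
        base = bit * plane_len
        for i in range(n):
            if (data[base + i // 8] >> (i % 8)) & 1:
                values[i] |= 1 << bit
    return values

def write_page(values):
    # Pass 1: bitshuffle + compression (zlib stands in for the LZ4 stage).
    encoded = zlib.compress(bitshuffle(values))
    # Pass 2: page-level compression (zlib stands in for PageIO's codec).
    return zlib.compress(encoded)

def read_page(page, n):
    # Reading undoes both passes, so every page is decompressed twice.
    encoded = zlib.decompress(page)
    return bitunshuffle(zlib.decompress(encoded), n)
```

Because the second pass runs over bytes that are already compressed, it usually gains little space but still costs CPU on every read.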
   
   **Test Environment**
   - Cluster: 1 FE, 3 BE
   - Code version: Apache Doris 0.13
   - Test data: SSB

   SQL:
   ```sql
   SELECT sum(lo_revenue), year(lo_orderdate) AS year, p_brand
   FROM lineorder_flat
   WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
   GROUP BY year, p_brand
   ORDER BY year, p_brand;
   ```
   
   **Code modification**
   Removed the ```bitshuffle compress``` step from ```BitshufflePageBuilder::_finish``` (write path) and the matching decompression from ```BitShufflePageDecoder::_decode``` (read path).
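Instead of a hard removal, the modification could be exposed as a switch. The minimal sketch below shows one way to do that. Everything in it is hypothetical: the `use_inner_compress` flag, the 5-byte page header, and the use of zlib as a stand-in for both the bitshuffle compress stage and ```PageIO```'s page codec.

```python
import struct
import zlib

# Hypothetical page layout: 1-byte flag | 4-byte element count | body.
HEADER = struct.Struct("<BI")

def inner_compress(raw: bytes) -> bytes:
    # Stand-in for the bitshuffle compress stage (zlib, NOT the real codec).
    return zlib.compress(raw)

def inner_decompress(body: bytes) -> bytes:
    return zlib.decompress(body)

def build_page(values, use_inner_compress=True):
    raw = struct.pack(f"<{len(values)}I", *values)  # fixed-width uint32 values
    body = inner_compress(raw) if use_inner_compress else raw
    page = HEADER.pack(int(use_inner_compress), len(values)) + body
    return zlib.compress(page)  # stand-in for PageIO's page-level compression

def decode_page(page):
    data = zlib.decompress(page)
    flag, n = HEADER.unpack_from(data)
    body = data[HEADER.size:]
    # Skip the inner decompression when the page was written without it.
    raw = inner_decompress(body) if flag else body
    return list(struct.unpack(f"<{n}I", raw))
```

Recording the flag in each page lets one decoder read pages written in either mode, so the switch could be flipped without rewriting existing data.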
   
   **Test Result 1: Query Performance**
   
   ```BitShufflePageDecodeTime``` is the time spent in ```BitShufflePageDecoder::_decode```.
   I ran the SQL many times and picked the fastest run.
   No disk reads happen in these runs.
   
   Before:
   ```
   - RawRowsRead: 208.64M
   - BlockLoadTime: 20s201ms
   - BitShufflePageDecodeTime: 4s825ms

   - RawRowsRead: 205.78M
   - BlockLoadTime: 21s272ms
   - BitShufflePageDecodeTime: 5s051ms

   - RawRowsRead: 185.62M
   - BlockLoadTime: 17s510ms
   - BitShufflePageDecodeTime: 4s156ms
   ```
   
   After:
   ```
   - RawRowsRead: 211.40M
   - BlockLoadTime: 17s114ms
   - BitShufflePageDecodeTime: 1s116ms

   - RawRowsRead: 200.01M
   - BlockLoadTime: 17s008ms
   - BitShufflePageDecodeTime: 1s047ms

   - RawRowsRead: 188.62M
   - BlockLoadTime: 14s571ms
   - BitShufflePageDecodeTime: 975.978ms
   ```
   Both ```BitShufflePageDecodeTime``` (roughly 4x to 5x lower) and ```BlockLoadTime``` (roughly 15% to 20% lower) improved.
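The improvement figures can be reproduced from the profile numbers with the small calculation below. It assumes that the before and after profiles pair up in the order listed. The row counts differ by a few percent between runs, so the ratios are approximate. The `compare` helper is illustrative.

```python
# (RawRowsRead in millions, BlockLoadTime in s, BitShufflePageDecodeTime in s),
# copied from the profiles above; pairing by position is an assumption.
BEFORE = [(208.64, 20.201, 4.825), (205.78, 21.272, 5.051), (185.62, 17.510, 4.156)]
AFTER = [(211.40, 17.114, 1.116), (200.01, 17.008, 1.047), (188.62, 14.571, 0.975978)]

def compare(before, after):
    """Return (decode speedup, fractional BlockLoadTime reduction) per profile pair."""
    results = []
    for (_, load0, dec0), (_, load1, dec1) in zip(before, after):
        results.append((dec0 / dec1, 1 - load1 / load0))
    return results

for speedup, saved in compare(BEFORE, AFTER):
    print(f"decode {speedup:.2f}x faster, block load {saved:.1%} lower")
```

This gives decode speedups between about 4.3x and 4.8x, and ```BlockLoadTime``` reductions between about 15% and 20%.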
   
   **Test Result 2: Storage**
   Before:
   ```
   | lineorder_flat | 58.866 GB | 336 |
   ```
   After:
   ```
   | lineorder_flat | 71.685 GB | 336 |
   ```
   That is about a 22% increase in storage, which shows that the current bitshuffle compression still saves a meaningful amount of space.
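As a quick check on the two sizes reported above, here are the relative increase and the share of the no-bitshuffle size that the bitshuffle stage saves:

$$
\frac{71.685 - 58.866}{58.866} = \frac{12.819}{58.866} \approx 0.218,
\qquad
\frac{71.685 - 58.866}{71.685} = \frac{12.819}{71.685} \approx 0.179
$$

Removing the stage grows the table by about 21.8% (the 22% quoted above). Put the other way, the bitshuffle stage shrinks the data by about 17.9% beyond what page-level compression alone achieves.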
   
   **Todo**
   I think the choice of compression algorithm matters a lot here.
   Two things to try in the future:
   1. Find a better-performing compression algorithm for Doris that balances performance and space usage.
   2. Treat ```Make bitshuffle encode optional``` as an experimental feature, because this is just a simple test and whether it has other effects still needs more verification.
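For the first item, a minimal benchmarking harness can compare candidate codecs by ratio and speed on the same data. This sketch uses only Python's standard-library codecs as stand-ins: zlib at two levels, bz2, and lzma. The codecs one would realistically evaluate for Doris, such as LZ4 or ZSTD, are not in the standard library. The `benchmark` function and the sample data are hypothetical, and real measurements should use actual segment pages.

```python
import bz2
import lzma
import struct
import time
import zlib

# Stand-in codecs from the standard library: name -> (compress, decompress).
CODECS = {
    "zlib-1": (lambda b: zlib.compress(b, 1), zlib.decompress),
    "zlib-6": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def benchmark(data: bytes, codecs=CODECS):
    """Compress and decompress `data` with each codec; report ratio and timings."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} failed to round-trip")
        results[name] = {
            "ratio": len(data) / len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results

if __name__ == "__main__":
    # Hypothetical sample: 100k slowly increasing uint32 values.
    sample = struct.pack("<100000I", *(i // 7 for i in range(100000)))
    for name, r in benchmark(sample).items():
        print(f"{name:7s} ratio={r['ratio']:.2f} "
              f"compress={r['compress_s'] * 1e3:.1f}ms "
              f"decompress={r['decompress_s'] * 1e3:.1f}ms")
```

Decompression time deserves the most weight here, since the query results above show that decode cost on the read path is what dominates.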

