wangbo opened a new issue #7729:
URL: https://github.com/apache/incubator-doris/issues/7729


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   There is room for optimization in how ```BinaryDictPageDecoder``` reads values from the dictionary.
   
   
   ### Solution
   
   The ```string_at_index``` method call can be eliminated when reading the dictionary in ```BinaryDictPageDecoder::next_batch```.
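
To make the idea concrete, here is a minimal C++17 sketch, not the actual Doris code. `DictPage`, `DictDecoderSketch`, `make_dict`, and the two `next_batch_*` functions are hypothetical stand-ins, and the page layout (concatenated strings plus one offset per entry) is an assumption made for illustration. The baseline path calls `string_at_index` for every code; the cached path precomputes a start offset and a length per dictionary entry once and then only indexes two arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a plain-encoded dictionary page: all strings are
// concatenated into `data`, and offsets[i] is where entry i begins.
struct DictPage {
    std::string data;
    std::vector<uint32_t> offsets;

    // Per-call lookup: re-derives the length from the next offset every time.
    std::string_view string_at_index(std::size_t idx) const {
        const uint32_t start = offsets[idx];
        const uint32_t end = (idx + 1 < offsets.size())
                                     ? offsets[idx + 1]
                                     : static_cast<uint32_t>(data.size());
        return std::string_view(data.data() + start, end - start);
    }
};

// Helper (illustrative only) that builds a DictPage from a list of words.
inline DictPage make_dict(const std::vector<std::string>& words) {
    DictPage page;
    for (const std::string& w : words) {
        page.offsets.push_back(static_cast<uint32_t>(page.data.size()));
        page.data += w;
    }
    return page;
}

// Sketch of a decoder over dictionary codes. `dict` must outlive the decoder.
class DictDecoderSketch {
public:
    explicit DictDecoderSketch(const DictPage& dict) : _dict(dict) {
        const std::size_t n = dict.offsets.size();
        _start_offset.resize(n);
        _len.resize(n);
        // Pay the offset arithmetic once per dictionary entry, not once per row.
        for (std::size_t i = 0; i < n; ++i) {
            const std::string_view s = dict.string_at_index(i);
            _start_offset[i] = dict.offsets[i];
            _len[i] = static_cast<uint32_t>(s.size());
        }
    }

    // Baseline: one string_at_index call per decoded value.
    std::vector<std::string_view> next_batch_baseline(
            const std::vector<uint32_t>& codes) const {
        std::vector<std::string_view> out;
        out.reserve(codes.size());
        for (uint32_t code : codes) {
            out.push_back(_dict.string_at_index(code));
        }
        return out;
    }

    // Optimized: two array reads per decoded value, no method call or branch.
    std::vector<std::string_view> next_batch_cached(
            const std::vector<uint32_t>& codes) const {
        std::vector<std::string_view> out;
        out.reserve(codes.size());
        const char* base = _dict.data.data();
        for (uint32_t code : codes) {
            assert(code < _len.size());
            out.emplace_back(base + _start_offset[code], _len[code]);
        }
        return out;
    }

private:
    const DictPage& _dict;
    std::vector<uint32_t> _start_offset;  // start of each entry within data
    std::vector<uint32_t> _len;           // byte length of each entry
};
```

Both paths return the same slices for the same codes; the cached path trades the per-row offset arithmetic for two extra arrays sized by the number of dictionary entries.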
   # Performance Test
   env: 2 BE, 1 FE
   data: SSB 100GB
   sql:
   ```
   SELECT
       (LO_ORDERDATE DIV 10000) AS year,
       C_NATION,
       sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
   FROM lineorder_flat
   WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND P_MFGR IN ('MFGR#1', 'MFGR#2')
   GROUP BY
       year,
       C_NATION
   ORDER BY
       year ASC,
       C_NATION ASC;
   ```
   ## Test result
   Run the SQL several times and pick the fastest result.
   
   before
   ```
           SegmentIterator:
              - BitmapIndexFilterTimer: 252.515us
              - BlockLoadTime: 39s471ms
              - BlockSeekCount: 4.49M
              - BlockSeekTime: 5s822ms
              - BlocksLoad: 293.06K
              - CachedPagesNum: 124.89K
              - CompressedBytesRead: 0
              - DecompressorTimer: 0.000ns
              - IOTimer: 0.000ns
              - IndexLoadTime_V1: 0.000ns
              - NumSegmentFiltered: 0
              - NumSegmentTotal: 168
              - RawRowsRead: 300.00M
              - RowsBitmapIndexFiltered: 0
              - RowsBloomFilterFiltered: 0
              - RowsConditionsFiltered: 0
              - RowsKeyRangeFiltered: 0
              - RowsStatsFiltered: 0
              - RowsVectorPredFiltered: 295.20M
              - TotalPagesNum: 124.89K
              - UncompressedBytesRead: 0
              - VectorPredEvalTime: 6s631ms
   ```
   
   after
   ```
           SegmentIterator:
              - BitmapIndexFilterTimer: 292.656us
              - BlockLoadTime: 36s734ms
              - BlockSeekCount: 4.49M
              - BlockSeekTime: 5s995ms
              - BlocksLoad: 293.06K
              - CachedPagesNum: 124.89K
              - CompressedBytesRead: 0
              - DecompressorTimer: 0.000ns
              - IOTimer: 0.000ns
              - IndexLoadTime_V1: 0.000ns
              - NumSegmentFiltered: 0
              - NumSegmentTotal: 168
              - RawRowsRead: 300.00M
              - RowsBitmapIndexFiltered: 0
              - RowsBloomFilterFiltered: 0
              - RowsConditionsFiltered: 0
              - RowsKeyRangeFiltered: 0
              - RowsStatsFiltered: 0
              - RowsVectorPredFiltered: 295.20M
              - TotalPagesNum: 124.89K
              - UncompressedBytesRead: 0
              - VectorPredEvalTime: 6s632ms
   ```
   ```BlockLoadTime: 39s471ms``` -> ```BlockLoadTime: 36s734ms``` (about 2.7s, or roughly 7%, less)
   
   ## Price to pay
   This optimization needs two additional fields to hold start_offset and len_offset, so more memory may be used.
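
As a rough way to size that cost, the sketch below estimates the extra bytes per dictionary. It assumes each of the two fields is an array with one 32-bit entry per dictionary word; the issue does not state the element types, and `extra_index_bytes` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Extra memory for one dictionary, assuming (not confirmed by the issue) one
// uint32_t start offset and one uint32_t length per dictionary entry.
constexpr std::size_t extra_index_bytes(std::size_t dict_entries) {
    return dict_entries * (sizeof(uint32_t) + sizeof(uint32_t));
}
```

Under that assumption the overhead scales with the number of distinct values in the dictionary, not with the number of rows read.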
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


