wangbo opened a new issue #7729: URL: https://github.com/apache/incubator-doris/issues/7729
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description

There is room for optimization when reading values from the dictionary in ```BinaryDictPageDecoder```.

### Solution

The ```string_at_index``` method call can be eliminated when reading the dictionary in ```BinaryDictPageDecoder::next_batch```.

# Performance Test

env: 2 BE, 1 FE

data: ssb 100GB

sql:
```
SELECT (LO_ORDERDATE DIV 10000) AS year,
       C_NATION,
       sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA'
  AND S_REGION = 'AMERICA'
  AND P_MFGR in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, C_NATION
ORDER BY year ASC, C_NATION ASC;
```

## Test result

Run the SQL several times and take the fastest result.

before
```
SegmentIterator:
  - BitmapIndexFilterTimer: 252.515us
  - BlockLoadTime: 39s471ms
  - BlockSeekCount: 4.49M
  - BlockSeekTime: 5s822ms
  - BlocksLoad: 293.06K
  - CachedPagesNum: 124.89K
  - CompressedBytesRead: 0
  - DecompressorTimer: 0.000ns
  - IOTimer: 0.000ns
  - IndexLoadTime_V1: 0.000ns
  - NumSegmentFiltered: 0
  - NumSegmentTotal: 168
  - RawRowsRead: 300.00M
  - RowsBitmapIndexFiltered: 0
  - RowsBloomFilterFiltered: 0
  - RowsConditionsFiltered: 0
  - RowsKeyRangeFiltered: 0
  - RowsStatsFiltered: 0
  - RowsVectorPredFiltered: 295.20M
  - TotalPagesNum: 124.89K
  - UncompressedBytesRead: 0
  - VectorPredEvalTime: 6s631ms
```

after
```
SegmentIterator:
  - BitmapIndexFilterTimer: 292.656us
  - BlockLoadTime: 36s734ms
  - BlockSeekCount: 4.49M
  - BlockSeekTime: 5s995ms
  - BlocksLoad: 293.06K
  - CachedPagesNum: 124.89K
  - CompressedBytesRead: 0
  - DecompressorTimer: 0.000ns
  - IOTimer: 0.000ns
  - IndexLoadTime_V1: 0.000ns
  - NumSegmentFiltered: 0
  - NumSegmentTotal: 168
  - RawRowsRead: 300.00M
  - RowsBitmapIndexFiltered: 0
  - RowsBloomFilterFiltered: 0
  - RowsConditionsFiltered: 0
  - RowsKeyRangeFiltered: 0
  - RowsStatsFiltered: 0
  - RowsVectorPredFiltered: 295.20M
  - TotalPagesNum: 124.89K
  - UncompressedBytesRead: 0
  - VectorPredEvalTime: 6s632ms
```

BlockLoadTime drops by about 2.7 s (roughly 7%): ```BlockLoadTime: 39s471ms``` -> ```BlockLoadTime: 36s734ms```

## Price to pay

This optimization needs two additional fields to hold start_offset and len_offset, so it may use more memory.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
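The trade-off described in this issue (skip the per-value ```string_at_index``` call by keeping precomputed start offsets and lengths) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Doris implementation: the page layout, `DictPage`, `DictLookup`, `make_dict_page`, `next_batch_baseline`, and `next_batch_cached` are all hypothetical names invented for the example, and only `string_at_index`, `next_batch`, and the idea of a start-offset array plus a length array come from the issue text.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical dictionary page: concatenated value bytes followed by a
// trailer of packed uint32 start offsets (host byte order). This layout is
// an assumption made for the sketch.
struct DictPage {
    std::string bytes;
    uint32_t num_values = 0;
    uint32_t offsets_pos = 0;  // byte position where the offset trailer begins

    uint32_t offset_at(uint32_t i) const {
        uint32_t v;
        std::memcpy(&v, bytes.data() + offsets_pos + i * sizeof(uint32_t), sizeof(v));
        return v;
    }

    // Per-value lookup: decodes two offsets and takes a branch on every call.
    std::string_view string_at_index(uint32_t i) const {
        uint32_t start = offset_at(i);
        uint32_t end = (i + 1 < num_values) ? offset_at(i + 1) : offsets_pos;
        return std::string_view(bytes.data() + start, end - start);
    }
};

// Hypothetical helper that builds a page from plain strings.
DictPage make_dict_page(const std::vector<std::string>& values) {
    DictPage page;
    std::vector<uint32_t> offsets;
    for (const std::string& v : values) {
        offsets.push_back(static_cast<uint32_t>(page.bytes.size()));
        page.bytes += v;
    }
    page.offsets_pos = static_cast<uint32_t>(page.bytes.size());
    page.num_values = static_cast<uint32_t>(values.size());
    page.bytes.append(reinterpret_cast<const char*>(offsets.data()),
                      offsets.size() * sizeof(uint32_t));
    return page;
}

// The two extra fields from "Price to pay": one start offset and one length
// per dictionary entry, computed once per dictionary page.
struct DictLookup {
    std::vector<uint32_t> start_offset;
    std::vector<uint32_t> len;

    explicit DictLookup(const DictPage& page)
        : start_offset(page.num_values), len(page.num_values) {
        for (uint32_t i = 0; i < page.num_values; ++i) {
            std::string_view v = page.string_at_index(i);
            start_offset[i] = static_cast<uint32_t>(v.data() - page.bytes.data());
            len[i] = static_cast<uint32_t>(v.size());
        }
    }
};

// Baseline: one string_at_index call per decoded code word.
std::vector<std::string_view> next_batch_baseline(const DictPage& page,
                                                  const std::vector<uint32_t>& codes) {
    std::vector<std::string_view> out;
    out.reserve(codes.size());
    for (uint32_t code : codes) out.push_back(page.string_at_index(code));
    return out;
}

// Cached: two array reads per code word, no offset decoding in the hot loop.
std::vector<std::string_view> next_batch_cached(const DictPage& page,
                                                const DictLookup& lookup,
                                                const std::vector<uint32_t>& codes) {
    std::vector<std::string_view> out;
    out.reserve(codes.size());
    const char* base = page.bytes.data();
    for (uint32_t code : codes) {
        out.emplace_back(base + lookup.start_offset[code], lookup.len[code]);
    }
    return out;
}
```

In this sketch the extra memory is two `uint32_t` values per dictionary entry, which is small when the dictionary has far fewer entries than the column has rows. Whether that holds in practice depends on the real page layout and dictionary sizes, which the issue does not specify.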
