zenoyang opened a new pull request #8737:
URL: https://github.com/apache/incubator-doris/pull/8737


   # Proposed changes
   
   Issue Number: close #8315 
   Use `ColumnDictionary` only when all data pages of a column are dictionary-encoded.
   
   ## Problem Summary:
   
   The current storage layer creates a `ColumnDictionary` for string columns that have predicates. When `BinaryDictPageDecoder` then reads a PLAIN_ENCODING data page, it has to convert the `ColumnDictionary` into a `PredicateColumn`, which adds unnecessary overhead. This is especially costly for high-cardinality string columns, whose data pages are more likely to be plain-encoded.
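To make the conversion cost concrete, here is a minimal sketch using simplified, hypothetical types (`DictColumn` and `PlainColumn` are stand-ins, not Doris's actual `ColumnDictionary` or `PredicateColumn`). A dictionary column stores one small integer code per row; converting it to a plain column materializes one string copy per row, so the work grows with row count and string length rather than with dictionary size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column:
// each row holds a code that indexes into a shared dictionary.
struct DictColumn {
    std::vector<std::string> dict;  // distinct values
    std::vector<int32_t> codes;     // one code per row
};

// Hypothetical stand-in for a plain (materialized) string column.
struct PlainColumn {
    std::vector<std::string> values;  // one string per row
};

// Converting to plain form copies the referenced string for every row.
// The cost is O(rows * average string length), however small the
// dictionary is.
PlainColumn convert_to_plain(const DictColumn& col) {
    PlainColumn out;
    out.values.reserve(col.codes.size());
    for (int32_t code : col.codes) {
        out.values.push_back(col.dict[static_cast<std::size_t>(code)]);
    }
    return out;
}

// Counts the string bytes copied by the conversion, to make the cost visible.
std::size_t bytes_copied(const PlainColumn& col) {
    std::size_t total = 0;
    for (const auto& v : col.values) total += v.size();
    return total;
}
```

With a two-entry dictionary and four rows, the conversion still copies four strings (29 bytes for `{"beijing", "shanghai"}` and codes `{0, 1, 0, 0}`).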
   
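   Therefore, when `SegmentIterator::init` runs, it reads the last data page of the string column in advance. Because a column's encoding only falls back from DICT_ENCODING to PLAIN_ENCODING (once the dictionary grows too large) and never switches back, a DICT_ENCODING last page means that all data pages are DICT_ENCODING.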
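The "check only the last page" shortcut relies on the dictionary-to-plain fallback being one-way. A minimal sketch under that assumption, with hypothetical names (`Encoding`, `SketchPageBuilder`, `max_dict_entries`, `all_pages_dict`) that are not Doris APIs: the builder starts in DICT mode, closes the current page and switches permanently to PLAIN once a new distinct value would overflow the dictionary, so the per-page encodings form a DICT...DICT, PLAIN...PLAIN sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

enum class Encoding { DICT, PLAIN };

// Hypothetical page builder illustrating a one-way dictionary fallback.
class SketchPageBuilder {
public:
    SketchPageBuilder(std::size_t rows_per_page, std::size_t max_dict_entries)
        : rows_per_page_(rows_per_page), max_dict_entries_(max_dict_entries) {}

    void add(const std::string& value) {
        if (mode_ == Encoding::DICT && dict_.count(value) == 0 &&
            dict_.size() >= max_dict_entries_) {
            flush();                  // close the current DICT page
            mode_ = Encoding::PLAIN;  // one-way switch: never back to DICT
        }
        if (mode_ == Encoding::DICT) dict_.insert(value);
        if (++rows_in_page_ == rows_per_page_) flush();
    }

    void finish() { flush(); }

    const std::vector<Encoding>& page_encodings() const { return pages_; }

private:
    void flush() {
        if (rows_in_page_ == 0) return;
        pages_.push_back(mode_);
        rows_in_page_ = 0;
    }

    std::size_t rows_per_page_;
    std::size_t max_dict_entries_;
    Encoding mode_ = Encoding::DICT;
    std::unordered_set<std::string> dict_;
    std::size_t rows_in_page_ = 0;
    std::vector<Encoding> pages_;
};

// Because the switch is one-way, inspecting the last page is enough.
// An empty column is treated as vacuously all-DICT.
bool all_pages_dict(const std::vector<Encoding>& pages) {
    return pages.empty() || pages.back() == Encoding::DICT;
}
```

A low-cardinality column stays DICT on every page, while a column that overflows a two-entry dictionary produces pages `DICT, PLAIN, PLAIN` and is rejected by the last-page check.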
   `ColumnDictionary` is used to optimize queries only if all data pages of a string column are DICT_ENCODING and the column has `comparison` or `in_list` predicates.
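One way to see why `comparison` and `in_list` predicates benefit: assuming one dictionary is shared by all data pages of the column, the predicate can be evaluated once per distinct dictionary entry, and each row is then filtered by an integer lookup instead of a string comparison. The sketch below is illustrative only; `filter_dict_column` is a hypothetical helper, not Doris's predicate code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: evaluate a string predicate on a dictionary column.
// Step 1 runs `pred` once per dictionary entry (dict.size() evaluations);
// step 2 selects rows with a cheap per-row array lookup on their codes.
template <typename Pred>
std::vector<uint32_t> filter_dict_column(const std::vector<std::string>& dict,
                                         const std::vector<int32_t>& codes,
                                         Pred pred) {
    // Step 1: one predicate evaluation per distinct value.
    std::vector<bool> code_matches(dict.size());
    for (std::size_t i = 0; i < dict.size(); ++i) {
        code_matches[i] = pred(dict[i]);
    }
    // Step 2: no string comparisons here, only code lookups.
    std::vector<uint32_t> selected_rows;
    for (std::size_t row = 0; row < codes.size(); ++row) {
        if (code_matches[static_cast<std::size_t>(codes[row])]) {
            selected_rows.push_back(static_cast<uint32_t>(row));
        }
    }
    return selected_rows;
}
```

An `in_list` predicate and a `<` comparison both reduce to the same shape: a per-entry boolean table followed by a code scan. For example, with dictionary `{"apple", "banana", "cherry"}` and codes `{2, 0, 1, 0, 2}`, `IN ("apple", "cherry")` selects rows `{0, 1, 3, 4}`.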
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Have unit tests been added: (Yes/No/No Need)
   3. Has documentation been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [[email protected]](mailto:[email protected]) by explaining why you chose the solution you did and what alternatives you considered.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


