zenoyang opened a new pull request #8737: URL: https://github.com/apache/incubator-doris/pull/8737
# Proposed changes

Issue Number: close #8315

Use column dictionary only when all pages are dict encoding.

## Problem Summary:

The storage layer currently creates a `ColumnDictionary` for every string column that has predicates. When `BinaryDictPageDecoder` reads a PLAIN_ENCODING data_page, it converts the `ColumnDictionary` to a `PredicateColumn`, which adds unnecessary overhead, especially when the string columns are all high cardinality.

Therefore, when `SegmentIterator::init` is executed, the last data_page of the string column is read in advance. If the last data_page is DICT_ENCODING, then all data_pages are DICT_ENCODING. `ColumnDictionary` is used to optimize queries only if all data_pages of a string column are DICT_ENCODING and the column has `comparison` or `in_list` predicates.

## Checklist(Required)

1. Does it affect the original behavior: (Yes/No/I Don't know)
2. Has unit tests been added: (Yes/No/No Need)
3. Has document been added or modified: (Yes/No/No Need)
4. Does it need to update dependencies: (Yes/No)
5. Are there any changes that cannot be rolled back: (Yes/No)

## Further comments

If this is a relatively large or complex change, kick off the discussion at [[email protected]](mailto:[email protected]) by explaining why you chose the solution you did and what alternatives you considered, etc.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
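
The selection rule described in the Problem Summary can be sketched roughly as follows. This is only an illustration, not the code in the pull request: `PageEncoding`, `PredicateKind`, `all_pages_dict_encoded`, and `should_use_column_dictionary` are hypothetical names, and the sketch takes the encodings of a column's data pages as a plain vector instead of reading the last page from a segment. It relies on the property stated above, that a DICT_ENCODING last page implies every page is DICT_ENCODING.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the Doris types.
enum class PageEncoding { Dict, Plain };
enum class PredicateKind { None, Comparison, InList, Other };

// Checks only the last data page, following the PR description: if the last
// page is dictionary-encoded, all pages of the column are assumed to be.
bool all_pages_dict_encoded(const std::vector<PageEncoding>& pages) {
    return !pages.empty() && pages.back() == PageEncoding::Dict;
}

// A dictionary column is chosen only when the predicate is a comparison or an
// in-list predicate AND every data page is dictionary-encoded.
bool should_use_column_dictionary(const std::vector<PageEncoding>& pages,
                                  PredicateKind predicate) {
    const bool dict_friendly = predicate == PredicateKind::Comparison ||
                               predicate == PredicateKind::InList;
    return dict_friendly && all_pages_dict_encoded(pages);
}

// Small usage example covering the cases discussed in the description.
void example_usage() {
    const std::vector<PageEncoding> low_cardinality{PageEncoding::Dict,
                                                    PageEncoding::Dict};
    // High-cardinality column: later pages are plain-encoded.
    const std::vector<PageEncoding> high_cardinality{PageEncoding::Dict,
                                                     PageEncoding::Plain};

    assert(should_use_column_dictionary(low_cardinality, PredicateKind::InList));
    assert(!should_use_column_dictionary(high_cardinality, PredicateKind::InList));
    assert(!should_use_column_dictionary(low_cardinality, PredicateKind::None));
}
```

Under these assumptions, a high-cardinality column never gets a dictionary column in the first place, so no later conversion to a predicate column is needed.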
