wangbo opened a new issue #6282:
URL: https://github.com/apache/incubator-doris/issues/6282


   **Background**
   Currently, Doris dictionary-encodes CHAR/VARCHAR columns by default.
   I think we can also use the dictionary to do some filtering.
   For example:
   ```
   select xx from table where columnA = 'a'
   ```
   The storage layer's dictionary looks like this:
   ```
   original value | dictionary code
   a              | 1
   b              | 2
   c              | 3
   ...
   ```
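To make the three filter levels below concrete, here is a minimal sketch of a dictionary-encoded column, matching the table above. The `DictEncodedColumn` struct is hypothetical and simplified (it is not Doris's actual page format): each distinct string gets an integer code in first-seen order, starting at 1 as in the table, and each row stores only its code.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified dictionary-encoded column (illustration only).
struct DictEncodedColumn {
    std::vector<std::string> dict;                           // dict[code - 1] is the original value
    std::unordered_map<std::string, int32_t> value_to_code;  // original value -> code
    std::vector<int32_t> codes;                              // one code per row

    // Append one row: reuse the existing code or assign the next free one.
    void append(const std::string& value) {
        auto it = value_to_code.find(value);
        int32_t code;
        if (it == value_to_code.end()) {
            dict.push_back(value);
            code = static_cast<int32_t>(dict.size());  // codes start at 1
            value_to_code.emplace(value, code);
        } else {
            code = it->second;
        }
        codes.push_back(code);
    }
};
```

Appending the rows "a", "b", "a", "c" yields a three-entry dictionary and the code sequence 1, 2, 1, 3.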
   **Filter segment**
   After reading the segment's dictionary, we can check whether 'a' exists in the dictionary.
   If it does not, the whole segment can be skipped.
   Range predicates can also be used at this level.
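A minimal sketch of segment-level pruning, assuming the segment dictionary is available as a plain list of its distinct values. The function names are hypothetical, and byte-wise `std::string` comparison stands in for the column's real comparison rules. Because the dictionary holds every distinct value stored in the segment, a segment whose dictionary has no value satisfying the predicate cannot contain a matching row, which is why range predicates work here too.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Equality predicate `column = target`: the segment may match only if the
// target value appears in the segment dictionary.
bool segment_may_match_eq(const std::vector<std::string>& dict,
                          const std::string& target) {
    return std::find(dict.begin(), dict.end(), target) != dict.end();
}

// Range predicate `lower <= column <= upper` (either bound may be absent):
// the segment may match only if some dictionary value lies inside the range.
bool segment_may_match_range(const std::vector<std::string>& dict,
                             const std::optional<std::string>& lower,
                             const std::optional<std::string>& upper) {
    return std::any_of(dict.begin(), dict.end(), [&](const std::string& v) {
        bool above_lower = !lower || v >= *lower;
        bool below_upper = !upper || v <= *upper;
        return above_lower && below_upper;
    });
}
```

If either function returns `false`, no row in the segment can satisfy the predicate and the segment can be skipped without reading any data pages.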
   **Filter page**
   Convert 'a' to its dictionary code '1'.
   After reading a page, we can use code '1' to do a quick existence check.
   If code '1' does not exist in the current page, the page can be skipped.
   Only equality predicates are allowed here.
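The page-level check can be sketched as follows. The helper names are hypothetical, and the linear scan over a page's codes is only an assumption about how the quick existence check might work (a per-page code set or bitmap would be another option). The point is that once 'a' has been translated to code 1, each page is probed with cheap integer comparisons instead of string comparisons.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Step 1: translate the literal into its dictionary code once per segment.
// An absent literal means the segment was already skippable.
std::optional<int32_t> lookup_code(
        const std::unordered_map<std::string, int32_t>& value_to_code,
        const std::string& literal) {
    auto it = value_to_code.find(literal);
    if (it == value_to_code.end()) return std::nullopt;
    return it->second;
}

// Step 2: probe a page's codes (assumed here: a simple scan).
// Returns false when the page cannot contain the code and can be skipped.
bool page_may_match(const std::vector<int32_t>& page_codes, int32_t code) {
    return std::find(page_codes.begin(), page_codes.end(), code) != page_codes.end();
}
```

A likely reason only equality is allowed at this level: if codes are assigned in first-seen order rather than sorted value order, a value range such as `>= 'b'` does not map to a contiguous range of codes.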
   **Filter row**
   Use dictionary code '1' to filter every row in `BinaryDictPageDecoder::next_batch`.
   Only equality predicates are allowed here.
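Row-level filtering can be sketched as building a selection vector of the row indices whose code equals the target code, then decoding strings only for those rows. `filter_batch_by_code` and `materialize` are hypothetical helpers for illustration; they are not the real interface of `BinaryDictPageDecoder::next_batch`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Compare each row's integer code with the target code and keep the
// indices of the rows that pass.
std::vector<size_t> filter_batch_by_code(const std::vector<int32_t>& batch_codes,
                                         int32_t target_code) {
    std::vector<size_t> selected;
    for (size_t i = 0; i < batch_codes.size(); ++i) {
        if (batch_codes[i] == target_code) selected.push_back(i);
    }
    return selected;
}

// Decode strings only for the selected rows (codes start at 1, as in the
// dictionary table, so dict[code - 1] is the original value).
std::vector<std::string> materialize(const std::vector<int32_t>& batch_codes,
                                     const std::vector<size_t>& selected,
                                     const std::vector<std::string>& dict) {
    std::vector<std::string> out;
    out.reserve(selected.size());
    for (size_t row : selected) {
        out.push_back(dict[batch_codes[row] - 1]);
    }
    return out;
}
```

Rows that fail the code comparison never have their string values copied out of the dictionary.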
   
   **Test environment**
   1 FE, 3 BE
   Data: SSB (Star Schema Benchmark).
   SQL:
   ```
   select count(distinct p_category) from lineorder_flat WHERE p_category = 'MFGR#12';
   ```
   I ran the SQL many times and took the fastest run.
   There were no disk reads.
   
   **Test results**

   |        | Total SQL time | ReadDictTime |
   |--------|----------------|--------------|
   | before | 0.35 s         | 3s537ms      |
   | after  | 0.30 s         | 1s182ms      |
   
   **In conclusion**
   The benefit of this optimization depends heavily on the data distribution, e.g. the column's cardinality.
   So I think column statistics are necessary here.
   If Doris finds that a query contains a CHAR/VARCHAR condition and the column has good selectivity, Doris can treat this column as a pre-read column for lazy materialization.
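One possible shape for the statistics-driven choice described above, as a sketch only. It assumes a hypothetical per-column NDV (number of distinct values) statistic and the textbook uniform-distribution estimate that `col = literal` keeps about 1/NDV of the rows. `ColumnStats`, `choose_pre_read_column`, and the threshold are illustrative names, not existing Doris APIs.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-column statistics.
struct ColumnStats {
    std::string name;
    uint64_t ndv;  // number of distinct values; 0 means "no statistics" (assumption)
};

// Estimated fraction of rows passing `column = literal`, assuming a uniform
// value distribution. Missing statistics are treated as unselective.
double estimate_eq_selectivity(const ColumnStats& s) {
    return s.ndv == 0 ? 1.0 : 1.0 / static_cast<double>(s.ndv);
}

// Among columns with an equality condition, pick the most selective one,
// provided its estimated selectivity does not exceed max_selectivity.
std::optional<std::string> choose_pre_read_column(
        const std::vector<ColumnStats>& candidates, double max_selectivity) {
    std::optional<std::string> best;
    double best_sel = 0.0;
    for (const auto& c : candidates) {
        double sel = estimate_eq_selectivity(c);
        if (sel > max_selectivity) continue;  // not selective enough
        if (!best || sel < best_sel) {        // keep the most selective column
            best = c.name;
            best_sel = sel;
        }
    }
    return best;
}
```

For example, with a threshold of 0.1, a column with 25 distinct values (estimated selectivity 0.04) qualifies, while a column with 5 distinct values (0.2) does not.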
   