wangbo opened a new issue #6282:
URL: https://github.com/apache/incubator-doris/issues/6282
**Background**
Currently Doris builds a dictionary by default for char/varchar columns.
I think we can use this dictionary to do some filtering work.
For example:
```
select xx from table where columnA = 'a'
```
The storage layer's dictionary looks like this:
```
original value | dictionary code
a              | 1
b              | 2
c              | 3
....
```
**filter segment**
After reading a segment's dictionary, we can check whether 'a' exists in the
dictionary.
If it does not, the segment can be skipped.
Range predicates are also allowed here.
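The segment-level check could be sketched as follows. This is only an illustration, not Doris code: `SegmentDict`, `may_match_eq` and `may_match_range` are hypothetical names, and the sketch assumes the dictionary values are available in sorted order (an unsorted dictionary would need a hash lookup or a linear scan instead).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one segment's dictionary: the distinct values of
// a column, assumed sorted so that binary search can be used.
struct SegmentDict {
    std::vector<std::string> sorted_values;
};

// Equality predicate: the segment can match only if the literal is in the dictionary.
bool may_match_eq(const SegmentDict& dict, const std::string& literal) {
    assert(std::is_sorted(dict.sorted_values.begin(), dict.sorted_values.end()));
    return std::binary_search(dict.sorted_values.begin(), dict.sorted_values.end(), literal);
}

// Range predicate lo <= value <= hi: the segment can match only if at least
// one dictionary entry falls inside the closed range.
bool may_match_range(const SegmentDict& dict, const std::string& lo, const std::string& hi) {
    assert(std::is_sorted(dict.sorted_values.begin(), dict.sorted_values.end()));
    // First entry that is >= lo; it is the smallest candidate for the range.
    auto it = std::lower_bound(dict.sorted_values.begin(), dict.sorted_values.end(), lo);
    return it != dict.sorted_values.end() && *it <= hi;
}
```

When either function returns false, the whole segment can be skipped without reading its data pages.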
**filter page**
Convert 'a' to its dictionary code '1'.
After reading a page, we can use code '1' to do a quick existence check.
If code '1' does not exist in the current page, the page can be skipped.
Only equality predicates are allowed here.
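A minimal sketch of the page-level check, under the assumption that a decoded page is simply a sequence of integer dictionary codes. `Dict`, `kNoCode`, `lookup_code` and `page_may_match` are hypothetical names used for illustration, not Doris APIs. The restriction to equality presumably follows from codes being assigned without regard to value order, so comparing codes says nothing about the order of the original strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary: original value -> dictionary code.
using Dict = std::unordered_map<std::string, int32_t>;

// Sentinel meaning "the literal is not in the dictionary" (an assumption of this sketch).
const int32_t kNoCode = -1;

// Translate the predicate literal into its dictionary code once per segment.
int32_t lookup_code(const Dict& dict, const std::string& literal) {
    auto it = dict.find(literal);
    return it == dict.end() ? kNoCode : it->second;
}

// A page can match only if the code occurs in it at least once.
bool page_may_match(const std::vector<int32_t>& page_codes, int32_t code) {
    assert(code != kNoCode);  // callers should skip the segment in that case
    return std::find(page_codes.begin(), page_codes.end(), code) != page_codes.end();
}
```

The scan compares fixed-width integers rather than strings, which is what makes the existence check cheap.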
**filter row**
Use dictionary code '1' to filter every row in
`BinaryDictPageDecoder::next_batch`.
Only equality predicates are allowed here.
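The row-level filter might look roughly like the sketch below. It does not reproduce the real `BinaryDictPageDecoder::next_batch`. `select_rows` is a hypothetical helper that assumes a batch is already available as a vector of non-negative dictionary codes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the positions within the batch whose dictionary code equals `code`.
// Only these rows need their string value materialized afterwards.
std::vector<std::size_t> select_rows(const std::vector<int32_t>& batch_codes, int32_t code) {
    assert(code >= 0);  // assumption of this sketch: valid codes are non-negative
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < batch_codes.size(); ++i) {
        if (batch_codes[i] == code) {
            selected.push_back(i);
        }
    }
    return selected;
}
```

Rows that are filtered out this way never need a dictionary lookup back to the original string.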
**Test environment**
1 FE, 3 BE
SSB data.
SQL:
```
select count(distinct p_category) from lineorder_flat WHERE p_category = 'MFGR#12';
```
I ran the SQL many times and picked the fastest run.
No disk reads were involved.
**Test result**
before:
total sql cost: 0.35 sec
ReadDictTime: 3s537ms
after:
total sql cost: 0.30 sec
ReadDictTime: 1s182ms
**In Conclusion**
This optimization is greatly affected by the data distribution, such as the
column's cardinality.
So I think column statistics are necessary here.
If Doris finds that a SQL statement contains a char/varchar condition and the
column has good selectivity, then Doris can treat this column as a pre-read
column for lazy materialization.
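One way such a decision could be sketched is shown below. Everything here is an assumption for illustration: `ColumnStats`, `estimate_eq_selectivity`, `should_preread` and the threshold are hypothetical, and the estimate uses the common uniform-distribution approximation (an equality predicate selects about 1/NDV of the rows), which skewed data will violate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-column statistics.
struct ColumnStats {
    uint64_t distinct_count;  // number of distinct values (NDV)
};

// Estimated fraction of rows kept by `column = literal`, assuming values are
// uniformly distributed. With no statistics, assume nothing is filtered.
double estimate_eq_selectivity(const ColumnStats& stats) {
    if (stats.distinct_count == 0) {
        return 1.0;
    }
    return 1.0 / static_cast<double>(stats.distinct_count);
}

// Treat the column as a pre-read column when the predicate is expected to
// keep at most `threshold` of the rows.
bool should_preread(const ColumnStats& stats, double threshold) {
    assert(threshold > 0.0 && threshold <= 1.0);
    return estimate_eq_selectivity(stats) <= threshold;
}
```

A lower estimated fraction means more rows are filtered before the remaining columns are read, which is where lazy materialization pays off.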