You could do that enhancement solely in the planning layer, without any changes to the index scan physical operator. Transform
WHERE UPPER(name) = ‘DRUID’ to WHERE (name LIKE ‘D%’ OR name LIKE ‘d%’) AND UPPER(name) = ‘DRUID’ and hopefully the cost model recognizes that the first term is very selective. How to decide whether the OR should contain 2, 4, 8, … 2^N factors? A simple heuristic on the number of entries in the index should suffice. Julian > On Apr 11, 2025, at 10:49 AM, Gian Merlino <g...@apache.org> wrote: > > Druid does use ordinary unicode collation, and it isn't currently > configurable. For that reason, we can, and do, use range scans for e.g. "LIKE > 'Druid%'". > > Although your comment makes me think. Currently I believe we scan the full > index to satisfy a filter like "UPPER(name) = 'Druid'". In principle we don't > need to do that, we could instead scan all the entries that start with 'd' or > 'D'. (Or perhaps 'dr', 'Dr', 'dR', and 'DR'? I'm not sure where the benefit > ends of doing something like this. Possibly when the cardinality of possible > matches approaches or exceeds the cardinality of the index.) > > Could be an interesting enhancement. > > Gian > > On 2025/04/11 16:33:13 Julian Hyde wrote: >> Gian, >> >> Does the index use ordinary unicode collation (so that ‘d’ < ‘e’ … ‘z’ < >> ‘A’ < ‘D’, and therefore ‘druid’ < ‘dRuid’ < ‘fluid’ < ‘Druid’ < ‘DRUID’)? >> >> There’s another collation where ‘d’ < ‘D’ < ‘e’ < ‘E’ etc. This puts >> ‘druid’, ‘dRuid’, ‘Druid’, ‘DRUID’ next to each other, so that the query >> "UPPER(name) = ‘DRUID’” can be answered with a very small range scan. >> >> Julian >> >> >>> On Apr 11, 2025, at 9:24 AM, Gian Merlino <g...@apache.org> wrote: >>> >>> String indexes in Druid are case-sensitive. UPPER (and LOWER) can use the >>> index, but they must do it through index scans, rather than point index >>> lookups. So index usage will be less efficient. >>> >>> Gian >>> >>> On 2025/04/09 18:50:45 vikas thakur wrote: >>>> On Wed, Apr 9, 2025 at 2:49 PM vikas thakur <vikas.thaku...@gmail.com> >>>> wrote: >>>> >>>>> Can someone help with logic of upper sql function of Druid ? What is >>>>> underlying implementation, does it use case insensitive indexes ? >>>>> Currently, benchmarking performance with upper function and standardizing >>>>> the casing for APIs. >>>>> >>>>> Regards, >>>>> VT >>>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org >>> For additional commands, e-mail: dev-h...@druid.apache.org >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org >> For additional commands, e-mail: dev-h...@druid.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org > For additional commands, e-mail: dev-h...@druid.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org For additional commands, e-mail: dev-h...@druid.apache.org