You could do that enhancement solely in the planning layer, without any changes 
to the index scan physical operator. Transform

  WHERE UPPER(name) = ‘DRUID’

to

  WHERE (name LIKE ‘D%’ OR name LIKE ‘d%’)
    AND UPPER(name) = ‘DRUID’

and hopefully the cost model recognizes that the first term is very selective.

How to decide whether the OR should contain 2, 4, 8, … 2^N factors? A simple 
heuristic on the number of entries in the index should suffice.

Julian

> On Apr 11, 2025, at 10:49 AM, Gian Merlino <g...@apache.org> wrote:
> 
> Druid does use ordinary unicode collation, and it isn't currently 
> configurable. For that reason, we can, and do, use range scans for e.g. "LIKE 
> 'Druid%'".
> 
> Although your comment makes me think. Currently I believe we scan the full 
> index to satisfy a filter like "UPPER(name) = 'Druid'". In principle we don't 
> need to do that, we could instead scan all the entries that start with 'd' or 
> 'D'. (Or perhaps 'dr', 'Dr', 'dR', and 'DR'? I'm not sure where the benefit 
> ends of doing something like this. Possibly when the cardinality of possible 
> matches approaches or exceeds the cardinality of the index.)
> 
> Could be an interesting enhancement.
> 
> Gian
> 
> On 2025/04/11 16:33:13 Julian Hyde wrote:
>> Gian,
>> 
>> Does the index use ordinary unicode collation (so that ‘d’  < ‘e’ … ‘z’ < 
>> ‘A’ < ‘D’, and therefore ‘druid’ < ‘dRuid’ < ‘fluid’ < ‘Druid’ < ‘DRUID’)?
>> 
>> There’s another collation where ‘d’ < ‘D’ < ‘e’ < ‘E’ etc. This puts 
>> ‘druid’, ‘dRuid’, ‘Druid’, ‘DRUID’ next to each other, so that the query 
>> "UPPER(name) = ‘DRUID’” can be answered with a very small range scan.
>> 
>> Julian
>> 
>> 
>>> On Apr 11, 2025, at 9:24 AM, Gian Merlino <g...@apache.org> wrote:
>>> 
>>> String indexes in Druid are case-sensitive. UPPER (and LOWER) can use the 
>>> index, but they must do it through index scans, rather than point index 
>>> lookups. So index usage will be less efficient.
>>> 
>>> Gian
>>> 
>>> On 2025/04/09 18:50:45 vikas thakur wrote:
>>>> On Wed, Apr 9, 2025 at 2:49 PM vikas thakur <vikas.thaku...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Can someone help with logic of upper sql function of Druid ? What is
>>>>> underlying implementation, does it use case insensitive indexes ?
>>>>> Currently, benchmarking performance with upper function and standardizing
>>>>> the casing for APIs.
>>>>> 
>>>>> Regards,
>>>>> VT
>>>>> 
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
>>> For additional commands, e-mail: dev-h...@druid.apache.org
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
>> For additional commands, e-mail: dev-h...@druid.apache.org
>> 
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> For additional commands, e-mail: dev-h...@druid.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org

Reply via email to