You could do that enhancement solely in the planning layer, without any changes
to the index scan physical operator. Transform
WHERE UPPER(name) = ‘DRUID’
to
WHERE (name LIKE ‘D%’ OR name LIKE ‘d%’)
AND UPPER(name) = ‘DRUID’
and hopefully the cost model recognizes that the first term is very selective.
How to decide whether the OR should contain 2, 4, 8, … 2^N factors? A simple
heuristic on the number of entries in the index should suffice.
Julian
> On Apr 11, 2025, at 10:49 AM, Gian Merlino <[email protected]> wrote:
>
> Druid does use ordinary unicode collation, and it isn't currently
> configurable. For that reason, we can, and do, use range scans for e.g. "LIKE
> 'Druid%'".
>
> Although your comment makes me think. Currently I believe we scan the full
> index to satisfy a filter like "UPPER(name) = 'Druid'". In principle we don't
> need to do that, we could instead scan all the entries that start with 'd' or
> 'D'. (Or perhaps 'dr', 'Dr', 'dR', and 'DR'? I'm not sure where the benefit
> ends of doing something like this. Possibly when the cardinality of possible
> matches approaches or exceeds the cardinality of the index.)
>
> Could be an interesting enhancement.
>
> Gian
>
> On 2025/04/11 16:33:13 Julian Hyde wrote:
>> Gian,
>>
>> Does the index use ordinary unicode collation (so that ‘d’ < ‘e’ … ‘z’ <
>> ‘A’ < ‘D’, and therefore ‘druid’ < ‘dRuid’ < ‘fluid’ < ‘Druid’ < ‘DRUID’)?
>>
>> There’s another collation where ‘d’ < ‘D’ < ‘e’ < ‘E’ etc. This puts
>> ‘druid’, ‘dRuid’, ‘Druid’, ‘DRUID’ next to each other, so that the query
>> "UPPER(name) = ‘DRUID’” can be answered with a very small range scan.
>>
>> Julian
>>
>>
>>> On Apr 11, 2025, at 9:24 AM, Gian Merlino <[email protected]> wrote:
>>>
>>> String indexes in Druid are case-sensitive. UPPER (and LOWER) can use the
>>> index, but they must do it through index scans, rather than point index
>>> lookups. So index usage will be less efficient.
>>>
>>> Gian
>>>
>>> On 2025/04/09 18:50:45 vikas thakur wrote:
>>>> On Wed, Apr 9, 2025 at 2:49 PM vikas thakur <[email protected]>
>>>> wrote:
>>>>
>>>>> Can someone help with logic of upper sql function of Druid ? What is
>>>>> underlying implementation, does it use case insensitive indexes ?
>>>>> Currently, benchmarking performance with upper function and standardizing
>>>>> the casing for APIs.
>>>>>
>>>>> Regards,
>>>>> VT
>>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]