Re: (CAY-1210) mysql does not use index for case insensitive searches

Aristedes Maniatis Tue, 11 Jan 2011 02:29:34 -0800

On 11/01/11 8:58 PM, Andrus Adamchik wrote:

I am taking this to dev, as there's still no clarity in my mind on how to 
approach it.


So the goal here is to optimize case-insensitive (CI) LIKE, and the issue is that MySQL 
doesn't support indexes on transformed columns (e.g. "UPPER(column)"), while it 
does support CI column collations, and hence - CI indexes and CI LIKE. Behavior of other 
DBs is unknown. Oracle supports UPPER(column) indexes, but beyond that we haven't 
researched it.


I've looked at MS-SQL and Derby as well. Derby does not support functional 
indexes, such as UPPER(column). MS-SQL does.

But even if the database does support that index, the problem is the the dba 
has to know to add these additional indexes. Otherwise they might just blame 
Cayenne for being slow compared to 'real' SQL.

Suggested solutions come down to either mapping collation behavior per 
DbAttribute in Cayenne, or trying to determine it in runtime via DB metadata.

My concern with former is that Cayenne model will have too much information 
about the underlying DB, not directly relevant to ORM. So if we are to go this 
way, we need to first demonstrate how this info may be used with other 
databases. E.g if we find that LIKE can be optimized for at least 2-3 other 
major DBs by taking collation case-sensitivity into account, then this solution 
may be worth pursuing.


I am pretty sure this is useful for every DB out there. I've never come across 
one that will optimise UPPER() searches without the user adding another index 
especially to support that. If the collation is case-insensitive, then UPPER is 
simply making things slower without adding functionality.

Postgresql has a column type: citext. That is case insensitive text, just like 
the mysql collation. UNIQUE constraints on that field work just the way you 
expect them to, and the same as mysql.

So another approach is that instead of thinking of this as 'collation', we can 
think of this as a different field type in the modeler. VARCHAR and ciVARCHAR, 
etc. After all, 'case insensitive' is just as important a concept as the 
difference between BLOB and TEXT.

The later approach may incur some overhead. Each LIKE query will have to 
determine whether the columns involved are CI or CS. One way to do it in MySQL 
is this:

select collation(c1) from t1 limit 1;
+-------------------+
| collation(c1) |
+-------------------+
| utf8_general_ci   |
+-------------------+
1 row in set (0.02 sec)

I guess with metadata caching, the overhead may be reduced to a bearable 
minimum. Not sure if that's the easiest approach?


My gut feeling is that this is a decision that the architecture designer makes 
when creating their model, not a by-product of deployment. That is, it should 
be explicitly set in the model and the appropriate database collation or field 
types set when forward-engineering the db.

Having said that, a "force all columns as case sensitive" switch in the 
DataNode setup could give the sysadmin the choice to force UPPER back into the SQL at the 
risk of slowing things down.

Ari


Also maybe there are other ways to solve this problem (maybe even outside Cayenne, just 
like that "UPPER(column)" index idea)?

Thoughts?

Andrus


On Jan 10, 2011, at 11:45 AM, Ari Maniatis (JIRA) wrote:

    [ 
https://issues.apache.org/jira/browse/CAY-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979519#action_12979519
 ]

Ari Maniatis commented on CAY-1210:
-----------------------------------

I'd like to solve this one since it also affects other database types and is a 
bit of a performance issue for us right now. What about this as a solution:

* Add new column to Cayenne modeler in the dbAttribute called "case sensitive". 
It is a checkbox which is ticked by default (to preserve existing behaviour).
* Change the SQL generator to not spit out UPPER for those fields

mysql does not use index for case insensitive searches
------------------------------------------------------

                Key: CAY-1210
                URL: https://issues.apache.org/jira/browse/CAY-1210
            Project: Cayenne
         Issue Type: Improvement
         Components: Database integration
           Reporter: Ari Maniatis
           Assignee: Ari Maniatis
            Fix For: 3.1M1


When performing a case insensitive search Cayenne spits out SQL which looks 
like this
SELECT .... WHERE upper(name) LIKE upper("fred")
This prevents any index being used for the search. Since mysql already 
performed case insensitive searches on text fields we need to suppress the 
'upper' functions being used in these situations. All searches on these fields 
are already case insensitive.
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


--
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: (CAY-1210) mysql does not use index for case insensitive searches

Reply via email to