[
https://issues.apache.org/jira/browse/HIVE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667142#comment-13667142
]
John Omernik commented on HIVE-4070:
------------------------------------
A couple of points to consider:
1. " If we change the default that would change the results current users are
getting." I am in absolute agreement here. I do not wish to change the
default, the horse has left the gate. If I have referenced that in my previous
posts, it's in the context of other ideas etc. My issue with the current
default isn't the choice of what it is, as much as there is no way to change it
on a case by base basis.
2. Paragraph 2: Those are solid points based on risks to the ongoing
maintenance of Hive. That said, most Relational Databases, including MySQL have
the option to change collation for given databases. True it may not be global
setting (although the collation default is a global setting), but rather a per
database setting. I.e. in MySQL you can have one database use a case sensitive
collation and another use a case insensitive collation. Perhaps hive-site isn't
the place for this, but metadata is (Set it as a per database or table
setting?)
3. For the most part, we model functionality in hive based on what mysql does -
Because of Point 1, this is moot, but by default, MySQL is case insensitive.
4. Most users do not want to have to heavily test before upgrade, they want
consistent behaviour between versions. - This is exactly why I think we need an
option for administrators of Hive to be able to set the case sensitivity on
database/table level, and the default case sensitivity at a global level. Most
users do not test. From a philosophical (data philosopher?) point of view,
users will not test, therefore if failure is to happen because of lack of
testing, please fail with pomp and circumstance.
I.e. Both lines of users (MySQL/MSSQL Migrating to Hive and Oracle/Postgres
Migrating to Hive) are going to make assumptions when they write their queries
that will cause a failure to happen. If LIKE is Case Sensitive as it is,
Oracle/Postgres users will assume correctly and all will be well. MySQL/Users
will assume wrong, but they won't KNOW they assumed wrong, they will just
assume no results (given a lack of testing). The converse is actually the
preferred model: If LIKE is not case sensitive, then Oracle/Postgres users will
assume incorrectly, they will run a query, and they will get their expected
results, but will also get extra results allowing them to understand they
assumed wrong. MySQL/MSSQL users will assume correctly and get their results.
Going back to Point 1, the horse has left the building, we can't change default
because of what it would do to the current user base, thus the next best option
is to allow administrators to set that so users don't have to deal with it,
document it so good administrators can handle it out of the gate, and to find a
way to do it like '%MysQL%' (i.e. similar to collation settings on databases
rather then a global setting as described previously).
I do see the challenges with a global setting affecting how a function works;
no precedence for that, and thus it could introduce risk, but I still hold that
the current risk, especially on a system that is touted as modeled after MySQL
(that's how it was explained to me, thus I assumed LIKE was case insensitive),
to users is high. As I was typing this novelette, I realized I tossed out an
idea related to per database settings. Thoughts on that? That is similar to
how MySQL handles it, and thus follows the models mentioned while avoiding a
global setting that affects the behavior of a UDF. (The setting doesn't change
the UDF, the "collation equivalent" setting on the databases does, and thus the
global setting is just the default collation equivalent.
> Like operator in Hive is case sensitive while in MySQL (and most likely other
> DBs) it's case insensitive
> --------------------------------------------------------------------------------------------------------
>
> Key: HIVE-4070
> URL: https://issues.apache.org/jira/browse/HIVE-4070
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 0.10.0
> Reporter: Mark Grover
> Assignee: Mark Grover
> Priority: Trivial
>
> Hive's like operator seems to be case sensitive.
> See
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L164
> However, MySQL's like operator is case insensitive. I don't have other DB's
> (like PostgreSQL) installed and handy but I am guessing their LIKE is case
> insensitive as well.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira