[ https://issues.apache.org/jira/browse/HIVE-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793112#comment-13793112 ]
Sergey Shelukhin commented on HIVE-5304: ---------------------------------------- [~ashutoshc] I updated the JIRA description with detailed investigation results... just fyi > Hive results can depend on metastore's underlying datastore > ----------------------------------------------------------- > > Key: HIVE-5304 > URL: https://issues.apache.org/jira/browse/HIVE-5304 > Project: Hive > Issue Type: Bug > Components: Metastore > Reporter: Sergey Shelukhin > > [removed old description] > Hive JDOQL filter pushdown and direct SQL may end up pushing StringCol op > 'SomeString' to underlying SQL datastore. However, the datastore may handle > these differently based on the encoding and collation used for the columns of > the database. > So, query results can change depending on the underlying store for the > metastore and its version. > drop_partitions_filter.q test illustrates this problem. In byte order > collation (proper way) USA is sorted before Uganda, but some collations may > do it the other way, causing the test to fail. > I am assuming that byte-order sort if the correct way to order things. > Our MySQL script specifies _bin collation, which is byte-order; Postgres 9.1 > and after, as far as I see, defaults to "C" collation, which is also > byte-order. > Derby seems to use byte-order by default, I didn't spend a lot of time on > Derby. > However, Postgres before 9.1 seems to default to "en_US.UTF8" and there's no > way to change column collation in our script if database is already created. > MySQL by default doesn't use _bin collation (on my machine), so if database > is auto-created, the order of things is going to change. > I didn't investigate MSSQL or Oracle. > For now it seems that: > 1) Auto-create shouldn't be used. > 2) If old version of postgres (<9.1) is used, the collation should be set > properly by whoever issues "create database" (that is not our script). > 3) We might want to add 'collate "C"' to varchar columns in the postgres > script to ensure the correct collation; however, this will break the script > for postgres <9.1. > 4) MSSQL and Oracle might warrant investigation. -- This message was sent by Atlassian JIRA (v6.1#6144)