-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23907/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

When a Hive query contains UTF-8 characters in its WHERE clause, the result is 
empty for "where content like '%?%'" and contains all rows for "where content 
not like '%?%';", even though a few rows contain this character.

Steps to reproduce:

1. Save a file called data.txt in the root container. The contents of the file 
are as follows.

190     ?f??c??h?c?
899     d???geg??ea?eead?e
137     ??h?ge??g??
21      ??e?c??d??
767     ?c?g?????????????
281     ???aga?c?e??
573     ??hc?b??????hc?
966     ????e?eb??c????ga??
565     ????bb?ehd?ea??
778     ?????bbea??????a?
363     gd?a?a?b??fg?
822     a???h?e?h?gac????b
338     b??ff?e?e?ba?

2. Execute the following queries to set up the table.
a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY '\t' LOCATION '/hivetable';
b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;

3. Create a query file query.hql with the following contents:

INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content like '%?%';

4. Even though a few rows contain this character, the output is empty.

5. Change the contents of query.hql to:

INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content not like '%?%';

6. The output contains all rows including those containing the given character.

7. Similar results are observed when using "where content = '?f??c??h?c?'; "

8. We get expected results when using "where content like '%a%'; "
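
The symptoms in steps 4 to 8 are consistent with a charset mismatch: the query 
text is stored as UTF-8 but decoded with a different charset, so the pattern no 
longer matches the stored rows, while an ASCII pattern such as 'a' is 
unaffected. This review does not state the root cause, so the following is only 
a minimal sketch of that assumed mechanism, not the actual CliDriver code. The 
class name, the decode helper, the sample character U+00E9, and the choice of 
ISO-8859-1 as the "wrong" charset are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {

    // Hypothetical helper: decode bytes read from a query file using the
    // given charset. Passing the wrong charset mimics the assumed bug.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Stand-in for a non-ASCII character in the LIKE pattern (assumption:
        // the original characters are not recoverable from this e-mail).
        String pattern = "\u00e9";
        String row = "ab\u00e9cd";

        // Bytes as they would be stored in a UTF-8 encoded query.hql.
        byte[] onDisk = pattern.getBytes(StandardCharsets.UTF_8);

        String decodedCorrectly = decode(onDisk, StandardCharsets.UTF_8);
        String decodedWrongly = decode(onDisk, StandardCharsets.ISO_8859_1);

        // With the right charset the pattern is found in the row.
        System.out.println(row.contains(decodedCorrectly)); // true
        // With the wrong charset the pattern becomes two other characters,
        // so "like" finds nothing and "not like" matches every row.
        System.out.println(row.contains(decodedWrongly));   // false

        // An ASCII pattern survives either decoding, matching step 8.
        byte[] ascii = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(row.contains(decode(ascii, StandardCharsets.ISO_8859_1))); // true
    }
}
```

If this is the cause, decoding the query file explicitly as UTF-8 rather than 
relying on a platform default would be the natural remedy. Whether the attached 
diff does this is not confirmed here.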


Diffs
-----

  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 3cdedba 

Diff: https://reviews.apache.org/r/23907/diff/


Testing
-------

Tested, resolved the issue.


Thanks,

XIAOBING ZHOU
