-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23907/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

When we put UTF-8 characters in the WHERE clause of a Hive query, the results are empty for "where content like '%?%'" and contain all rows for "where content not like '%?%';", even though a few rows contain this character.

Steps to reproduce:

1. Save a file called data.txt in the root container. The contents of the file are as follows.

190 ?f??c??h?c?
899 d???geg??ea?eead?e
137 ??h?ge??g??
21 ??e?c??d??
767 ?c?g?????????????
281 ???aga?c?e??
573 ??hc?b??????hc?
966 ????e?eb??c????ga??
565 ????bb?ehd?ea??
778 ?????bbea??????a?
363 gd?a?a?b??fg?
822 a???h?e?h?gac????b
338 b??ff?e?e?ba?

2. Execute the following queries to set up the table.
   a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
   b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
3. Create a query file query.hql with the following contents:
   INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput' select * from hivetable where content like '%?%';
4. Even though a few rows contain this character, the output is empty.
5. Change the contents of query.hql to:
   INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput' select * from hivetable where content not like '%?%';
6. The output contains all rows, including those containing the given character.
7. Similar results are observed when using "where content = '?f??c??h?c?';".
8. We get the expected results when using "where content like '%a%';".


Diffs
-----

  cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 3cdedba

Diff: https://reviews.apache.org/r/23907/diff/


Testing
-------

Tested; the change resolves the issue.


Thanks,

XIAOBING ZHOU