[ 
https://issues.apache.org/jira/browse/HIVE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973566#comment-13973566
 ] 

Jason Dere commented on HIVE-6843:
----------------------------------

Should this also work for Unicode characters that require more than one Java 
char (supplementary characters, which are encoded as surrogate pairs)? If you 
add these checks to TestGenericUDFUtils, the second check fails:
{code}
    Assert.assertEquals(3, GenericUDFUtils.findText(
        new Text("123\uD801\uDC00456"), new Text("\uD801\uDC00"), 0));
    Assert.assertEquals(4, GenericUDFUtils.findText(
        new Text("123\uD801\uDC00456"), new Text("4"), 0));
{code}

This would require calling String.codePointCount(0, index) on the indexOf() 
result, so that the UTF-16 char offset is converted into a character 
(code point) position.
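
A minimal sketch of that conversion, using plain java.lang.String rather 
than Hive's Text. The class and method names (CodePointIndexSketch, 
findCodePointIndex) are hypothetical and are not part of GenericUDFUtils. 
The sketch also assumes the start argument is given in code points, which 
the actual findText() may not do:

```java
final class CodePointIndexSketch {

    // Hypothetical helper: returns the code point position of the first
    // occurrence of sub in text at or after startCodePoint, or -1 if absent.
    static int findCodePointIndex(String text, String sub, int startCodePoint) {
        int totalCodePoints = text.codePointCount(0, text.length());
        if (startCodePoint < 0 || startCodePoint > totalCodePoints) {
            return -1;
        }
        // Convert the code point offset into a UTF-16 char offset.
        int startChar = text.offsetByCodePoints(0, startCodePoint);
        // indexOf() works in UTF-16 chars, so a surrogate pair counts as two.
        int charIndex = text.indexOf(sub, startChar);
        if (charIndex < 0) {
            return -1;
        }
        // Count code points before the match to get the character position.
        return text.codePointCount(0, charIndex);
    }

    public static void main(String[] args) {
        String text = "123\uD801\uDC00456";
        System.out.println(findCodePointIndex(text, "\uD801\uDC00", 0)); // 3
        System.out.println(findCodePointIndex(text, "4", 0));            // 4
        System.out.println(text.indexOf("4"));                           // 5
    }
}
```

The last line shows the raw indexOf() result, which is off by one because 
the supplementary character occupies two chars.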

> INSTR for UTF-8 returns incorrect position
> ------------------------------------------
>
>                 Key: HIVE-6843
>                 URL: https://issues.apache.org/jira/browse/HIVE-6843
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.11.0, 0.12.0
>            Reporter: Clif Kranish
>            Assignee: Szehon Ho
>            Priority: Minor
>         Attachments: HIVE-6843.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
