[ 
https://issues.apache.org/jira/browse/SQOOP-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Huang updated SQOOP-1692:
------------------------------
    Summary: Confusion code occurred while importing data from MySQL to HBase  
(was: Confusion code occurred while importing data from MySQL into HBase)

> Confusion code occurred while importing data from MySQL to HBase
> ----------------------------------------------------------------
>
>                 Key: SQOOP-1692
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1692
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>    Affects Versions: 1.4.4
>            Reporter: Eric Huang
>             Fix For: 1.4.4
>
>         Attachments: confusion-code-when-importing.patch
>
>
> If the MySQL character set is latin1 (the default) and tables contain Chinese 
> characters, importing data from MySQL to HBase produces garbled text (mojibake). 
> The reported explanation is that MySQL's "latin1" charset (equivalent to cp1252) is 
> not standard latin1 (ISO-8859-1): ISO-8859-1 latin1 treats the code points 
> between 0x80 and 0x9f as “undefined”. 
> For details:
> latin1 is the default character set. MySQL's latin1 is the same as the 
> Windows cp1252 character set. This means it is the same as the official ISO 
> 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA 
> latin1 treats the code points between 0x80 and 0x9f as “undefined,”  whereas 
> cp1252, and therefore MySQL's latin1, assign characters for those positions.  
> For example, 0x80 is the Euro sign. For the “undefined” entries in cp1252,  
> MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 
> to 0x0090, and 0x9d to 0x009d.  
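
The mapping described above can be sketched in Python to show where the two "latin1" interpretations diverge. This is only an illustration under assumptions, not the attached patch or Sqoop's code: `decode_mysql_latin1` is a hypothetical helper, and Python's `cp1252` and `iso-8859-1` codecs stand in for MySQL's latin1 and standard latin1. The five pass-through bytes are taken from the description itself.

```python
# Bytes that cp1252 leaves unassigned. Per the description above, MySQL maps
# each of them to the Unicode code point with the same numeric value.
CP1252_UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_mysql_latin1(raw: bytes) -> str:
    """Decode bytes as MySQL's latin1 is described to (hypothetical helper)."""
    chars = []
    for b in raw:
        if b in CP1252_UNDEFINED:
            # Pass-through: 0x81 -> U+0081, 0x8d -> U+008D, and so on.
            chars.append(chr(b))
        else:
            # Every other byte follows the cp1252 table (0x80 is the Euro sign).
            chars.append(bytes([b]).decode("cp1252"))
    return "".join(chars)


# UTF-8 bytes of one Chinese character (U+6587); two of its three bytes
# fall in the 0x80-0x9f range where the charsets disagree.
raw = "文".encode("utf-8")  # bytes e6 96 87

as_mysql_latin1 = decode_mysql_latin1(raw)
as_iso_8859_1 = raw.decode("iso-8859-1")

# The two readings differ, so mixing them cannot round-trip the bytes.
print(as_mysql_latin1 == as_iso_8859_1)

# Re-encoding with the same charset that decoded the bytes restores them.
print(as_mysql_latin1.encode("cp1252") == raw)
```

If the explanation in the description is right, text decoded as cp1252 but later re-encoded as ISO-8859-1 (or the reverse) no longer yields the original UTF-8 bytes, which would be consistent with the garbled Chinese characters reported here.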



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
