[
https://issues.apache.org/jira/browse/SQOOP-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Huang updated SQOOP-1692:
------------------------------
Attachment: confusion-code-when-importing.patch
> Confusion code occurred while importing data from MySQL into HBase
> ------------------------------------------------------------------
>
> Key: SQOOP-1692
> URL: https://issues.apache.org/jira/browse/SQOOP-1692
> Project: Sqoop
> Issue Type: Bug
> Components: hbase-integration
> Affects Versions: 1.4.4
> Reporter: Eric Huang
> Fix For: 1.4.4
>
> Attachments: confusion-code-when-importing.patch
>
>
> If the MySQL character set is latin1 (the default) and tables contain Chinese
> characters, importing data from MySQL into HBase produces garbled text
> (mojibake). Reportedly this is because MySQL's "latin1" charset, which is
> effectively cp1252, is not standard latin1 (ISO-8859-1): ISO-8859-1 latin1
> treats the code points between 0x80 and 0x9f as “undefined”.
> Details, as described in the MySQL documentation:
> latin1 is the default character set. MySQL's latin1 is the same as the
> Windows cp1252 character set. This means it is the same as the official ISO
> 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA
> latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas
> cp1252, and therefore MySQL's latin1, assign characters for those positions.
> For example, 0x80 is the Euro sign. For the “undefined” entries in cp1252,
> MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90
> to 0x0090, and 0x9d to 0x009d.
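
The mapping described above can be illustrated with a small sketch. This is only an illustration of the charset difference, not the attached patch and not code from Sqoop or the MySQL driver; the helper names `mysql_latin1_decode` and `mysql_latin1_encode` are hypothetical. It assumes that Python's `cp1252` codec leaves the five bytes 0x81, 0x8d, 0x8f, 0x90 and 0x9d undefined, so they are passed through as U+0081 etc., as the quoted text says MySQL does.

```python
# Bytes that cp1252 leaves undefined; per the quoted text, MySQL maps
# each of them to the Unicode code point with the same value.
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def mysql_latin1_decode(data: bytes) -> str:
    """Decode bytes the way MySQL's latin1 (cp1252 plus pass-through) is described."""
    chars = []
    for b in data:
        if b in UNDEFINED_IN_CP1252:
            chars.append(chr(b))  # e.g. 0x81 -> U+0081
        else:
            chars.append(bytes([b]).decode("cp1252"))
    return "".join(chars)


def mysql_latin1_encode(text: str) -> bytes:
    """Inverse of mysql_latin1_decode."""
    out = bytearray()
    for ch in text:
        if ord(ch) in UNDEFINED_IN_CP1252:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")
    return bytes(out)


# UTF-8 bytes of a Chinese character, stored in a column declared as latin1.
raw = "文".encode("utf-8")  # b'\xe6\x96\x87'

as_mysql_latin1 = mysql_latin1_decode(raw)  # 0x96 and 0x87 become U+2013, U+2021
as_iso_8859_1 = raw.decode("latin-1")       # 0x96 and 0x87 become U+0096, U+0087

# The two interpretations disagree for bytes in the 0x80..0x9f range.
print(as_mysql_latin1 != as_iso_8859_1)

# Reversing with the matching charset recovers the original text.
print(mysql_latin1_encode(as_mysql_latin1).decode("utf-8"))
```

If the client reverses the conversion with strict ISO-8859-1 instead of the cp1252-style mapping, characters such as U+2013 cannot be encoded back to a single byte, and the original UTF-8 sequence is lost. That is one way the garbling described in this issue could arise.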