[ https://issues.apache.org/jira/browse/SQOOP-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Crotty updated SQOOP-2628:
---------------------------------
Description:
Sqoop does not preserve UTF-8 characters when importing a MySQL table with
--direct.
Here is the key comma-delimited output from the attached example script, first
without and then with --direct:
{code}
1,Τη γλώσσα,"/fox/\jumps
1,���� ������������,"/fox/\jumps
{code}
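The garbled line has one replacement character per UTF-8 byte of the Greek text (4 for the first word, 12 for the second). The sketch below reproduces that pattern by encoding the string as UTF-8 and decoding the bytes as ASCII with replacement. This is only an illustration of a byte-level charset mismatch; that the --direct path does exactly this is an assumption, and the helper name `decode_as_ascii` is hypothetical.

```python
# The Greek text from the non-direct output line, written with escapes so the
# code points are unambiguous (each one takes 2 bytes in UTF-8).
original = "\u03a4\u03b7 \u03b3\u03bb\u03ce\u03c3\u03c3\u03b1"


def decode_as_ascii(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as ASCII.

    Every byte outside the ASCII range becomes U+FFFD, the replacement
    character. Assumption: this mimics a reader using the wrong charset.
    """
    return text.encode("utf-8").decode("ascii", errors="replace")


garbled = decode_as_ascii(original)

# Length of each space-separated word before and after the round trip.
original_lengths = [len(word) for word in original.split(" ")]
garbled_lengths = [len(word) for word in garbled.split(" ")]

print(garbled)
print(original_lengths, garbled_lengths)
```

If the lengths match the report (2 and 6 characters becoming 4 and 12), the data is probably leaving MySQL as valid UTF-8 and being decoded with the wrong charset somewhere later in the pipeline.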
I looked over sqoop --verbose output and hadoop logs but can't find anything
suspicious.
As an aside, running the example script with --mysql-delimiters produces this
puzzling comma-delimited output:
{code}
1,Τη γλώσσα,"/fox/\\jumps
1,'���� ������������','\"/fox/\\jumps'
{code}
Note the difference between the text fields containing the word "fox": the two
lines should be identical, but they are quoted differently.
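Both "fox" fields are consistent with the same underlying value, `"/fox/\jumps`, passed through two different quoting policies. The sketch below shows this with two hypothetical helpers: one escapes backslashes only, the other also escapes quote characters and always encloses the field in single quotes. Neither is taken from Sqoop or mysqldump source; they are assumptions chosen to reproduce the two lines above.

```python
def escape_backslashes_only(value: str) -> str:
    """Hypothetical policy A: double each backslash, no enclosing quotes."""
    return value.replace("\\", "\\\\")


def escape_and_enclose(value: str) -> str:
    """Hypothetical policy B: escape backslashes and both quote characters,
    then always enclose the field in single quotes."""
    escaped = (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("'", "\\'")
    )
    return "'" + escaped + "'"


# The text field as it appears in the import without --mysql-delimiters.
raw = '"/fox/\\jumps'

print(escape_backslashes_only(raw))
print(escape_and_enclose(raw))
```

Under these assumptions the two paths differ only in which characters they escape and whether they enclose the field, not in the stored value.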
Attached are the script that creates the MySQL utest example table and the bash
script I used to demonstrate the --direct problem.
Environment
{code}
$ sqoop version
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/10/20 17:28:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:38:36 CST 2015
$ hadoop version
Hadoop 2.6.0-amzn-1
Subversion [email protected]:/pkg/Aws157BigTop -r edd5a97db145470a8723dde24f38c83724e0959c
Compiled by ec2-user on 2015-09-25T14:59Z
Compiled with protoc 2.5.0
From source with checksum 7beeae31f3c4554b23d92f1e63dc85
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-amzn-1.jar
{code}
> Import MySQL table --direct UTF-8 data corrupted
> ------------------------------------------------
>
> Key: SQOOP-2628
> URL: https://issues.apache.org/jira/browse/SQOOP-2628
> Project: Sqoop
> Issue Type: Bug
> Components: sqoop2-jdbc-connector
> Affects Versions: 1.4.6
> Environment: sqoop 1.4.6 hadoop 2.6.0-amzn-1
> Reporter: Joseph Crotty
> Attachments: create_utest_table.sql, sqoop_import.sh, sqoop_utest.log