[ 
https://issues.apache.org/jira/browse/SQOOP-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ranjan Bagchi updated SQOOP-2639:
---------------------------------
    Description: 
I am able to import UTF-8 (non-latin1) data successfully into HDFS via:

sqoop import --connect jdbc:mysql://host/db --username XX --password YY \
        --mysql-delimiters \
        --table MYSQL_SRC_TABLE --target-dir ${SQOOP_DIR_PREFIX}/mysql_table \
        --direct 

However, using 

sqoop export --connect jdbc:mysql://host/db --username XX --password YY \
        --mysql-delimiters \
        --table MYSQL_DEST_TABLE --export-dir ${SQOOP_DIR_PREFIX}/mysql_table \
        --direct 

cuts off each field after the first non-latin1 character (e.g. a letter with an 
umlaut).
I also tried passing extra options such as -- --default-character-set=utf8, without success.

I was able to fix the problem by changing line 322 of 
https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
from 
this.mysqlCharSet = MySQLUtils.MYSQL_DEFAULT_CHARSET;
to
this.mysqlCharSet = "utf-8"; 
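
A minimal sketch of why the encoding used by the mapper could matter. It assumes (this report does not confirm it) that MySQLUtils.MYSQL_DEFAULT_CHARSET names a latin1 encoding and that the MySQL side interprets the incoming bytes as UTF-8. Under that assumption, a value containing an umlaut is written as bytes that are not well-formed UTF-8, which may be why the field is cut off at that character. The class name CharsetMismatchDemo and the helper isValidUtf8 are hypothetical and are not part of Sqoop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Sqoop.
final class CharsetMismatchDemo {

    /** Returns true if the given bytes are a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "M\u00fcller"; // "Mueller" spelled with u-umlaut

        // Assumption: this is how the field is encoded with the default charset.
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        // This is how the field is encoded after the proposed change.
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);

        System.out.println("latin1: " + latin1.length + " bytes, valid UTF-8: "
                + isValidUtf8(latin1));
        System.out.println("utf-8:  " + utf8.length + " bytes, valid UTF-8: "
                + isValidUtf8(utf8));
    }
}
```

The latin1 form encodes the umlaut as a single byte that a strict UTF-8 decoder rejects, while the UTF-8 form uses two bytes and decodes cleanly.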

Hope this helps


  was:
I am able to import utf-8 data (non-latin1) data successfully into HDFS via:

sqoop import --connect jdbc:mysql://host/db --username XX --password YY \
        --mysql-delimiters \
        --table MYSQL_SRC_TABLE --target-dir ${SQOOP_DIR_PREFIX}/mysql_table 
--direct 

However, using 

sqoop export --connect  jdbc:mysql://host/db --username XX --password YY \
        --mysql-delimiters \
        --table MYSQL_DEST_TABLE --export-dir ${SQOOP_DIR_PREFIX}/mysql_table \
        --direct 

Cuts off the fields after the first non-latin1 character (eg a letter w/ an 
umlaut).
I tried other options like  -- --default-character-set=utf8, without success.

I was able to fix the problem with the following change:
Change 
https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java,
 line 322 from 
`this.mysqlCharSet = MySQLUtils.MYSQL_DEFAULT_CHARSET;`
to
`this.mysqlCharSet = "utf-8"; `

Hope this helps



> Unable to export utf-8 data to MySQL using --direct mode
> --------------------------------------------------------
>
>                 Key: SQOOP-2639
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2639
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/mysql
>    Affects Versions: 1.4.6
>            Reporter: Ranjan Bagchi



