[ https://issues.apache.org/jira/browse/SQOOP-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292510#comment-14292510 ]
Abraham Elmahrek commented on SQOOP-1579:
-----------------------------------------
I've tested using the following parameters:
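* A MySQL table with the following contents (row 0's text value is an actual newline character, which is why that row wraps onto two lines; row 1's text value is a literal backslash followed by "n"):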
{code}
mysql> SELECT * FROM fl;
+----+------+------+
| id | text | fl |
+----+------+------+
| 0 |
| 0 |
| 1 | \n | 1 |
| 2 | bleh | 2 |
| 3 | bleh | 2.5 |
+----+------+------+
{code}
* A job that transfers the data from MySQL to HDFS:
{code}
Job with id 1 and name mysql2hdfs (Enabled: true, Created by abe at 1/26/15 1:48 PM, Updated by abe at 1/26/15 1:48 PM)
  Using link id 1 and Connector id 4
    From database configuration
      Schema name:
      Table name: fl
      Table SQL statement:
      Table column names:
      Partition column name:
      Null value allowed for the partition column:
      Boundary query:
    Throttling resources
      Extractors:
      Loaders:
    ToJob configuration
      Override null value:
      Null value:
      Output format: TEXT_FILE
      Compression format: NONE
      Custom compression format:
      Output directory: /tmp/sqoop2/hdfs
{code}
* An external Hive table created over the output directory:
{code}
create external table fl (
  id int,
  text string,
  fl float
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar" = "'",
"escapeChar" = "\\"
)
location '/tmp/sqoop2/hdfs';
{code}
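To make the effect of these SerDe properties concrete, here is a simplified Python model of how a reader configured with separatorChar ",", quoteChar "'" and escapeChar "\" might split one line. It assumes opencsv-like escape handling as I understand it (an escape character followed by the quote or escape character yields that character; before any other character the escape is silently dropped). parse_line is a hypothetical name, and this is a sketch, not OpenCSVSerde's actual code.

```python
def parse_line(line, sep=",", quote="'", escape="\\"):
    """Split one text line into fields (simplified, opencsv-like model).

    Assumption: an escape char followed by the quote or escape char yields
    that char; before any other char the escape char is silently dropped.
    """
    fields = []
    buf = []
    in_quotes = False
    i = 0
    while i < len(line):
        c = line[i]
        if c == escape:
            nxt = line[i + 1] if i + 1 < len(line) else ""
            if nxt in (quote, escape):
                buf.append(nxt)  # escaped quote/escape: keep the next char literally
                i += 2
            else:
                i += 1  # drop the escape char; the next char is read normally
            continue
        if c == quote:
            in_quotes = not in_quotes  # enter or leave a quoted section
        elif c == sep and not in_quotes:
            fields.append("".join(buf))  # field boundary outside quotes
            buf = []
        else:
            buf.append(c)
        i += 1
    fields.append("".join(buf))
    return fields
```

Under this model, if the output file holds the line 0,'\n',0 (an assumption about what Sqoop2 wrote), parse_line returns ['0', 'n', '0'], which is consistent with the "n" in the Hive result set that follows.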
* The resulting Hive result set:
{code}
hive> SELECT * FROM fl;
OK
0 n 0.0
1 n 1.0
2 bleh 2.0
3 bleh 2.5
{code}
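A minimal Python sketch of why rows 0 and 1 both come back as "n", assuming Sqoop2's text output escapes a real newline as the two characters "\n" and escapes the quote character, but leaves a literal backslash untouched (encode_text_buggy is a hypothetical name, not Sqoop2's API). Under that assumption the two values serialize to identical text, so no reader can tell them apart afterwards.

```python
def encode_text_buggy(value, quote="'"):
    """Model of the suspected bug: newline and quote are escaped,
    but the escape character (backslash) itself is never doubled."""
    out = value.replace("\n", "\\n")        # real newline -> backslash + 'n'
    out = out.replace(quote, "\\" + quote)  # quote -> backslash + quote
    return quote + out + quote

row0 = "\n"   # text column holding an actual newline character
row1 = "\\n"  # text column holding a literal backslash followed by 'n'

print(encode_text_buggy(row0))  # '\n'
print(encode_text_buggy(row1))  # '\n'
print(encode_text_buggy(row0) == encode_text_buggy(row1))  # True
```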
I see a couple of issues (as of https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=5ade862b10651ac2077691687f7375ecd75f1ec9):
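* It seems Sqoop2 does not escape the escape character "\" itself, so the literal "\n" in row 1 cannot be distinguished from the escaped newline in row 0.
* The HDFS connector does not unescape the data.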
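Both issues point at the same kind of fix: the writer also has to escape the escape character, and the reader has to reverse exactly that mapping in a single left-to-right pass. A minimal Python sketch under those assumptions; escape_text, unescape_text and the particular escape table are hypothetical, not Sqoop2's implementation.

```python
# Hypothetical escape table: special character -> char written after the backslash.
ESCAPES = {"\\": "\\", "\n": "n", "\r": "r", "'": "'"}
UNESCAPES = {v: k for k, v in ESCAPES.items()}

def escape_text(value):
    """Escape every special character, including the backslash itself."""
    return "".join("\\" + ESCAPES[c] if c in ESCAPES else c for c in value)

def unescape_text(encoded):
    """Reverse escape_text: a backslash always starts a two-char sequence."""
    out = []
    i = 0
    while i < len(encoded):
        c = encoded[i]
        if c == "\\" and i + 1 < len(encoded):
            out.append(UNESCAPES[encoded[i + 1]])  # KeyError on an unknown escape
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)

# Values modelled on the MySQL rows above, plus one containing a quote.
for value in ["\n", "\\n", "bleh", "it's"]:
    encoded = escape_text(value)
    assert unescape_text(encoded) == value  # round-trips exactly
    print(repr(value), "->", encoded)

# The real newline and the literal backslash + 'n' now encode differently.
assert escape_text("\n") != escape_text("\\n")
```

Escaping character by character in one pass avoids the ordering problem of chained replace() calls, where a backslash introduced by one substitution could be escaped again by a later one.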
> Sqoop2 inserts new line in output file
> --------------------------------------
>
> Key: SQOOP-1579
> URL: https://issues.apache.org/jira/browse/SQOOP-1579
> Project: Sqoop
> Issue Type: Bug
> Components: sqoop2-hdfs-connector
> Reporter: Shakun Grover
> Assignee: Abraham Elmahrek
> Fix For: 1.99.5
>
>
> When we import many columns (say, more than 20) from an RDBMS to HDFS, Sqoop2
> inserts a new line in the output file. The newline appears at the end of
> certain columns; it does not seem to appear for every column.
> When we try to view this data in Hive, it shows NULL where the newline
> separator appears.
> According to Abraham, this looks like a problem with unescaping the data.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)