[jira] [Commented] (SQOOP-1579) Sqoop2 inserts new line in output file

Abraham Elmahrek (JIRA) Mon, 26 Jan 2015 14:21:13 -0800

    [ https://issues.apache.org/jira/browse/SQOOP-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292510#comment-14292510 ]


Abraham Elmahrek commented on SQOOP-1579:
-----------------------------------------

I've tested with the following setup:
* A MySQL table with the following contents:
{code}
mysql> SELECT * FROM fl;
+----+------+------+
| id | text | fl   |
+----+------+------+
|  0 |
    |    0 |
|  1 | \n   |    1 |
|  2 | bleh |    2 |
|  3 | bleh |  2.5 |
+----+------+------+
{code}
* Transfer data from MySQL to HDFS:
{code}
Job with id 1 and name mysql2hdfs (Enabled: true, Created by abe at 1/26/15 1:48 PM, Updated by abe at 1/26/15 1:48 PM)
Using link id 1 and Connector id 4
  From database configuration
    Schema name: 
    Table name: fl
    Table SQL statement: 
    Table column names: 
    Partition column name: 
    Null value allowed for the partition column: 
    Boundary query: 
  Throttling resources
    Extractors: 
    Loaders: 
  ToJob configuration
    Override null value: 
    Null value: 
    Output format: TEXT_FILE
    Compression format: NONE
    Custom compression format: 
    Output directory: /tmp/sqoop2/hdfs
{code}
* Create a Hive table from the result:
{code}
create external table fl (
       id int,
       text string, fl float
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)
location '/tmp/sqoop2/hdfs';
{code}
* Query the table in Hive, which returns:
{code}
hive> SELECT * FROM fl;
OK
0       n       0.0
1       n       1.0
2       bleh    2.0
3       bleh    2.5
{code}

I see a couple of issues (as of 
https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=commit;h=5ade862b10651ac2077691687f7375ecd75f1ec9):
* It seems the escape character "\" is not itself being escaped by Sqoop2.
* The HDFS connector is not unescaping the data.
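
To illustrate why the first point matters, here is a minimal Python sketch. It is not Sqoop2's actual code: the function names are hypothetical and the escaping rules are an assumption made for illustration. It shows that when the escape character is left unescaped, an actual newline and the two literal characters "\n" encode to the same text, which would be consistent with rows 0 and 1 looking identical in the Hive output above.

```python
# Hypothetical helpers for illustration only; not taken from the Sqoop2 code base.

def escape_newline_only(value: str) -> str:
    """Escape newlines but leave the escape character '\\' untouched."""
    return value.replace("\n", "\\n")


def escape(value: str) -> str:
    """Escape the escape character first, then the newline."""
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(encoded: str) -> str:
    """Reverse escape(): a backslash followed by 'n' is a newline,
    a backslash followed by any other character is that character."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "\\" and i + 1 < len(encoded):
            nxt = encoded[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


newline_value = "\n"   # an actual newline, like row 0 of the test table
literal_value = "\\n"  # a backslash followed by 'n', like row 1

# Without escaping the backslash, the two values become indistinguishable.
ambiguous = escape_newline_only(newline_value) == escape_newline_only(literal_value)

# With the backslash escaped first, the values stay distinct and round-trip.
distinct = escape(newline_value) != escape(literal_value)
round_trips = all(unescape(escape(v)) == v
                  for v in (newline_value, literal_value, "bleh"))
```

The order in `escape` matters: replacing the backslash first keeps the backslash introduced for the newline from being doubled afterwards.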

> Sqoop2 inserts new line in output file
> --------------------------------------
>
>                 Key: SQOOP-1579
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1579
>             Project: Sqoop
>          Issue Type: Bug
>          Components: sqoop2-hdfs-connector
>            Reporter: Shakun Grover
>            Assignee: Abraham Elmahrek
>             Fix For: 1.99.5
>
>
> When we import many columns (say >20 columns) from an RDBMS to HDFS, Sqoop2 
> inserts a new line in the output file. The newline appears at the end of 
> certain columns. It doesn't seem to appear for every single column.
> When we try to view this data in Hive, it shows NULL at the newline 
> separator.
> As per Abraham, this looks like a problem with unescaping the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
