[ 
https://issues.apache.org/jira/browse/SQOOP-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304794#comment-14304794
 ] 

Veena Basavaraj edited comment on SQOOP-1579 at 2/4/15 8:48 AM:
----------------------------------------------------------------

[~abec] Since I have spent quite a lot of time on the IDF code, I am keen on 
understanding this issue in detail.

From the ticket above and the patch I took a glance at, a few things are 
unclear to me. We have a JDBC connector and an HDFS connector (there is no 
Hive connector yet; it is still WIP). So if an additional transformation is 
required to load data from HDFS into Hive, I would assume that is expected, 
and I am not sure this logic belongs in the HDFS connector code.

Things I need clarification on:

The unescape attribute is not clear to me. Sqoop CSV expects text to be in 
single quotes; the Sqoop object format does not. So if the JDBC connector is 
writing the data (FROM side) in object array format and the HDFS connector is 
reading it as an object array (TO side), then other than the single quotes 
added around text and certain bytes being printed as special characters, 
there should be no change.

So if the data in MySQL had a newline (\n), which one of the examples above 
has, then while writing on the TO side the HDFS connector will need to encode 
it. Is that what the unescape attribute is for?

I would suggest adding a few more examples to this ticket showing what the 
behaviour would be for other special characters, and highlighting a simple 
3-line design:

\ \ (no space) 
\'
\"
\Z
\r
\n
\0
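To make the round trip concrete, here is a minimal sketch of the kind of escape/unescape symmetry being discussed. The class and method names are hypothetical, not Sqoop's actual CSVIntermediateDataFormat code; it only covers a subset of the characters listed above.

```java
// Illustrative sketch only: NOT Sqoop's real IDF code. Shows the
// escape-on-write / unescape-on-read round trip for a text value.
public class CsvEscapeSketch {

    // Escape special characters and wrap the value in single quotes,
    // as Sqoop CSV expects for text.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'");  break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\0': sb.append("\\0");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    // Reverse the transformation on the TO side.
    static String unescape(String csv) {
        String body = csv.substring(1, csv.length() - 1); // strip quotes
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char next = body.charAt(++i);
                switch (next) {
                    case 'n': sb.append('\n'); break;
                    case 'r': sb.append('\r'); break;
                    case '0': sb.append('\0'); break;
                    default:  sb.append(next); // covers \\ and \'
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "line1\nline2";
        String escaped = escape(raw);
        System.out.println(escaped); // 'line1\nline2' on one physical line
        System.out.println(unescape(escaped).equals(raw)); // true
    }
}
```

The point of the examples I am asking for is exactly this table: for each character above, what the FROM side emits and what the TO side must undo, so a connector never writes a literal newline into an HDFS text file that Hive later misreads as a record separator.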



> Sqoop2: Data transfer to load into Hive does not work
> -----------------------------------------------------
>
>                 Key: SQOOP-1579
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1579
>             Project: Sqoop
>          Issue Type: Bug
>          Components: sqoop2-hdfs-connector
>            Reporter: Shakun Grover
>            Assignee: Abraham Elmahrek
>             Fix For: 1.99.5
>
>         Attachments: SQOOP-1579.0.patch, SQOOP-1579.1.patch, 
> SQOOP-1579.2.patch
>
>
> When we import many columns (say >20 columns) from RDBMS to HDFS, Sqoop2 
> inserts a new line in the output file. The newline appears at the end of 
> certain columns; it doesn't seem to appear for every single column.
> When we try to view this data in Hive, it shows NULL at the newline 
> separator.
> As per Abraham, this looks like a problem with unescaping the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
