Jarek Jarcec Cecho (JIRA) Wed, 19 Dec 2012 09:13:19 -0800

    [ 
https://issues.apache.org/jira/browse/SQOOP-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536146#comment-13536146
 ]


Jarek Jarcec Cecho commented on SQOOP-793:
------------------------------------------

Thank you for explaining your use case. It makes much more sense to me now. 
Sqoop is a tool that specializes in moving data from databases to Hadoop and
vice versa. We're not trying to build a general ETL tool, as that would
duplicate MapReduce functionality without major gain. Sqoop will always require
a working connection to the database in order to at least fetch the metadata
(columns, column types), so I'm afraid that providing the ability to read from
HDFS will not solve your use case, where you have no connectivity to the
database.

I would recommend checking how Sqoop performs the mysqldump parsing and simply
reusing that small portion of code in your own custom MapReduce job. I believe
that this will be much more time-effective than trying to change Sqoop.
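
As a rough illustration of what such a reusable parsing step might look like,
here is a minimal sketch. It is not Sqoop's actual code: the class name
DumpLineParser and its parse method are hypothetical, and the sketch assumes
the dump file holds one row per INSERT statement (for example, a dump produced
with mysqldump's --skip-extended-insert option). Multi-row INSERT statements
are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Sqoop): splits one single-row INSERT line of a mysqldump file. */
final class DumpLineParser {

    /**
     * Parses a line such as: INSERT INTO `t` VALUES (1,'a,b',NULL);
     * Returns the column values, or null when the line is not an INSERT statement.
     * SQL NULL becomes a Java null; quoted strings are unquoted and unescaped.
     */
    static List<String> parse(String line) {
        if (line == null || !line.startsWith("INSERT")) {
            return null; // comments, SET statements, DDL, blank lines
        }
        int valuesAt = line.indexOf(" VALUES");
        if (valuesAt < 0) {
            return null;
        }
        int start = line.indexOf('(', valuesAt); // opening paren of the row
        int end = line.lastIndexOf(')');         // closing paren before the ';'
        if (start < 0 || end <= start) {
            return null;
        }

        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;   // currently inside a '...' literal
        boolean wasQuoted = false;  // current field contained a '...' literal
        for (int i = start + 1; i < end; i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < end) {
                    cur.append(unescape(line.charAt(++i))); // backslash escape
                } else if (c == '\'') {
                    if (i + 1 < end && line.charAt(i + 1) == '\'') {
                        cur.append('\''); // doubled quote inside a literal
                        i++;
                    } else {
                        inQuotes = false; // end of the literal
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '\'') {
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(finish(cur, wasQuoted)); // field separator
                cur.setLength(0);
                wasQuoted = false;
            } else {
                cur.append(c);
            }
        }
        fields.add(finish(cur, wasQuoted));
        return fields;
    }

    /** Converts the accumulated text of one field; an unquoted NULL maps to Java null. */
    private static String finish(StringBuilder cur, boolean wasQuoted) {
        String s = cur.toString();
        if (wasQuoted) {
            return s;
        }
        s = s.trim();
        return s.equals("NULL") ? null : s;
    }

    /** Maps the character following a backslash to the character it stands for. */
    private static char unescape(char c) {
        switch (c) {
            case 'n': return '\n';
            case 't': return '\t';
            case 'r': return '\r';
            case 'b': return '\b';
            case '0': return '\0';
            case 'Z': return (char) 26;
            default:  return c; // covers \\ , \' and \"
        }
    }
}
```

In a custom MapReduce job reading the file as plain text lines, the map method
could call DumpLineParser.parse(value.toString()) on each input line and skip
any line for which it returns null. Column names and types would still have to
come from somewhere other than the database, for example a hand-written schema.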

Jarcec
                
> mysqldump > file > hdfs > sqoop
> -------------------------------
>
>                 Key: SQOOP-793
>                 URL: https://issues.apache.org/jira/browse/SQOOP-793
>             Project: Sqoop
>          Issue Type: New Feature
>          Components: connectors/mysql
>            Reporter: Guido Serra aka Zeph
>            Assignee: Guido Serra aka Zeph
>            Priority: Minor
>
> Extend the MySQLDump module so that it can read from a mysqldump-generated
> file saved on HDFS, instead of triggering the "--direct" option or connecting
> via JDBC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

