[
https://issues.apache.org/jira/browse/SQOOP-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536146#comment-13536146
]
Jarek Jarcec Cecho commented on SQOOP-793:
------------------------------------------
Thank you for explaining your use case. It makes much more sense to me now.
Sqoop is a tool that specializes in moving data from databases to Hadoop and
vice versa. We're not trying to build a general ETL tool, as we would duplicate
MapReduce functionality without major gain. Sqoop will always require a working
connection to the database in order to at least fetch the metadata (columns,
column types), so I'm afraid that providing the ability to read from HDFS will
not solve your use case, where you have no connectivity to the database.
I would recommend checking how Sqoop performs the mysqldump parsing and
simply reusing that small portion of code in your own custom MapReduce job. I
believe that this will be much more time-effective than trying to change Sqoop.
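To give a rough idea of what such a parsing step could look like, here is a
minimal sketch in Python. It is not Sqoop's actual code. It assumes the dump
holds one row per INSERT statement (for example, produced with mysqldump's
--skip-extended-insert option) and that quotes inside strings are escaped with
a backslash. The names parse_insert_line, ESCAPES and _finish are hypothetical.

```python
# Hypothetical sketch: parse one single-row INSERT line from a mysqldump file.
# Assumptions: one row per statement, backslash-escaped string literals.

# Backslash escapes translated to control characters; any other escaped
# character (such as \' or \\) is kept as the character itself.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0"}


def _finish(buf, quoted):
    """Turn the collected characters into a value; unquoted NULL becomes None."""
    text = "".join(buf)
    if not quoted and text == "NULL":
        return None
    return text


def parse_insert_line(line):
    """Return the column values of a single-row INSERT line, or None to skip it."""
    line = line.strip()
    marker = " VALUES ("
    start = line.find(marker)
    if not line.startswith("INSERT INTO ") or start < 0 or not line.endswith(");"):
        return None  # comments, DDL and blank lines are skipped

    body = line[start + len(marker):-2]  # text between "VALUES (" and ");"
    fields, buf = [], []
    in_quote = False   # currently inside a '...' literal
    quoted = False     # the current field contained a quoted literal
    i = 0
    while i < len(body):
        ch = body[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(body):
                nxt = body[i + 1]
                buf.append(ESCAPES.get(nxt, nxt))
                i += 1  # consume the escaped character as well
            elif ch == "'":
                in_quote = False
            else:
                buf.append(ch)
        elif ch == "'":
            in_quote = True
            quoted = True
        elif ch == ",":
            fields.append(_finish(buf, quoted))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    fields.append(_finish(buf, quoted))
    return fields
```

A function like this could be called on each input line from the map step of
your own job, with lines that return None simply skipped. All values come back
as strings (or None for NULL), so converting them to column types is left to
the caller.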
Jarcec
> mysqldump > file > hdfs > sqoop
> -------------------------------
>
> Key: SQOOP-793
> URL: https://issues.apache.org/jira/browse/SQOOP-793
> Project: Sqoop
> Issue Type: New Feature
> Components: connectors/mysql
> Reporter: Guido Serra aka Zeph
> Assignee: Guido Serra aka Zeph
> Priority: Minor
>
> Extend the MySQLDump module to be able to read from a mysqldump-generated
> file saved on HDFS, instead of triggering the "--direct" option or
> connecting via JDBC.