[ https://issues.apache.org/jira/browse/HADOOP-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715662#action_12715662 ]
Tom White commented on HADOOP-5844:
-----------------------------------
+1 Looks good.
> Use mysqldump when connecting to local mysql instance in Sqoop
> --------------------------------------------------------------
>
> Key: HADOOP-5844
> URL: https://issues.apache.org/jira/browse/HADOOP-5844
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Aaron Kimball
> Assignee: Aaron Kimball
> Attachments: mysqldump.patch
>
>
> Sqoop uses MapReduce + DBInputFormat to read the contents of a table into
> HDFS. On many databases, this implementation is O(N^2) in the number of rows.
> Also, running multiple mappers adds little throughput, because the database
> itself is inherently single-threaded. While DBInputFormat/JDBC provides a
> useful fallback mechanism for importing from databases, database-specific
> dump utilities will nearly always provide faster throughput, and should be
> selected when available. This patch allows users to use mysqldump to read
> from local MySQL instances instead of the MapReduce-based input.
> If you provide Sqoop with arguments of the form "--connect
> jdbc:mysql://localhost/somedatabase --local", it will use the mysqldump fast
> path to perform the import.
> This patch, naturally, requires that MySQL be installed on a machine to test
> it. Thus the test that this adds is called LocalMySQLTest (instead of the
> Hadoop-preferred file naming, TestLocalMySQL) so that Hudson doesn't
> automatically run it. You can run this test yourself by using "ant
> -Dtestcase=LocalMySQLTest test". See the notes in the javadoc for the
> LocalMySQLTest class on how to set up the MySQL test environment for this.
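
For readers following along, the fast-path selection described in the quoted issue can be sketched roughly as below. This is not the code from mysqldump.patch: the class FastPathSelector and the method shouldUseMysqldump are hypothetical names chosen for illustration, and treating only the literal host "localhost" as local is an assumption. The sketch only shows the decision between the mysqldump path and the DBInputFormat/JDBC fallback, given a connect string and whether --local was passed.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not a class from the Sqoop patch. */
public class FastPathSelector {
    private static final String JDBC_PREFIX = "jdbc:";

    /**
     * Returns true when the connect string points at a MySQL server on
     * localhost and the --local flag was given; false means "fall back to
     * the DBInputFormat/JDBC import".
     */
    public static boolean shouldUseMysqldump(String connectString, boolean localFlag) {
        if (!localFlag || connectString == null || !connectString.startsWith(JDBC_PREFIX)) {
            return false;
        }
        URI uri;
        try {
            // Drop the "jdbc:" prefix so the rest parses as scheme://host/db.
            uri = URI.create(connectString.substring(JDBC_PREFIX.length()));
        } catch (IllegalArgumentException e) {
            return false; // Unparseable connect string: use the fallback.
        }
        // Assumption: only the literal host "localhost" counts as local.
        return "mysql".equals(uri.getScheme()) && "localhost".equals(uri.getHost());
    }

    public static void main(String[] args) {
        // Connect string taken from the issue description.
        System.out.println(shouldUseMysqldump("jdbc:mysql://localhost/somedatabase", true));
        // Same database without --local: stays on the JDBC path.
        System.out.println(shouldUseMysqldump("jdbc:mysql://localhost/somedatabase", false));
    }
}
```

A real implementation would probably also need to decide how to treat 127.0.0.1 or the machine's own hostname; the sketch leaves those on the JDBC fallback.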