Use mysqldump when connecting to local mysql instance in Sqoop
--------------------------------------------------------------
Key: HADOOP-5844
URL: https://issues.apache.org/jira/browse/HADOOP-5844
Project: Hadoop Core
Issue Type: New Feature
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Attachments: mysqldump.patch
Sqoop uses MapReduce + DBInputFormat to read the contents of a table into HDFS.
On many databases, this implementation is O(N^2) in the number of rows. Also,
using multiple mappers adds little throughput, because the database itself is
inherently single-threaded. While DBInputFormat/JDBC provides a useful fallback
mechanism for importing from databases, database-specific dump utilities will
nearly always provide faster throughput, and should be selected when available.
This patch allows users to use mysqldump to read from local MySQL instances
instead of the MapReduce-based input.
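One plausible reading of the O(N^2) remark can be sketched as a toy cost model. It assumes (this is not stated in the issue) that each input split is fetched with a LIMIT/OFFSET-style query and that the database reads and discards every skipped row before returning the split. The function names are hypothetical and the numbers count rows only; they are not measurements.

```python
def rows_scanned_with_offsets(total_rows: int, split_size: int) -> int:
    """Rows touched if every split query re-reads the table from row 0.

    Assumed cost model: a query with OFFSET k reads and discards k rows,
    then returns up to `split_size` rows.
    """
    scanned = 0
    for offset in range(0, total_rows, split_size):
        length = min(split_size, total_rows - offset)
        scanned += offset + length
    return scanned


def rows_scanned_with_dump(total_rows: int) -> int:
    """Rows touched by a single sequential pass, as a dump utility would do."""
    return total_rows


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, rows_scanned_with_offsets(n, 10), rows_scanned_with_dump(n))
```

Under this model, with a fixed split size, doubling the table roughly quadruples the rows touched by the offset-based reads, while the single pass grows linearly.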
If you provide Sqoop with arguments of the form "--connect
jdbc:mysql://localhost/somedatabase --local", it will use the mysqldump fast
path to perform the import.
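The selection rule described above can be sketched as follows. This is an illustration only, not the code in the patch: the function name is hypothetical, and treating only the literal host "localhost" as local is an assumption drawn from the example connect string.

```python
from urllib.parse import urlparse


def should_use_mysqldump(connect_string: str, local_flag: bool) -> bool:
    """Return True when the mysqldump fast path would apply.

    Hypothetical helper: requires the --local flag plus a JDBC connect
    string that names MySQL on localhost.
    """
    if not local_flag:
        return False
    prefix = "jdbc:"
    if not connect_string.startswith(prefix):
        return False
    # Strip the "jdbc:" prefix so the rest parses like an ordinary URL.
    parsed = urlparse(connect_string[len(prefix):])
    return parsed.scheme == "mysql" and parsed.hostname == "localhost"


if __name__ == "__main__":
    print(should_use_mysqldump("jdbc:mysql://localhost/somedatabase", True))
    print(should_use_mysqldump("jdbc:mysql://localhost/somedatabase", False))
```

Any other combination (no --local flag, a remote host, or a non-MySQL connect string) would fall back to the DBInputFormat/JDBC import in this sketch.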
Naturally, testing this patch requires a machine with MySQL installed. The test
it adds is therefore named LocalMySQLTest (instead of the Hadoop-preferred
naming, TestLocalMySQL) so that Hudson doesn't run it automatically. You can run
the test yourself with "ant -Dtestcase=LocalMySQLTest test". See the javadoc for
the LocalMySQLTest class for notes on setting up the MySQL test environment.