[ https://issues.apache.org/jira/browse/HADOOP-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Kimball updated HADOOP-5844:
----------------------------------

    Status: Patch Available  (was: Open)

Cycling the patch status now that 5815 is in, to actually test this.

> Use mysqldump when connecting to local mysql instance in Sqoop
> --------------------------------------------------------------
>
>                 Key: HADOOP-5844
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5844
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: mysqldump.patch
>
>
> Sqoop uses MapReduce + DBInputFormat to read the contents of a table into
> HDFS. On many databases, this implementation is O(N^2) in the number of rows.
> Also, the use of multiple mappers has low value in terms of throughput,
> because the database itself is inherently single-threaded. While
> DBInputFormat/JDBC provides a useful fallback mechanism for importing from
> databases, db-specific dump utilities will nearly always provide faster
> throughput, and should be selected when available. This patch allows users to
> use mysqldump to read from local mysql instances instead of the
> MapReduce-based input.
> If you provide sqoop with arguments of the form "--connect
> jdbc:mysql://localhost/somedatabase --local", it will use the mysqldump fast
> path to perform the import.
> This patch, naturally, requires that MySQL be installed on a machine to test
> it. Thus the test that this adds is called LocalMySQLTest (instead of the
> Hadoop-preferred file naming, TestLocalMySQL) so that Hudson doesn't
> automatically run it. You can run this test yourself by using "ant
> -Dtestcase=LocalMySQLTest test". See the notes in the javadoc for the
> LocalMySQLTest class on how to set up the MySQL test environment for this.
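
For illustration only, the selection rule in the description (a jdbc:mysql connect string pointing at localhost, plus the --local flag) could be sketched roughly as follows. This is not code from mysqldump.patch: the function name use_mysqldump_fast_path and its parameters are hypothetical, and matching only the literal host "localhost" is an assumption made here.

```python
from urllib.parse import urlparse


def use_mysqldump_fast_path(connect_string: str, local_flag: bool) -> bool:
    """Decide whether an import would take the mysqldump fast path.

    Hypothetical helper, not part of Sqoop: it only mirrors the rule stated
    in the issue description.
    """
    # Without --local, the MapReduce + DBInputFormat path is always used.
    if not local_flag:
        return False

    # JDBC connect strings look like "jdbc:<driver>://<host>/<database>".
    prefix = "jdbc:"
    if not connect_string.startswith(prefix):
        return False

    # Drop the "jdbc:" prefix so the remainder parses as an ordinary URL.
    parsed = urlparse(connect_string[len(prefix):])

    # Assumption: only the literal host "localhost" counts as local.
    return parsed.scheme == "mysql" and parsed.hostname == "localhost"


if __name__ == "__main__":
    print(use_mysqldump_fast_path("jdbc:mysql://localhost/somedatabase", True))
```

Any other combination, such as a remote host, a non-MySQL driver, or a missing --local flag, makes the helper return False, which corresponds to falling back to the DBInputFormat/JDBC import.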