[ https://issues.apache.org/jira/browse/HADOOP-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708807#action_12708807 ]
Hadoop QA commented on HADOOP-5815:
-----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407903/HADOOP-5815.patch
against trunk revision 774138.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 28 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
    -1 release audit. The applied patch generated 489 release audit warnings (more than the trunk's current 486 warnings).
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/console

This message is automatically generated.
> Sqoop: A database import tool for Hadoop
> ----------------------------------------
>
>                 Key: HADOOP-5815
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5815
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: HADOOP-5815.patch
>
>
> Overview:
>
> Sqoop is a tool designed to help users import existing relational databases into their Hadoop clusters. Sqoop uses JDBC to connect to a database, examine the schema for tables, and auto-generate the necessary classes to import data into HDFS. It then instantiates a MapReduce job to read the table from the database via DBInputFormat (the JDBC-based InputFormat). The table is read into a set of files loaded into HDFS. Both SequenceFile and text-based targets are supported.
>
> Longer term, Sqoop will support automatic connectivity to Hive, with the ability to load data files directly into the Hive warehouse directory, and also to inject the appropriate table definition into the metastore.
>
> Some more specifics:
>
> Sqoop is a program implemented as a contrib module. Its frontend is invoked through "bin/hadoop jar sqoop.jar ..." and allows you to connect to arbitrary JDBC databases and extract their tables into files in HDFS. The underlying implementation uses the JDBC interface of HADOOP-2536 (DBInputFormat). The DBWritable implementation needed to extract a table is generated by this tool, based on the types of the columns seen in the table. Sqoop uses JDBC to examine the table specification and translate it to the appropriate Java types.
>
> The generated classes are provided as .java files for the user to reuse. They are also compiled into a jar and used to run a MapReduce task that performs the data import, producing either text files or SequenceFiles in HDFS. In the latter case, the Java classes are embedded into the SequenceFiles as well.
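The JDBC-to-Java type translation mentioned above can be sketched roughly as follows. This is an illustrative example only, not Sqoop's actual generator code; the class and method names are hypothetical, and the exact mapping table is an assumption based on the standard java.sql.Types constants.

```java
import java.sql.Types;

/**
 * Illustrative sketch of the kind of JDBC-to-Java type mapping a
 * code generator like Sqoop's might perform. Hypothetical names;
 * not Sqoop's actual implementation.
 */
public class TypeMapSketch {

    /** Map a java.sql.Types constant to a Java field type name. */
    static String mapSqlType(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER:
            case Types.SMALLINT:  return "Integer";
            case Types.BIGINT:    return "Long";
            case Types.FLOAT:
            case Types.DOUBLE:    return "Double";
            case Types.REAL:      return "Float";
            case Types.CHAR:
            case Types.VARCHAR:   return "String";
            case Types.DATE:      return "java.sql.Date";
            case Types.TIMESTAMP: return "java.sql.Timestamp";
            case Types.NUMERIC:
            case Types.DECIMAL:   return "java.math.BigDecimal";
            default:              return "String"; // fall back to text
        }
    }

    public static void main(String[] args) {
        // For each column seen in the table's metadata, a generator
        // would emit one typed field in the DBWritable class.
        System.out.println(mapSqlType(Types.INTEGER));  // Integer
        System.out.println(mapSqlType(Types.VARCHAR));  // String
    }
}
```

In the real tool, the input to such a mapping would come from the table's metadata (e.g. ResultSetMetaData over a JDBC connection), and each mapped type would become a typed field with matching read/write logic in the generated class.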
> The program will extract a specific table from a database or, optionally, all tables. For a table, it can read all columns or just a subset. Since HADOOP-2536 requires that a sorting key be specified for the import task, Sqoop will auto-detect the presence of a primary key on a table and automatically use it as the sort order; the user can also manually specify a sorting column.
>
> Example invocations:
>
> To import an entire database:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --all-tables
>
> (This requires that all tables have primary keys.)
>
> To select a single table:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --table employees
>
> To select a subset of columns from a table:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --table employees --columns "employee_id,first_name,last_name,salary,start_date"
>
> To explicitly set the sort column, import format, and import destination (the table will go to /shared/imported_databases/employees):
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --table employees --order-by employee_id --warehouse-dir /shared/imported_databases --as-sequencefile
>
> Sqoop will automatically select the correct JDBC driver class name for HSQLDB and MySQL; this can also be set explicitly, e.g.:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:postgresql://db.example.com/company --driver org.postgresql.Driver --all-tables
>
> Testing has been conducted with HSQLDB and MySQL. A set of unit tests covers a great deal of Sqoop's functionality, and the tool has been used in practice at Cloudera and with a few other early test users on "real" databases.
>
> A README file is included in the patch with documentation on how to use the tool.

--
This message is automatically generated by JIRA.
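The sort key matters because DBInputFormat-style imports divide the ordered table read into per-map-task ranges. A minimal sketch of that split arithmetic, assuming a LIMIT/OFFSET query shape; the class, method names, and exact query form are illustrative assumptions, not the actual HADOOP-2536 code:

```java
/**
 * Rough sketch of how an ordered table read can be divided into
 * per-map-task splits using LIMIT/OFFSET, in the general style of a
 * JDBC-based InputFormat. Hypothetical names and query shape; not
 * the actual DBInputFormat implementation.
 */
public class SplitSketch {

    /** Build one bounded query per split over an ordered result set. */
    static String[] splitQueries(String baseQuery, String orderCol,
                                 long totalRows, int numSplits) {
        String[] queries = new String[numSplits];
        long chunk = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long offset = i * chunk;
            // The last split picks up any remainder rows.
            long limit = (i == numSplits - 1) ? totalRows - offset : chunk;
            queries[i] = baseQuery + " ORDER BY " + orderCol
                       + " LIMIT " + limit + " OFFSET " + offset;
        }
        return queries;
    }

    public static void main(String[] args) {
        // 10 rows over 3 map tasks: ranges of 3, 3, and 4 rows.
        String[] qs = splitQueries(
            "SELECT employee_id, salary FROM employees",
            "employee_id", 10, 3);
        for (String q : qs) {
            System.out.println(q);
        }
    }
}
```

Without a stable sort column, the same row could land in two splits (or none) across these range queries, which is why a primary key or an explicit --order-by column is required.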
- You can reply to this email to add a comment to the issue online.