[ https://issues.apache.org/jira/browse/HADOOP-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708807#action_12708807 ]
Hadoop QA commented on HADOOP-5815:
-----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407903/HADOOP-5815.patch
against trunk revision 774138.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 28 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
    -1 release audit. The applied patch generated 489 release audit warnings (more than the trunk's current 486 warnings).
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/332/console

This message is automatically generated.
> Sqoop: A database import tool for Hadoop
> ----------------------------------------
>
>                 Key: HADOOP-5815
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5815
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: HADOOP-5815.patch
>
>
> Overview:
>
> Sqoop is a tool designed to help users import existing relational databases into their Hadoop clusters. Sqoop uses JDBC to connect to a database, examine the schema for tables, and auto-generate the necessary classes to import data into HDFS. It then instantiates a MapReduce job to read the table from the database via DBInputFormat (the JDBC-based InputFormat). The table is read into a set of files loaded into HDFS. Both SequenceFile and text-based targets are supported.
>
> Longer term, Sqoop will support automatic connectivity to Hive, with the ability to load data files directly into the Hive warehouse directory, and also to inject the appropriate table definition into the metastore.
>
> Some more specifics:
>
> Sqoop is a program implemented as a contrib module. Its frontend is invoked through "bin/hadoop jar sqoop.jar ..." and allows you to connect to arbitrary JDBC databases and extract their tables into files in HDFS. The underlying implementation uses the JDBC interface of HADOOP-2536 (DBInputFormat). The DBWritable implementation needed to extract a table is generated by this tool, based on the types of the columns seen in the table. Sqoop uses JDBC to examine the table specification and translate it to the appropriate Java types.
>
> The generated classes are provided as .java files for the user to reuse. They are also compiled into a jar and used to run a MapReduce task that performs the data import, producing either text files or SequenceFiles in HDFS. In the latter case, the Java classes are embedded into the SequenceFiles as well.
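The JDBC-to-Java type translation mentioned above can be sketched roughly as follows. This is an illustrative example only, not Sqoop's actual generator code; the class and method names are hypothetical, and the exact mapping table is an assumption based on the standard java.sql.Types constants.

```java
import java.sql.Types;

/**
 * Illustrative sketch of the kind of JDBC-to-Java type mapping a
 * code generator like Sqoop's might perform. Hypothetical names;
 * not Sqoop's actual implementation.
 */
public class TypeMapSketch {

    /** Map a java.sql.Types constant to a Java field type name. */
    static String mapSqlType(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER:
            case Types.SMALLINT:  return "Integer";
            case Types.BIGINT:    return "Long";
            case Types.FLOAT:
            case Types.DOUBLE:    return "Double";
            case Types.REAL:      return "Float";
            case Types.CHAR:
            case Types.VARCHAR:   return "String";
            case Types.DATE:      return "java.sql.Date";
            case Types.TIMESTAMP: return "java.sql.Timestamp";
            case Types.NUMERIC:
            case Types.DECIMAL:   return "java.math.BigDecimal";
            default:              return "String"; // fall back to text
        }
    }

    public static void main(String[] args) {
        // For each column seen in the table's metadata, a generator
        // would emit one typed field in the DBWritable class.
        System.out.println(mapSqlType(Types.INTEGER));  // Integer
        System.out.println(mapSqlType(Types.VARCHAR));  // String
    }
}
```

In the real tool, the input to such a mapping would come from the table's metadata (e.g. ResultSetMetaData over a JDBC connection), and each mapped type would become a typed field with matching read/write logic in the generated class.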
> The program will extract a specific table from a database or, optionally, all tables. For a table, it can read all columns or just a subset. Since HADOOP-2536 requires that a sorting key be specified for the import task, Sqoop will auto-detect the presence of a primary key on a table and automatically use it as the sort order; the user can also manually specify a sorting column.
>
> Example invocations:
>
> To import an entire database:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --all-tables
>
> (This requires that all tables have primary keys.)
>
> To select a single table:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --table employees
>
> To select a subset of columns from a table:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --table employees --columns "employee_id,first_name,last_name,salary,start_date"
>
> To explicitly set the sort column, import format, and import destination (the table will go to /shared/imported_databases/employees):
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:mysql://db.example.com/company --table employees --order-by employee_id --warehouse-dir /shared/imported_databases --as-sequencefile
>
> Sqoop will automatically select the correct JDBC driver class name for HSQLDB and MySQL; this can also be set explicitly, e.g.:
>
>   hadoop jar sqoop.jar org.apache.hadoop.sqoop.Sqoop --connect jdbc:postgresql://db.example.com/company --driver org.postgresql.Driver --all-tables
>
> Testing has been conducted with HSQLDB and MySQL. A set of unit tests covers a great deal of Sqoop's functionality, and the tool has been used in practice at Cloudera and with a few other early test users on "real" databases.
>
> A README file is included in the patch with documentation on how to use the tool.

--
This message is automatically generated by JIRA.
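The sort key matters because DBInputFormat-style imports divide the ordered table read into per-map-task ranges. A minimal sketch of that split arithmetic, assuming a LIMIT/OFFSET query shape; the class, method names, and exact query form are illustrative assumptions, not the actual HADOOP-2536 code:

```java
/**
 * Rough sketch of how an ordered table read can be divided into
 * per-map-task splits using LIMIT/OFFSET, in the general style of a
 * JDBC-based InputFormat. Hypothetical names and query shape; not
 * the actual DBInputFormat implementation.
 */
public class SplitSketch {

    /** Build one bounded query per split over an ordered result set. */
    static String[] splitQueries(String baseQuery, String orderCol,
                                 long totalRows, int numSplits) {
        String[] queries = new String[numSplits];
        long chunk = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long offset = i * chunk;
            // The last split picks up any remainder rows.
            long limit = (i == numSplits - 1) ? totalRows - offset : chunk;
            queries[i] = baseQuery + " ORDER BY " + orderCol
                       + " LIMIT " + limit + " OFFSET " + offset;
        }
        return queries;
    }

    public static void main(String[] args) {
        // 10 rows over 3 map tasks: ranges of 3, 3, and 4 rows.
        String[] qs = splitQueries(
            "SELECT employee_id, salary FROM employees",
            "employee_id", 10, 3);
        for (String q : qs) {
            System.out.println(q);
        }
    }
}
```

Without a stable sort column, the same row could land in two splits (or none) across these range queries, which is why a primary key or an explicit --order-by column is required.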
- You can reply to this email to add a comment to the issue online.