Updated Branches:
  refs/heads/sqoop2 ec252cadc -> 7353df980

SQOOP-1225: Sqoop 2 documentation for connector development

(Masatake Iwasaki via Jarek Jarcec Cecho)


Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/7353df98
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/7353df98
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/7353df98

Branch: refs/heads/sqoop2
Commit: 7353df98091c1e002b441a1b053e9c1feeef1867
Parents: ec252ca
Author: Jarek Jarcec Cecho <[email protected]>
Authored: Sun Nov 10 18:50:21 2013 -0800
Committer: Jarek Jarcec Cecho <[email protected]>
Committed: Sun Nov 10 18:50:21 2013 -0800

----------------------------------------------------------------------
 docs/src/site/sphinx/ConnectorDevelopment.rst | 234 +++++++++++++++++----
 1 file changed, 193 insertions(+), 41 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/7353df98/docs/src/site/sphinx/ConnectorDevelopment.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/ConnectorDevelopment.rst b/docs/src/site/sphinx/ConnectorDevelopment.rst
index 918ca00..5121382 100644
--- a/docs/src/site/sphinx/ConnectorDevelopment.rst
+++ b/docs/src/site/sphinx/ConnectorDevelopment.rst
@@ -18,8 +18,10 @@
 Sqoop 2 Connector Development
 =============================
 
-This document describes you how to implement connector for Sqoop 2.
+This document describes you how to implement connector for Sqoop 2
+using the code of built-in connector ( ``GenericJdbcConnector`` ) as example.
 
+.. contents::
 
 What is Connector?
 ++++++++++++++++++
@@ -33,9 +35,9 @@ Interaction with Hadoop is taken cared by common modules of Sqoop 2 framework.
 Connector Implementation
 ++++++++++++++++++++++++
 
-The SqoopConnector class defines functionality
+The ``SqoopConnector`` class defines functionality
 which must be provided by Connectors.
-Each Connector must extends SqoopConnector and overrides methods shown below.
+Each Connector must extends ``SqoopConnector`` and overrides methods shown below.
 ::
 
   public abstract String getVersion();
@@ -47,24 +49,24 @@ Each Connector must extends SqoopConnector and overrides methods shown below.
   public abstract Validator getValidator();
   public abstract MetadataUpgrader getMetadataUpgrader();
 
-The getImporter method returns Importer_ instance
+The ``getImporter`` method returns Importer_ instance
 which is a placeholder for the modules needed for import.
 
-The getExporter method returns Exporter_ instance
+The ``getExporter`` method returns Exporter_ instance
 which is a placeholder for the modules needed for export.
 
-Methods such as getBundle, getConnectionConfigurationClass,
-getJobConfigurationClass and getValidator
+Methods such as ``getBundle`` , ``getConnectionConfigurationClass`` ,
+``getJobConfigurationClass`` and ``getValidator``
 are concerned to `Connector configurations`_ .
 
 
 Importer
 ========
 
-Connector#getImporter method returns Importer instance
+Connector's ``getImporter`` method returns ``Importer`` instance
 which is a placeholder for the modules needed for import
 such as Partitioner_ and Extractor_ .
-Built-in GenericJdbcConnector defines Importer like this.
+Built-in ``GenericJdbcConnector`` defines ``Importer`` like this.
 ::
 
   private static final Importer IMPORTER = new Importer(
@@ -87,7 +89,7 @@ Extractor
 Extractor (E for ETL) extracts data from external database and
 writes it to Sqoop framework for import.
 
-Extractor must overrides extract method.
+Extractor must overrides ``extract`` method.
 ::
 
   public abstract void extract(ExtractorContext context,
@@ -95,10 +97,10 @@ Extractor must overrides extract method.
                                JobConfiguration jobConfiguration,
                                Partition partition);
 
-The extract method extracts data from database in some way and
-writes it to DataWriter (provided by context) as `Intermediate representation`_ .
+The ``extract`` method extracts data from database in some way and
+writes it to ``DataWriter`` (provided by context) as `Intermediate representation`_ .
 
-Extractor must iterates in the extract method until the data from database exhausts.
+Extractor must iterates in the ``extract`` method until the data from database exhausts.
 ::
 
   while (resultSet.next()) {
@@ -111,13 +113,16 @@ Extractor must iterates in the extract method until the data from database exhau
 Partitioner
 -----------
 
-Partitioner creates Partition instances based on configurations.
-The number of Partition instances is interpreted as the number of map tasks.
-Partition instances are passed to Extractor_ as the argument of extract method.
+Partitioner creates ``Partition`` instances based on configurations.
+The number of ``Partition`` instances is decided
+based on the value users specified as the number of extractors
+in job configuration.
+
+``Partition`` instances are passed to Extractor_ as the argument of ``extract`` method.
 Extractor_ determines which portion of the data to extract by Partition.
 
 There is no actual convention for Partition classes
-other than being actually Writable and toString()-able.
+other than being actually ``Writable`` and ``toString()`` -able.
 ::
 
   public abstract class Partition {
@@ -126,7 +131,7 @@ other than being actually Writable and toString()-able.
     public abstract String toString();
   }
 
-Connectors can define the design of Partition on their own.
+Connectors can define the design of ``Partition`` on their own.
 
 
 Initializer and Destroyer
@@ -141,10 +146,10 @@ Destroyer is instantiated after MapReduce job is finished for clean up.
 Exporter
 ========
 
-Connector#getExporter method returns Exporter instance
+Connector's ``getExporter`` method returns ``Exporter`` instance
 which is a placeholder for the modules needed for export
 such as Loader_ .
-Built-in GenericJdbcConnector defines Exporter like this.
+Built-in ``GenericJdbcConnector`` defines ``Exporter`` like this.
 ::
 
   private static final Exporter EXPORTER = new Exporter(
@@ -166,17 +171,17 @@ Loader
 Loader (L for ETL) receives data from Sqoop framework and
 loads it to external database.
 
-Loader must overrides load method.
+Loader must overrides ``load`` method.
 ::
 
   public abstract void load(LoaderContext context,
                             ConnectionConfiguration connectionConfiguration,
                             JobConfiguration jobConfiguration) throws Exception;
 
-The load method reads data from DataReader (provided by context)
+The ``load`` method reads data from ``DataReader`` (provided by context)
 in `Intermediate representation`_ and loads it to database in some way.
 
-Loader must iterates in the load method until the data from DataReader exhausts.
+Loader must iterates in the ``load`` method until the data from ``DataReader`` exhausts.
 ::
 
   while ((array = context.getDataReader().readArrayRecord()) != null) {
@@ -196,26 +201,103 @@ Destroyer is instantiated after MapReduce job is finished for clean up.
 Connector Configurations
 ++++++++++++++++++++++++
 
+Connector specifications
+========================
+
+Framework of the Sqoop loads definitions of connectors
+from the file named ``sqoopconnector.properties``
+which each connector implementation provides.
+::
+
+  # Generic JDBC Connector Properties
+  org.apache.sqoop.connector.class = org.apache.sqoop.connector.jdbc.GenericJdbcConnector
+  org.apache.sqoop.connector.name = generic-jdbc-connector
+
+
 Configurations
 ==============
 
-The definition of the configurations are represented
-by models defined in org.apache.sqoop.model package.
+Implementation of ``SqoopConnector`` overrides methods such as
+``getConnectionConfigurationClass`` and ``getJobConfigurationClass``
+returning configuration class.
+::
 
+  @Override
+  public Class getConnectionConfigurationClass() {
+    return ConnectionConfiguration.class;
+  }
 
-ConnectionConfigurationClass
-----------------------------
+  @Override
+  public Class getJobConfigurationClass(MJob.Type jobType) {
+    switch (jobType) {
+      case IMPORT:
+        return ImportJobConfiguration.class;
+      case EXPORT:
+        return ExportJobConfiguration.class;
+      default:
+        return null;
+    }
+  }
+
+Configurations are represented
+by models defined in ``org.apache.sqoop.model`` package.
+Annotations such as
+``ConfigurationClass`` , ``FormClass`` , ``Form`` and ``Input``
+are provided for defining configurations of each connectors
+using these models.
+
+``ConfigurationClass`` is place holder for ``FormClasses`` .
+::
+
+  @ConfigurationClass
+  public class ConnectionConfiguration {
 
+    @Form public ConnectionForm connection;
 
-JobConfigurationClass
----------------------
+    public ConnectionConfiguration() {
+      connection = new ConnectionForm();
+    }
+  }
+
+Each ``FormClass`` defines names and types of configs.
+::
+
+  @FormClass
+  public class ConnectionForm {
+    @Input(size = 128) public String jdbcDriver;
+    @Input(size = 128) public String connectionString;
+    @Input(size = 40)  public String username;
+    @Input(size = 40, sensitive = true) public String password;
+    @Input public Map<String, String> jdbcProperties;
+  }
 
 
 ResourceBundle
 ==============
 
-Resources for Configurations_ are stored in properties file
-accessed by getBundle method of the Connector.
+Resources used by client user interfaces are defined in properties file.
+::
+
+  # jdbc driver
+  connection.jdbcDriver.label = JDBC Driver Class
+  connection.jdbcDriver.help = Enter the fully qualified class name of the JDBC \
+                     driver that will be used for establishing this connection.
+
+  # connect string
+  connection.connectionString.label = JDBC Connection String
+  connection.connectionString.help = Enter the value of JDBC connection string to be \
+                     used by this connector for creating connections.
+
+  ...
+
+Those resources are loaded by ``getBundle`` method of connector.
+::
+
+  @Override
+  public ResourceBundle getBundle(Locale locale) {
+    return ResourceBundle.getBundle(
+    GenericJdbcConnectorConstants.RESOURCE_BUNDLE_NAME, locale);
+  }
 
 
 Validator
@@ -227,24 +309,94 @@ Validator validates configurations set by users.
 Internal of Sqoop2 MapReduce Job
 ++++++++++++++++++++++++++++++++
 
-Sqoop 2 provides common MapReduce modules such as SqoopMapper and SqoopReducer
+Sqoop 2 provides common MapReduce modules such as ``SqoopMapper`` and ``SqoopReducer``
 for the both of import and export.
 
-- InputFormat create splits using Partitioner.
+- For import, ``Extractor`` provided by connector extracts data from databases,
+  and ``Loader`` provided by Sqoop2 loads data into Hadoop.
 
-- SqoopMapper invokes Extractor's extract method.
+- For export, ``Extractor`` provided by Sqoop2 extracts data from Hadoop,
+  and ``Loader`` provided by connector loads data into databases.
 
-- SqoopReducer do no actual works.
+The diagram below describes the initialization phase of IMPORT job.
+``SqoopInputFormat`` create splits using ``Partitioner`` .
+::
 
-- OutputFormat invokes Loader's load method (via SqoopOutputFormatLoadExecutor).
+      ,----------------.          ,-----------.
+      |SqoopInputFormat|          |Partitioner|
+      `-------+--------'          `-----+-----'
+   getSplits  |                         |
+  ----------->|                         |
+              |      getPartitions      |
+              |------------------------>|
+              |                         |         ,---------.
+              |                         |-------> |Partition|
+              |                         |         `----+----'
+              |<- - - - - - - - - - - - |              |
+              |                         |              |          ,----------.
+              |-------------------------------------------------->|SqoopSplit|
+              |                         |              |          `----+-----'
+
+The diagram below describes the map phase of IMPORT job.
+``SqoopMapper`` invokes extractor's ``extract`` method.
+::
 
-.. todo: sequence diagram like figure.
+      ,-----------.
+      |SqoopMapper|
+      `-----+-----'
+     run    |
+  --------->|                                   ,-------------.
+            |---------------------------------->|MapDataWriter|
+            |                                   `------+------'
+            |                ,---------.               |
+            |--------------> |Extractor|               |
+            |                `----+----'               |
+            |      extract        |                    |
+            |-------------------->|                    |
+            |                     |                    |
+           read from DB           |                    |
+  <-------------------------------|      write*        |
+            |                     |------------------->|
+            |                     |                    |           ,----.
+            |                     |                    |---------->|Data|
+            |                     |                    |           `-+--'
+            |                     |                    |
+            |                     |                    |      context.write
+            |                     |                    |-------------------------->
+
+The diagram below describes the reduce phase of EXPORT job.
+``OutputFormat`` invokes loader's ``load`` method (via ``SqoopOutputFormatLoadExecutor`` ).
+::
 
-For import, Extractor provided by Connector extracts data from databases,
-and Loader provided by Sqoop2 loads data into Hadoop.
+    ,-------.  ,---------------------.
+    |Reducer|  |SqoopNullOutputFormat|
+    `---+---'  `----------+----------'
+        |                 |   ,-----------------------------.
+        |                 |-> |SqoopOutputFormatLoadExecutor|
+        |                 |   `--------------+--------------'        ,----.
+        |                 |                  |---------------------> |Data|
+        |                 |                  |                       `-+--'
+        |                 |                  |   ,-----------------.   |
+        |                 |                  |-> |SqoopRecordWriter|   |
+      getRecordWriter     |                  |   `--------+--------'   |
+  ----------------------->| getRecordWriter  |            |            |
+        |                 |----------------->|            |            |     ,--------------.
+        |                 |                  |-----------------------------> |ConsumerThread|
+        |                 |                  |            |            |     `------+-------'
+        |                 |<- - - - - - - - -|            |            |            |    ,------.
+  <- - - - - - - - - - - -|                  |            |            |            |--->|Loader|
+        |                 |                  |            |            |            |    `--+---'
+        |                 |                  |            |            |            |       |
+        |                 |                  |            |            |            | load  |
+   run  |                 |                  |            |            |            |------>|
+  ----->|                 |     write        |            |            |            |       |
+        |------------------------------------------------>| setContent |            | read* |
+        |                 |                  |            |----------->| getContent |<------|
+        |                 |                  |            |            |<-----------|       |
+        |                 |                  |            |            |            | - - ->|
+        |                 |                  |            |            |            |       | write into DB
+        |                 |                  |            |            |            |       |-------------->
 
-For export, Extractor provided Sqoop2 exracts data from Hadoop,
-and Loader provided by Connector loads data into databases.
 
 
 .. _`Intermediate representation`: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation
