Repository: sqoop
Updated Branches:
  refs/heads/sqoop2 293e9ef63 -> ee097891b
SQOOP-1757: Sqoop2: Document generic jdbc connector (Abraham Elmahrek via Jarek Jarcec Cecho)


Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/ee097891
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/ee097891
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/ee097891

Branch: refs/heads/sqoop2
Commit: ee097891b727acc6dd0b1f6222d7758e436f71c9
Parents: 293e9ef
Author: Jarek Jarcec Cecho <[email protected]>
Authored: Sat Dec 13 07:55:31 2014 -0800
Committer: Jarek Jarcec Cecho <[email protected]>
Committed: Sat Dec 13 07:55:31 2014 -0800

----------------------------------------------------------------------
 docs/src/site/sphinx/Connectors.rst | 199 +++++++++++++++++++++++++++++++
 docs/src/site/sphinx/index.rst      |   1 +
 2 files changed, 200 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/ee097891/docs/src/site/sphinx/Connectors.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/Connectors.rst b/docs/src/site/sphinx/Connectors.rst
new file mode 100644
index 0000000..bcc5b43
--- /dev/null
+++ b/docs/src/site/sphinx/Connectors.rst
@@ -0,0 +1,199 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements. See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License. You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+
+==================
+Sqoop 2 Connectors
+==================
+
+This document describes how to use the built-in connectors. This includes a detailed description of how connectors partition, format their output, extract data, and load data.
+
+.. contents::
+   :depth: 3
+
+++++++++++++++++++++++
+Generic JDBC Connector
+++++++++++++++++++++++
+
+The Generic JDBC Connector can connect to any data source that adheres to the **JDBC 4** specification.
+
+-----
+Usage
+-----
+
+To use the Generic JDBC Connector, create a link for the connector and a job that uses the link.
+
+**Link Configuration**
+++++++++++++++++++++++
+
+Inputs associated with the link configuration include:
+
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| Input                       | Type    | Description                                                           | Example                                  |
++=============================+=========+=======================================================================+==========================================+
+| JDBC Driver Class           | String  | The full class name of the JDBC driver.                               | com.mysql.jdbc.Driver                    |
+|                             |         | *Required* and accessible by the Sqoop server.                        |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| JDBC Connection String      | String  | The JDBC connection string to use when connecting to the data source. | jdbc:mysql://localhost/test              |
+|                             |         | *Required*. Connectivity upon creation is optional.                   |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| Username                    | String  | The username to provide when connecting to the data source.           | sqoop                                    |
+|                             |         | *Optional*. Connectivity upon creation is optional.                   |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| Password                    | String  | The password to provide when connecting to the data source.           | sqoop                                    |
+|                             |         | *Optional*. Connectivity upon creation is optional.                   |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+| JDBC Connection Properties  | Map     | A map of JDBC connection properties to pass to the JDBC driver.       | profileSQL=true&useFastDateParsing=false |
+|                             |         | *Optional*.                                                           |                                          |
++-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
+
+**FROM Job Configuration**
+++++++++++++++++++++++++++
+
+Inputs associated with the Job configuration for the FROM direction include:
+
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Input                       | Type    | Description                                                             | Example                                     |
++=============================+=========+=========================================================================+=============================================+
+| Schema name                 | String  | The schema name the table is part of.                                   | sqoop                                       |
+|                             |         | *Optional*                                                              |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Table name                  | String  | The table name to import data from.                                     | test                                        |
+|                             |         | *Optional*. See note below.                                             |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Table SQL statement         | String  | The SQL statement used to perform a **free form query**.                | ``SELECT COUNT(*) FROM test ${CONDITIONS}`` |
+|                             |         | *Optional*. See note below.                                             |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Table column names          | String  | Columns to extract from the JDBC data source.                           | col1,col2                                   |
+|                             |         | *Optional*. Comma-separated list of columns.                            |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Partition column name       | String  | The column name used to partition the data transfer process.            | col1                                        |
+|                             |         | *Optional*. Defaults to primary key of table.                           |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Null value allowed for      | Boolean | True or false depending on whether NULL values are allowed in data      | true                                        |
+| the partition column        |         | of the Partition column. *Optional*.                                    |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+| Boundary query              | String  | The query used to define an upper and lower boundary when partitioning. |                                             |
+|                             |         | *Optional*.                                                             |                                             |
++-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
+
+**Notes**
+=========
+
+1. *Table name* and *Table SQL statement* are mutually exclusive. If *Table name* is provided, the *Table SQL statement* should not be provided. If *Table SQL statement* is provided then *Table name* should not be provided.
+2. *Table column names* should be provided only if *Table name* is provided.
+
+**TO Job Configuration**
+++++++++++++++++++++++++
+
+Inputs associated with the Job configuration for the TO direction include:
+
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Input                       | Type    | Description                                                             | Example                                         |
++=============================+=========+=========================================================================+=================================================+
+| Schema name                 | String  | The schema name the table is part of.                                   | sqoop                                           |
+|                             |         | *Optional*                                                              |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Table name                  | String  | The table name to insert data into.                                     | test                                            |
+|                             |         | *Optional*. See note below.                                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Table SQL statement         | String  | The SQL statement used to perform a **free form query**.                | ``INSERT INTO test (col1, col2) VALUES (?, ?)`` |
+|                             |         | *Optional*. See note below.                                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Table column names          | String  | Columns to insert into the JDBC data source.                            | col1,col2                                       |
+|                             |         | *Optional*. Comma-separated list of columns.                            |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Stage table name            | String  | The name of the table used as a *staging table*.                        | staging                                         |
+|                             |         | *Optional*.                                                             |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+| Should clear stage table    | Boolean | True or false depending on whether the staging table should be cleared  | true                                            |
+|                             |         | after the data transfer has finished. *Optional*.                       |                                                 |
++-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
+
+**Notes**
+=========
+
+1. *Table name* and *Table SQL statement* are mutually exclusive. If *Table name* is provided, the *Table SQL statement* should not be provided. If *Table SQL statement* is provided then *Table name* should not be provided.
+2. *Table column names* should be provided only if *Table name* is provided.
+
+-----------
+Partitioner
+-----------
+
+The Generic JDBC Connector partitioner generates conditions to be used by the extractor.
+How it partitions the data transfer depends on the partition column's data type.
+Each strategy, however, roughly takes the following form:
+::
+
+  (upper boundary - lower boundary) / (max partitions)
+
+By default, the *primary key* will be used to partition the data unless otherwise specified.
+
+The following data types are currently supported:
+
+1. TINYINT
+2. SMALLINT
+3. INTEGER
+4. BIGINT
+5. REAL
+6. FLOAT
+7. DOUBLE
+8. NUMERIC
+9. DECIMAL
+10. BIT
+11. BOOLEAN
+12. DATE
+13. TIME
+14. TIMESTAMP
+15. CHAR
+16. VARCHAR
+17. LONGVARCHAR
+
+---------
+Extractor
+---------
+
+During the *extraction* phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.
+
+- If *Table name* is provided, then the SQL statement generated will take on the form ``SELECT * FROM <table name>``.
+- If *Table name* and *Columns* are provided, then the SQL statement generated will take on the form ``SELECT <columns> FROM <table name>``.
+- If *Table SQL statement* is provided, then the provided SQL statement will be used.
+
+The conditions generated by the *partitioner* are appended to the end of the SQL query to query a section of data.
+
+The Generic JDBC connector extracts CSV data usable by the *CSV Intermediate Data Format*.
+
+------
+Loader
+------
+
+During the *loading* phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.
+
+- If *Table name* is provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (col1, col2, ...) VALUES (?,?,..)``.
+- If *Table name* and *Columns* are provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (<columns>) VALUES (?,?,..)``.
+- If *Table SQL statement* is provided, then the provided SQL statement will be used.
+
+This connector expects to receive CSV data consumable by the *CSV Intermediate Data Format*.
+
+----------
+Destroyers
+----------
+
+The Generic JDBC Connector performs two operations in the destroyer in the TO direction:
+
+1. Copy the contents of the staging table to the desired table.
+2. Clear the staging table.
+
+No operations are performed in the FROM direction.


http://git-wip-us.apache.org/repos/asf/sqoop/blob/ee097891/docs/src/site/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/index.rst b/docs/src/site/sphinx/index.rst
index 8257858..9c95d08 100644
--- a/docs/src/site/sphinx/index.rst
+++ b/docs/src/site/sphinx/index.rst
@@ -47,6 +47,7 @@ If you are excited to start using Sqoop you can follow the links below to get a
 
 - `Sqoop 5 Minute Demo <Sqoop5MinutesDemo.html>`_
 - `Command Line Shell Usage Guide <CommandLineClient.html>`_
+- `Connectors <Connectors.html>`_
 
 Developer Guide
 -----------------
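The partitioning strategy that the Connectors.rst page above documents, splitting the range between the lower and upper boundary into roughly ``(upper boundary - lower boundary) / (max partitions)`` sized slices and turning each slice into a condition for the extractor, can be sketched as follows. This is a hypothetical Python illustration, not the connector's actual Java implementation; the function name, signature, and the ``col1`` partition column are taken from the doc's examples or invented for the sketch.

```python
# Hypothetical sketch of the numeric partitioning strategy described in
# Connectors.rst. Not the Generic JDBC Connector's real code: the real
# connector implements this in Java, one strategy per column data type.

def integer_partitions(lower, upper, max_partitions, column="col1"):
    """Split [lower, upper] into at most max_partitions ranges, each
    roughly (upper - lower) / max_partitions wide, and return them as
    SQL conditions suitable for appending to the extraction query."""
    size = (upper - lower) / float(max_partitions)
    conditions = []
    start = lower
    for i in range(max_partitions):
        last = (i == max_partitions - 1)
        # The final slice ends exactly at, and includes, the upper boundary.
        end = upper if last else start + size
        op = "<=" if last else "<"
        conditions.append("%s >= %s AND %s %s %s" % (column, start, column, op, end))
        start = end
    return conditions
```

With a lower boundary of 0, an upper boundary of 100, and 4 maximum partitions, this yields four conditions such as ``col1 >= 0 AND col1 < 25.0``, each of which would be substituted into the generated query (or into ``${CONDITIONS}`` of a free-form query) by one extractor.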
