Added: sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html
URL: http://svn.apache.org/viewvc/sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html?rev=1678546&view=auto
==============================================================================
--- sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html (added)
+++ sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html Sun May 10 10:10:43 2015
@@ -0,0 +1,4607 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Sqoop User Guide (v1.4.6)</title><link rel="stylesheet" href="docbook.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"></head><body><div style="clear:both; margin-bottom: 4px"></div><div align="center"><a href="index.html"><img src="images/home.png" alt="Documentation Home"></a></div><span class="breadcrumbs"><div class="breadcrumbs"><span class="breadcrumb-node">Sqoop User Guide (v1.4.6)</span></div></span><div lang="en" class="article" title="Sqoop User Guide (v1.4.6)"><div class="titlepage"><div><div><h2 class="title"><a name="idp120136"></a>Sqoop User Guide (v1.4.6)</h2></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#_introduction">1. Introduction</a></span></dt><dt><span class="section"><a href="#_supported_releases">2. Supported Releases</a></span></dt><dt><span class="section"><a href="#_sqoop_releases">3. Sqoop Releases</a></span></dt><dt><span class="section"><a href="#_prerequisites">4. Prerequisites</a></span></dt><dt><span class="section"><a href="#_basic_usage">5. Basic Usage</a></span></dt><dt><span class="section"><a href="#_sqoop_tools">6. Sqoop Tools</a></span></dt><dd><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2.
Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_literal">7. <code class="literal">sqoop-import</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.9. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.10. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.11. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.12.
Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.13. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.14. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.15. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_all_tables_literal">8. <code class="literal">sqoop-import-all-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_2">8.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_2">8.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_2">8.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_mainframe_literal">9. <code class="literal">sqoop-import-mainframe</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_3">9.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_3">9.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_mainframe">9.2.1. Connecting to a Mainframe</a></span></dt><dt><span class="section"><a href="#_selecting_the_files_to_import">9.2.2. Selecting the Files to Import</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism_2">9.2.3. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache_2">9.2.4. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process_2">9.2.5. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_file_formats_2">9.2.6.
File Formats</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive_2">9.2.7. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase_2">9.2.8. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo_2">9.2.9. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties_2">9.2.10. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations_3">9.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_export_literal">10. <code class="literal">sqoop-export</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_4">10.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_4">10.2. Syntax</a></span></dt><dt><span class="section"><a href="#_inserts_vs_updates">10.3. Inserts vs. Updates</a></span></dt><dt><span class="section"><a href="#_exports_and_transactions">10.4. Exports and Transactions</a></span></dt><dt><span class="section"><a href="#_failed_exports">10.5. Failed Exports</a></span></dt><dt><span class="section"><a href="#_example_invocations_4">10.6. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#validation">11. <code class="literal">validation</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_5">11.1. Purpose</a></span></dt><dt><span class="section"><a href="#_introduction_2">11.2. Introduction</a></span></dt><dt><span class="section"><a href="#_syntax_5">11.3. Syntax</a></span></dt><dt><span class="section"><a href="#_configuration">11.4. Configuration</a></span></dt><dt><span class="section"><a href="#_limitations">11.5. Limitations</a></span></dt><dt><span class="section"><a href="#_example_invocations_5">11.6.
Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_saved_jobs">12. Saved Jobs</a></span></dt><dt><span class="section"><a href="#_literal_sqoop_job_literal">13. <code class="literal">sqoop-job</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_6">13.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_6">13.2. Syntax</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_passwords">13.3. Saved jobs and passwords</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_incremental_imports">13.4. Saved jobs and incremental imports</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_metastore_literal">14. <code class="literal">sqoop-metastore</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_7">14.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_7">14.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_merge_literal">15. <code class="literal">sqoop-merge</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_8">15.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_8">15.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_codegen_literal">16. <code class="literal">sqoop-codegen</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_9">16.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_9">16.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_6">16.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_create_hive_table_literal">17. <code class="literal">sqoop-create-hive-table</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_10">17.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_10">17.2.
Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_7">17.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_eval_literal">18. <code class="literal">sqoop-eval</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_11">18.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_11">18.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_8">18.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_databases_literal">19. <code class="literal">sqoop-list-databases</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_12">19.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_12">19.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_9">19.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_tables_literal">20. <code class="literal">sqoop-list-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_13">20.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_13">20.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_10">20.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_help_literal">21. <code class="literal">sqoop-help</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_14">21.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_14">21.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_11">21.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_version_literal">22. <code class="literal">sqoop-version</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_15">22.1.
Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_15">22.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_12">22.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_sqoop_hcatalog_integration">23. Sqoop-HCatalog Integration</a></span></dt><dd><dl><dt><span class="section"><a href="#_hcatalog_background">23.1. HCatalog Background</a></span></dt><dt><span class="section"><a href="#_exposing_hcatalog_tables_to_sqoop">23.2. Exposing HCatalog Tables to Sqoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_new_command_line_options">23.2.1. New Command Line Options</a></span></dt><dt><span class="section"><a href="#_supported_sqoop_hive_options">23.2.2. Supported Sqoop Hive Options</a></span></dt><dt><span class="section"><a href="#_direct_mode_support">23.2.3. Direct Mode support</a></span></dt><dt><span class="section"><a href="#_unsupported_sqoop_options">23.2.4. Unsupported Sqoop Options</a></span></dt><dd><dl><dt><span class="section"><a href="#_unsupported_sqoop_hive_import_options">23.2.4.1. Unsupported Sqoop Hive Import Options</a></span></dt><dt><span class="section"><a href="#_unsupported_sqoop_export_and_import_options">23.2.4.2. Unsupported Sqoop Export and Import Options</a></span></dt></dl></dd><dt><span class="section"><a href="#_ignored_sqoop_options">23.2.5. Ignored Sqoop Options</a></span></dt></dl></dd><dt><span class="section"><a href="#_automatic_table_creation">23.3. Automatic Table Creation</a></span></dt><dt><span class="section"><a href="#_delimited_text_formats_and_field_and_line_delimiter_characters">23.4. Delimited Text Formats and Field and Line Delimiter Characters</a></span></dt><dt><span class="section"><a href="#_hcatalog_table_requirements">23.5. HCatalog Table Requirements</a></span></dt><dt><span class="section"><a href="#_support_for_partitioning">23.6.
Support for Partitioning</a></span></dt><dt><span class="section"><a href="#_schema_mapping">23.7. Schema Mapping</a></span></dt><dt><span class="section"><a href="#_support_for_hcatalog_data_types">23.8. Support for HCatalog Data Types</a></span></dt><dt><span class="section"><a href="#_providing_hive_and_hcatalog_libraries_for_the_sqoop_job">23.9. Providing Hive and HCatalog Libraries for the Sqoop Job</a></span></dt><dt><span class="section"><a href="#_examples">23.10. Examples</a></span></dt><dt><span class="section"><a href="#_import">23.11. Import</a></span></dt><dt><span class="section"><a href="#_export">23.12. Export</a></span></dt></dl></dd><dt><span class="section"><a href="#_compatibility_notes">24. Compatibility Notes</a></span></dt><dd><dl><dt><span class="section"><a href="#_supported_databases">24.1. Supported Databases</a></span></dt><dt><span class="section"><a href="#_mysql">24.2. MySQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_zerodatetimebehavior">24.2.1. zeroDateTimeBehavior</a></span></dt><dt><span class="section"><a href="#_literal_unsigned_literal_columns">24.2.2. <code class="literal">UNSIGNED</code> columns</a></span></dt><dt><span class="section"><a href="#_literal_blob_literal_and_literal_clob_literal_columns">24.2.3. <code class="literal">BLOB</code> and <code class="literal">CLOB</code> columns</a></span></dt><dt><span class="section"><a href="#_importing_views_in_direct_mode">24.2.4. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql">24.3. PostgreSQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_importing_views_in_direct_mode_2">24.3.1. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="section"><a href="#_oracle">24.4. Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_dates_and_times">24.4.1.
Dates and Times</a></span></dt></dl></dd><dt><span class="section"><a href="#_schema_definition_in_hive">24.5. Schema Definition in Hive</a></span></dt><dt><span class="section"><a href="#_cubrid">24.6. CUBRID</a></span></dt></dl></dd><dt><span class="section"><a href="#connectors">25. Notes for specific connectors</a></span></dt><dd><dl><dt><span class="section"><a href="#_mysql_jdbc_connector">25.1. MySQL JDBC Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_upsert_functionality">25.1.1. Upsert functionality</a></span></dt></dl></dd><dt><span class="section"><a href="#_mysql_direct_connector">25.2. MySQL Direct Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_requirements">25.2.1. Requirements</a></span></dt><dt><span class="section"><a href="#_limitations_2">25.2.2. Limitations</a></span></dt><dt><span class="section"><a href="#_direct_mode_transactions">25.2.3. Direct-mode Transactions</a></span></dt></dl></dd><dt><span class="section"><a href="#_microsoft_sql_connector">25.3. Microsoft SQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments">25.3.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_allow_identity_inserts">25.3.2. Allow identity inserts</a></span></dt><dt><span class="section"><a href="#_non_resilient_operations">25.3.3. Non-resilient operations</a></span></dt><dt><span class="section"><a href="#_schema_support">25.3.4. Schema support</a></span></dt><dt><span class="section"><a href="#_table_hints">25.3.5. Table hints</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_connector">25.4. PostgreSQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments_2">25.4.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_schema_support_2">25.4.2. Schema support</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_direct_connector">25.5.
PostgreSQL Direct Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_requirements_2">25.5.1. Requirements</a></span></dt><dt><span class="section"><a href="#_limitations_3">25.5.2. Limitations</a></span></dt></dl></dd><dt><span class="section"><a href="#_pg_bulkload_connector">25.6. pg_bulkload connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_16">25.6.1. Purpose</a></span></dt><dt><span class="section"><a href="#_requirements_3">25.6.2. Requirements</a></span></dt><dt><span class="section"><a href="#_syntax_16">25.6.3. Syntax</a></span></dt><dt><span class="section"><a href="#_data_staging">25.6.4. Data Staging</a></span></dt></dl></dd><dt><span class="section"><a href="#_netezza_connector">25.7. Netezza Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments_3">25.7.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_direct_mode">25.7.2. Direct Mode</a></span></dt><dt><span class="section"><a href="#_null_string_handling">25.7.3. Null string handling</a></span></dt></dl></dd><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop">25.8. Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_about">25.8.1. About</a></span></dt><dd><dl><dt><span class="section"><a href="#_jobs">25.8.1.1. Jobs</a></span></dt><dt><span class="section"><a href="#_how_the_standard_oracle_manager_works_for_imports">25.8.1.2. How The Standard Oracle Manager Works for Imports</a></span></dt><dt><span class="section"><a href="#_how_the_data_connector_for_oracle_and_hadoop_works_for_imports">25.8.1.3. How The Data Connector for Oracle and Hadoop Works for Imports</a></span></dt><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop_exports">25.8.1.4. Data Connector for Oracle and Hadoop Exports</a></span></dt></dl></dd><dt><span class="section"><a href="#_requirements_4">25.8.2.
Requirements</a></span></dt><dd><dl><dt><span class="section"><a href="#_ensure_the_oracle_database_jdbc_driver_is_setup_correctly">25.8.2.1. Ensure The Oracle Database JDBC Driver Is Setup Correctly</a></span></dt><dt><span class="section"><a href="#_oracle_roles_and_privileges">25.8.2.2. Oracle Roles and Privileges</a></span></dt><dt><span class="section"><a href="#_additional_oracle_roles_and_privileges_required_for_export">25.8.2.3. Additional Oracle Roles And Privileges Required for Export</a></span></dt><dt><span class="section"><a href="#_supported_data_types">25.8.2.4. Supported Data Types</a></span></dt></dl></dd><dt><span class="section"><a href="#_execute_sqoop_with_data_connector_for_oracle_and_hadoop">25.8.3. Execute Sqoop With Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_connect_to_oracle_oracle_rac">25.8.3.1. Connect to Oracle / Oracle RAC</a></span></dt><dt><span class="section"><a href="#_connect_to_an_oracle_database_instance">25.8.3.2. Connect to An Oracle Database Instance</a></span></dt><dt><span class="section"><a href="#_connect_to_an_oracle_rac">25.8.3.3. Connect to An Oracle RAC</a></span></dt><dt><span class="section"><a href="#_login_to_the_oracle_instance">25.8.3.4. Login to The Oracle Instance</a></span></dt><dt><span class="section"><a href="#_kill_data_connector_for_oracle_and_hadoop_jobs">25.8.3.5. Kill Data Connector for Oracle and Hadoop Jobs</a></span></dt></dl></dd><dt><span class="section"><a href="#_import_data_from_oracle">25.8.4. Import Data from Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_match_hadoop_files_to_oracle_table_partitions">25.8.4.1. Match Hadoop Files to Oracle Table Partitions</a></span></dt><dt><span class="section"><a href="#_specify_the_partitions_to_import">25.8.4.2.
Specify The Partitions To Import</a></span></dt><dt><span class="section"><a href="#_consistent_read_all_mappers_read_from_the_same_point_in_time">25.8.4.3. Consistent Read: All Mappers Read From The Same Point In Time</a></span></dt></dl></dd><dt><span class="section"><a href="#_export_data_into_oracle">25.8.5. Export Data into Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_insert_export">25.8.5.1. Insert-Export</a></span></dt><dt><span class="section"><a href="#_update_export">25.8.5.2. Update-Export</a></span></dt><dt><span class="section"><a href="#_merge_export">25.8.5.3. Merge-Export</a></span></dt><dt><span class="section"><a href="#_create_oracle_tables">25.8.5.4. Create Oracle Tables</a></span></dt><dt><span class="section"><a href="#_nologging">25.8.5.5. NOLOGGING</a></span></dt><dt><span class="section"><a href="#_partitioning">25.8.5.6. Partitioning</a></span></dt><dt><span class="section"><a href="#_match_rows_via_multiple_columns">25.8.5.7. Match Rows Via Multiple Columns</a></span></dt><dt><span class="section"><a href="#_storage_clauses">25.8.5.8. Storage Clauses</a></span></dt></dl></dd><dt><span class="section"><a href="#_manage_date_and_timestamp_data_types">25.8.6. Manage Date And Timestamp Data Types</a></span></dt><dd><dl><dt><span class="section"><a href="#_import_date_and_timestamp_data_types_from_oracle">25.8.6.1. Import Date And Timestamp Data Types from Oracle</a></span></dt><dt><span class="section"><a href="#_the_data_connector_for_oracle_and_hadoop_does_not_apply_a_time_zone_to_date_timestamp_data_types">25.8.6.2. The Data Connector for Oracle and Hadoop Does Not Apply A Time Zone to DATE / TIMESTAMP Data Types</a></span></dt><dt><span class="section"><a href="#_the_data_connector_for_oracle_and_hadoop_retains_time_zone_information_in_timezone_data_types">25.8.6.3.
The Data Connector for Oracle and Hadoop Retains Time Zone Information in TIMEZONE Data Types</a></span></dt><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop_explicitly_states_time_zone_for_local_timezone_data_types">25.8.6.4. Data Connector for Oracle and Hadoop Explicitly States Time Zone for LOCAL TIMEZONE Data Types</a></span></dt><dt><span class="section"><a href="#_java_sql_timestamp">25.8.6.5. java.sql.Timestamp</a></span></dt><dt><span class="section"><a href="#_export_date_and_timestamp_data_types_into_oracle">25.8.6.6. Export Date And Timestamp Data Types into Oracle</a></span></dt></dl></dd><dt><span class="section"><a href="#_configure_the_data_connector_for_oracle_and_hadoop">25.8.7. Configure The Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_oraoop_site_template_xml">25.8.7.1. oraoop-site-template.xml</a></span></dt><dt><span class="section"><a href="#_oraoop_oracle_session_initialization_statements">25.8.7.2. oraoop.oracle.session.initialization.statements</a></span></dt><dt><span class="section"><a href="#_oraoop_table_import_where_clause_location">25.8.7.3. oraoop.table.import.where.clause.location</a></span></dt><dt><span class="section"><a href="#_oracle_row_fetch_size">25.8.7.4. oracle.row.fetch.size</a></span></dt><dt><span class="section"><a href="#_oraoop_import_hint">25.8.7.5. oraoop.import.hint</a></span></dt><dt><span class="section"><a href="#_oraoop_oracle_append_values_hint_usage">25.8.7.6. oraoop.oracle.append.values.hint.usage</a></span></dt><dt><span class="section"><a href="#_mapred_map_tasks_speculative_execution">25.8.7.7. mapred.map.tasks.speculative.execution</a></span></dt><dt><span class="section"><a href="#_oraoop_block_allocation">25.8.7.8. oraoop.block.allocation</a></span></dt><dt><span class="section"><a href="#_oraoop_import_omit_lobs_and_long">25.8.7.9.
oraoop.import.omit.lobs.and.long</a></span></dt><dt><span class="section"><a href="#_oraoop_locations">25.8.7.10. oraoop.locations</a></span></dt><dt><span class="section"><a href="#_sqoop_connection_factories">25.8.7.11. sqoop.connection.factories</a></span></dt><dt><span class="section"><a href="#_expressions_in_oraoop_site_xml">25.8.7.12. Expressions in oraoop-site.xml</a></span></dt></dl></dd><dt><span class="section"><a href="#_troubleshooting_the_data_connector_for_oracle_and_hadoop">25.8.8. Troubleshooting The Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_quote_oracle_owners_and_tables">25.8.8.1. Quote Oracle Owners And Tables</a></span></dt><dt><span class="section"><a href="#_quote_oracle_columns">25.8.8.2. Quote Oracle Columns</a></span></dt><dt><span class="section"><a href="#_confirm_the_data_connector_for_oracle_and_hadoop_can_initialize_the_oracle_session">25.8.8.3. Confirm The Data Connector for Oracle and Hadoop Can Initialize The Oracle Session</a></span></dt><dt><span class="section"><a href="#_check_the_sqoop_debug_logs_for_error_messages">25.8.8.4. Check The Sqoop Debug Logs for Error Messages</a></span></dt><dt><span class="section"><a href="#_export_check_tables_are_compatible">25.8.8.5. Export: Check Tables Are Compatible</a></span></dt><dt><span class="section"><a href="#_export_parallelization">25.8.8.6. Export: Parallelization</a></span></dt><dt><span class="section"><a href="#_export_check_oraoop_oracle_append_values_hint_usage">25.8.8.7. Export: Check oraoop.oracle.append.values.hint.usage</a></span></dt><dt><span class="section"><a href="#_turn_on_verbose">25.8.8.8. Turn On Verbose</a></span></dt></dl></dd></dl></dd></dl></dd><dt><span class="section"><a href="#_getting_support">26. Getting Support</a></span></dt><dt><span class="section"><a href="#_troubleshooting">27. Troubleshooting</a></span></dt><dd><dl><dt><span class="section"><a href="#_general_troubleshooting_process">27.1.
General Troubleshooting Process</a></span></dt><dt><span class="section"><a href="#_specific_troubleshooting_tips">27.2. Specific Troubleshooting Tips</a></span></dt><dd><dl><dt><span class="section"><a href="#_oracle_connection_reset_errors">27.2.1. Oracle: Connection Reset Errors</a></span></dt><dt><span class="section"><a href="#_oracle_case_sensitive_catalog_query_errors">27.2.2. Oracle: Case-Sensitive Catalog Query Errors</a></span></dt><dt><span class="section"><a href="#_mysql_connection_failure">27.2.3. MySQL: Connection Failure</a></span></dt><dt><span class="section"><a href="#_oracle_ora_00933_error_sql_command_not_properly_ended">27.2.4. Oracle: ORA-00933 error (SQL command not properly ended)</a></span></dt><dt><span class="section"><a href="#_mysql_import_of_tinyint_1_from_mysql_behaves_strangely">27.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely</a></span></dt></dl></dd></dl></dd></dl></div><pre class="screen"> Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License.</pre><div class="section" title="1. Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_introduction"></a>1.
Introduction</h2></div></div></div><p>Sqoop is a tool designed to transfer data between Hadoop and +relational databases or mainframes. You can use Sqoop to import data from a +relational database management system (RDBMS) such as MySQL or Oracle or a +mainframe into the Hadoop Distributed File System (HDFS), +transform the data in Hadoop MapReduce, and then export the data back +into an RDBMS.</p><p>Sqoop automates most of this process, relying on the database to +describe the schema for the data to be imported. Sqoop uses MapReduce +to import and export the data, which provides parallel operation as +well as fault tolerance.</p><p>This document describes how to get started using Sqoop to move data +between databases and Hadoop or mainframe to Hadoop and provides reference +information for the operation of the Sqoop command-line tool suite. This +document is intended for:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> +System and application programmers +</li><li class="listitem"> +System administrators +</li><li class="listitem"> +Database administrators +</li><li class="listitem"> +Data analysts +</li><li class="listitem"> +Data engineers +</li></ul></div></div><div class="section" title="2. Supported Releases"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_supported_releases"></a>2. Supported Releases</h2></div></div></div><p>This documentation applies to Sqoop v1.4.6.</p></div><div class="section" title="3. Sqoop Releases"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_releases"></a>3. 
Sqoop Releases</h2></div></div></div><p>Sqoop is an open source software product of the Apache Software Foundation.</p><p>Software development for Sqoop occurs at <a class="ulink" href="http://sqoop.apache.org" target="_top">http://sqoop.apache.org</a>. +At that site you can obtain:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> +New releases of Sqoop as well as its most recent source code +</li><li class="listitem"> +An issue tracker +</li><li class="listitem"> +A wiki that contains Sqoop documentation +</li></ul></div></div><div class="section" title="4. Prerequisites"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_prerequisites"></a>4. Prerequisites</h2></div></div></div><p>The following prerequisite knowledge is required for this product:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> +Basic computer technology and terminology +</li><li class="listitem"> +Familiarity with command-line interfaces such as <code class="literal">bash</code> +</li><li class="listitem"> +Relational database management systems +</li><li class="listitem"> +Basic familiarity with the purpose and operation of Hadoop +</li></ul></div><p>Before you can use Sqoop, a release of Hadoop must be installed and +configured. Sqoop currently supports four major Hadoop releases: 0.20, +0.23, 1.0, and 2.0.</p><p>This document assumes you are using a Linux or Linux-like environment. +If you are using Windows, you may be able to use Cygwin to accomplish +most of the following tasks. If you are using Mac OS X, you should see +few (if any) compatibility errors. Sqoop is predominantly operated and +tested on Linux.</p></div><div class="section" title="5. Basic Usage"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_basic_usage"></a>5.
Basic Usage</h2></div></div></div><p>With Sqoop, you can <span class="emphasis"><em>import</em></span> data from a relational database system or a +mainframe into HDFS. The input to the import process is either a database table +or a set of mainframe datasets. For databases, Sqoop will read the table row-by-row +into HDFS. For mainframe datasets, Sqoop will read records from each mainframe +dataset into HDFS. The output of this import process is a set of files +containing a copy of the imported table or datasets. +The import process is performed in parallel. For this reason, the +output will be in multiple files. These files may be delimited text +files (for example, with commas or tabs separating each field), or +binary Avro or SequenceFiles containing serialized record data.</p><p>A by-product of the import process is a generated Java class which +can encapsulate one row of the imported table. This class is used +during the import process by Sqoop itself. The Java source code for +this class is also provided to you, for use in subsequent MapReduce +processing of the data. This class can serialize and deserialize data +to and from the SequenceFile format. It can also parse the +delimited-text form of a record. These abilities allow you to quickly +develop MapReduce applications that use the HDFS-stored records in +your processing pipeline. You are also free to parse the delimited +record data yourself, using any other tools you prefer.</p><p>After manipulating the imported records (for example, with MapReduce +or Hive) you may have a result data set which you can then <span class="emphasis"><em>export</em></span> +back to the relational database. Sqoop’s export process will read +a set of delimited text files from HDFS in parallel, parse them into +records, and insert them as new rows in a target database table, for +consumption by external applications or users.</p><p>Sqoop includes some other commands which allow you to inspect the +database you are working with. 
For example, you can list the available +database schemas (with the <code class="literal">sqoop-list-databases</code> tool) and tables +within a schema (with the <code class="literal">sqoop-list-tables</code> tool). Sqoop also +includes a primitive SQL execution shell (the <code class="literal">sqoop-eval</code> tool).</p><p>Most aspects of the import, code generation, and export processes can +be customized. For databases, you can control the specific row range or +columns imported. You can specify particular delimiters and escape characters +for the file-based representation of the data, as well as the file format +used. You can also control the class or package names used in +generated code. Subsequent sections of this document explain how to +specify these and other arguments to Sqoop.</p></div><div class="section" title="6. Sqoop Tools"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_tools"></a>6. Sqoop Tools</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2. Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></div><p>Sqoop is a collection of related tools. To use Sqoop, you specify the +tool you want to use and the arguments that control the tool.</p><p>If Sqoop is compiled from its own source, you can run Sqoop without a formal +installation process by running the <code class="literal">bin/sqoop</code> program. 
Users +of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) +will see this program installed as <code class="literal">/usr/bin/sqoop</code>. The remainder of this +documentation will refer to this program as <code class="literal">sqoop</code>. For example:</p><pre class="screen">$ sqoop tool-name [tool-arguments]</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The following examples that begin with a <code class="literal">$</code> character indicate +that the commands must be entered at a terminal prompt (such as +<code class="literal">bash</code>). The <code class="literal">$</code> character represents the prompt itself; you should +not start these commands by typing a <code class="literal">$</code>. You can also enter commands +inline in the text of a paragraph; for example, <code class="literal">sqoop help</code>. These +examples do not show a <code class="literal">$</code> prefix, but you should enter them the same +way. Don’t confuse the <code class="literal">$</code> shell prompt in the examples with the <code class="literal">$</code> +that precedes an environment variable name. For example, the string +literal <code class="literal">$HADOOP_HOME</code> includes a "<code class="literal">$</code>".</p></td></tr></table></div><p>Sqoop ships with a help tool. 
To display a list of all available +tools, type the following command:</p><pre class="screen">$ sqoop help +usage: sqoop COMMAND [ARGS] + +Available commands: + codegen Generate code to interact with database records + create-hive-table Import a table definition into Hive + eval Evaluate a SQL statement and display the results + export Export an HDFS directory to a database table + help List available commands + import Import a table from a database to HDFS + import-all-tables Import tables from a database to HDFS + import-mainframe Import mainframe datasets to HDFS + list-databases List available databases on a server + list-tables List available tables in a database + version Display version information + +See 'sqoop help COMMAND' for information on a specific command.</pre><p>You can display help for a specific tool by entering: <code class="literal">sqoop help +(tool-name)</code>; for example, <code class="literal">sqoop help import</code>.</p><p>You can also add the <code class="literal">--help</code> argument to any command: <code class="literal">sqoop import +--help</code>.</p><div class="section" title="6.1. Using Command Aliases"><div class="titlepage"><div><div><h3 class="title"><a name="_using_command_aliases"></a>6.1. Using Command Aliases</h3></div></div></div><p>In addition to typing the <code class="literal">sqoop (toolname)</code> syntax, you can use alias +scripts that specify the <code class="literal">sqoop-(toolname)</code> syntax. For example, the +scripts <code class="literal">sqoop-import</code>, <code class="literal">sqoop-export</code>, etc. each select a specific +tool.</p></div><div class="section" title="6.2. Controlling the Hadoop Installation"><div class="titlepage"><div><div><h3 class="title"><a name="_controlling_the_hadoop_installation"></a>6.2. Controlling the Hadoop Installation</h3></div></div></div><p>You invoke Sqoop through the program launch capability provided by +Hadoop. 
The <code class="literal">sqoop</code> command-line program is a wrapper which runs the +<code class="literal">bin/hadoop</code> script shipped with Hadoop. If you have multiple +installations of Hadoop present on your machine, you can select the +Hadoop installation by setting the <code class="literal">$HADOOP_COMMON_HOME</code> and +<code class="literal">$HADOOP_MAPRED_HOME</code> environment variables.</p><p>For example:</p><pre class="screen">$ HADOOP_COMMON_HOME=/path/to/some/hadoop \ + HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \ + sqoop import --arguments...</pre><p>or:</p><pre class="screen">$ export HADOOP_COMMON_HOME=/some/path/to/hadoop +$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce +$ sqoop import --arguments...</pre><p>If either of these variables is not set, Sqoop will fall back to +<code class="literal">$HADOOP_HOME</code>. If it is not set either, Sqoop will use the default +installation locations for Apache Bigtop, <code class="literal">/usr/lib/hadoop</code> and +<code class="literal">/usr/lib/hadoop-mapreduce</code>, respectively.</p><p>The active Hadoop configuration is loaded from <code class="literal">$HADOOP_HOME/conf/</code>, +unless the <code class="literal">$HADOOP_CONF_DIR</code> environment variable is set.</p></div><div class="section" title="6.3. Using Generic and Specific Arguments"><div class="titlepage"><div><div><h3 class="title"><a name="_using_generic_and_specific_arguments"></a>6.3. 
Using Generic and Specific Arguments</h3></div></div></div><p>To control the operation of each Sqoop tool, you use generic and +specific arguments.</p><p>For example:</p><pre class="screen">$ sqoop help import +usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS] + +Common arguments: + --connect <jdbc-uri> Specify JDBC connect string + --connect-manager <class-name> Specify connection manager class to use + --driver <class-name> Manually specify JDBC driver class to use + --hadoop-mapred-home <dir> Override $HADOOP_MAPRED_HOME + --help Print usage instructions + --password-file Set path for file containing authentication password + -P Read password from console + --password <password> Set authentication password + --username <username> Set authentication username + --verbose Print more information while working + --hadoop-home <dir> Deprecated. Override $HADOOP_HOME + +[...] + +Generic Hadoop command-line arguments: +(must preceed any tool-specific arguments) +Generic options supported are +-conf <configuration file> specify an application configuration file +-D <property=value> use value for given property +-fs <local|namenode:port> specify a namenode +-jt <local|jobtracker:port> specify a job tracker +-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster +-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. +-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. + +The general command line syntax is +bin/hadoop command [genericOptions] [commandOptions]</pre><p>You must supply the generic arguments <code class="literal">-conf</code>, <code class="literal">-D</code>, and so on after the +tool name but <span class="strong"><strong>before</strong></span> any tool-specific arguments (such as +<code class="literal">--connect</code>). 
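As a sketch of this ordering (the job name, connect string, and table below are invented examples, and echo stands in for an actual run so the assembled command line can be inspected without a Hadoop cluster):

```shell
# Generic Hadoop arguments (-D, -conf, ...) come immediately after the tool
# name; tool-specific arguments (--connect, --table, ...) follow them.
echo sqoop import \
    -D mapred.job.name=payroll-import \
    --connect jdbc:mysql://db.example.com/payroll \
    --table employees
```

Swapping the two groups (putting `-D` after `--connect`) would cause Hadoop's generic-option parser to miss the `-D` setting.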
Note that generic Hadoop arguments are preceded by a +single dash character (<code class="literal">-</code>), whereas tool-specific arguments start +with two dashes (<code class="literal">--</code>), unless they are single character arguments such as <code class="literal">-P</code>.</p><p>The <code class="literal">-conf</code>, <code class="literal">-D</code>, <code class="literal">-fs</code> and <code class="literal">-jt</code> arguments control the configuration +and Hadoop server settings. For example, <code class="literal">-D mapred.job.name=<job_name></code> can +be used to set the name of the MR job that Sqoop launches; if not specified, +the name defaults to the jar name for the job, which is derived from the +table name used.</p><p>The <code class="literal">-files</code>, <code class="literal">-libjars</code>, and <code class="literal">-archives</code> arguments are not typically used with +Sqoop, but they are included as part of Hadoop’s internal argument-parsing +system.</p></div><div class="section" title="6.4. Using Options Files to Pass Arguments"><div class="titlepage"><div><div><h3 class="title"><a name="_using_options_files_to_pass_arguments"></a>6.4. Using Options Files to Pass Arguments</h3></div></div></div><p>When using Sqoop, the command line options that do not change from +invocation to invocation can be put in an options file for convenience. +An options file is a text file where each line identifies an option in +the order that it appears otherwise on the command line. Option files +allow specifying a single option on multiple lines by using the +back-slash character at the end of intermediate lines. Also supported +are comments within option files that begin with the hash character. +Comments must be specified on a new line and may not be mixed with +option text. All comments and empty lines are ignored when option +files are expanded. Unless options appear as quoted strings, any +leading or trailing spaces are ignored. 
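As an illustrative sketch of these rules (the connect string, table, and WHERE clause are invented, not taken from this guide), an options file might combine comments, blank lines, and backslash continuation like this:

```text
# Options file sketch: comments sit on their own lines; blank lines are ignored.
import

--connect
jdbc:mysql://db.example.com/mydb

--table
employees

# A single option may continue across lines with a trailing backslash:
--where
start_date > '2010-01-01' \
  and active = 1
```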
Quoted strings, if used, must +not extend beyond the line on which they are specified.</p><p>Option files can be specified anywhere in the command line as long as +the options within them follow the otherwise prescribed rules of +options ordering. For instance, regardless of where the options are +loaded from, they must follow the ordering such that generic options +appear first, tool-specific options next, followed finally by options +that are intended to be passed to child programs.</p><p>To specify an options file, simply create an options file in a +convenient location and pass it to the command line via the +<code class="literal">--options-file</code> argument.</p><p>Whenever an options file is specified, it is expanded on the +command line before the tool is invoked. You can specify more than +one options file within the same invocation if needed.</p><p>For example, the following Sqoop import invocation can +alternatively be specified as shown below:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST + +$ sqoop --options-file /users/homer/work/import.txt --table TEST</pre><p>where the options file <code class="literal">/users/homer/work/import.txt</code> contains the following:</p><pre class="screen">import +--connect +jdbc:mysql://localhost/db +--username +foo</pre><p>The options file can have empty lines and comments for readability purposes. +So the above example would work exactly the same if the options file +<code class="literal">/users/homer/work/import.txt</code> contained the following:</p><pre class="screen"># +# Options file for Sqoop import +# + +# Specifies the tool being invoked +import + +# Connect parameter and value +--connect +jdbc:mysql://localhost/db + +# Username parameter and value +--username +foo + +# +# Remaining options should be specified in the command line. +#</pre></div><div class="section" title="6.5. 
Using Tools"><div class="titlepage"><div><div><h3 class="title"><a name="_using_tools"></a>6.5. Using Tools</h3></div></div></div><p>The following sections will describe each tool’s operation. The +tools are listed in the most likely order you will find them useful.</p></div></div><div class="section" title="7. sqoop-import"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_literal_sqoop_import_literal"></a>7. <code class="literal">sqoop-import</code></h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.9. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.10. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.11. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.12. 
Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.13. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.14. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.15. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></div><div class="section" title="7.1. Purpose"><div class="titlepage"><div><div><h3 class="title"><a name="_purpose"></a>7.1. Purpose</h3></div></div></div><p>The <code class="literal">import</code> tool imports an individual table from an RDBMS to HDFS. +Each row from a table is represented as a separate record in HDFS. +Records can be stored as text files (one record per line), or in +binary representation as Avro or SequenceFiles.</p></div><div class="section" title="7.2. Syntax"><div class="titlepage"><div><div><h3 class="title"><a name="_syntax"></a>7.2. Syntax</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. 
Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.9. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.10. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.11. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.12. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.13. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.14. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.15. Additional Import Configuration Properties</a></span></dt></dl></div><pre class="screen">$ sqoop import (generic-args) (import-args) +$ sqoop-import (generic-args) (import-args)</pre><p>While the Hadoop generic arguments must precede any import arguments, +you can type the import arguments in any order with respect to one +another.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>In this document, arguments are grouped into collections +organized by function. Some collections are present in several tools +(for example, the "common" arguments). An extended description of their +functionality is given only on the first presentation in this +document.</p></td></tr></table></div><div class="table"><a name="idp29176"></a><p class="title"><b>Table 1. 
Common arguments</b></p><div class="table-contents"><table summary="Common arguments" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + Argument + </th><th style="border-bottom: 0.5pt solid ; " align="left"> + Description + </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--connect <jdbc-uri></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify JDBC connect string + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--connection-manager <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify connection manager class to use + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--driver <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Manually specify JDBC driver class to use + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--hadoop-mapred-home <dir></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Override $HADOOP_MAPRED_HOME + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--help</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Print usage instructions + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--password-file</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set path for a file containing the authentication password + </td></tr><tr><td style="border-right: 
0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-P</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Read password from console + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--password <password></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set authentication password + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--username <username></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set authentication username + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--verbose</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Print more information while working + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--connection-param-file <filename></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Optional properties file that provides connection parameters + </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"> + <code class="literal">--relaxed-isolation</code> + </td><td style="" align="left"> + Set connection transaction isolation to read uncommitted for the mappers. + </td></tr></tbody></table></div></div><br class="table-break"><div class="section" title="7.2.1. Connecting to a Database Server"><div class="titlepage"><div><div><h4 class="title"><a name="_connecting_to_a_database_server"></a>7.2.1. Connecting to a Database Server</h4></div></div></div><p>Sqoop is designed to import tables from a database into HDFS. To do +so, you must specify a <span class="emphasis"><em>connect string</em></span> that describes how to connect to the +database. 
The <span class="emphasis"><em>connect string</em></span> is similar to a URL, and is communicated to +Sqoop with the <code class="literal">--connect</code> argument. This describes the server and +database to connect to; it may also specify the port. For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees</pre><p>This string will connect to a MySQL database named <code class="literal">employees</code> on the +host <code class="literal">database.example.com</code>. It’s important that you <span class="strong"><strong>do not</strong></span> use the URL +<code class="literal">localhost</code> if you intend to use Sqoop with a distributed Hadoop +cluster. The connect string you supply will be used on TaskTracker nodes +throughout your MapReduce cluster; if you specify the +literal name <code class="literal">localhost</code>, each node will connect to a different +database (or more likely, no database at all). Instead, you should use +the full hostname or IP address of the database host that can be seen +by all your remote nodes.</p><p>You might need to authenticate against the database before you can +access it. You can use the <code class="literal">--username</code> argument to supply a username to the database. +Sqoop provides a couple of different ways to supply a password, +secure and non-secure, to the database, as detailed below.</p><p title="Secure way of supplying password to the database"><b>Secure way of supplying password to the database. </b>You should save the password in a file in the user's home directory with 400 +permissions and specify the path to that file using the <span class="strong"><strong><code class="literal">--password-file</code></strong></span> +argument; this is the preferred method of entering credentials. Sqoop will +then read the password from the file and pass it to the MapReduce cluster +using secure means without exposing the password in the job configuration. 
+The file containing the password can either be on the Local FS or HDFS. +For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username venkatesh --password-file ${user.home}/.password</pre><div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>Sqoop will read the entire content of the password file and use it as +the password. This will include any trailing white space characters such as +the newline characters that most text editors add by default. +You need to make sure that your password file contains only the characters +that belong to your password. On the command line you can use the +<code class="literal">echo</code> command with the <code class="literal">-n</code> switch to store the password without any trailing white space +characters. For example, to store the password <code class="literal">secret</code> you would call +<code class="literal">echo -n "secret" > password.file</code>.</p></td></tr></table></div><p>Another way of supplying passwords is using the <code class="literal">-P</code> argument which will +read a password from a console prompt.</p><p title="Protecting password from prying eyes"><b>Protecting password from prying eyes. </b>Hadoop 2.6.0 provides an API to separate password storage from applications. +This API is called the credential provider API and there is a new +<code class="literal">credential</code> command line tool to manage passwords and their aliases. +The passwords are stored with their aliases in a keystore that is password +protected. The keystore password can be provided at a password prompt +on the command line, via an environment variable, or defaulted to a +software-defined constant. 
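Returning to password files for a moment, the trailing-whitespace warning above can be checked directly. In this sketch (the file name and password are examples only), printf '%s' behaves like echo -n and writes no trailing newline, so the file's byte count equals the password's length:

```shell
printf '%s' 'secret' > password.file   # like `echo -n`: no trailing newline
wc -c < password.file                  # reports 6 bytes, one per password character
```

A plain `echo` would produce 7 bytes here, and Sqoop would treat the newline as part of the password.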
Please check the Hadoop documentation on the usage +of this facility.</p><p>Once the password is stored using the Credential Provider facility and +the Hadoop configuration has been suitably updated, all applications can +optionally use the alias in place of the actual password and at runtime +resolve the alias for the password to use.</p><p>Since the keystore or similar technology backing the credential +provider is shared across components, passwords for various applications and +various databases can be securely stored in it, and only +the alias needs to be exposed in configuration files, protecting the passwords +from being visible.</p><p>Sqoop has been enhanced to allow usage of this functionality if it is +available in the underlying Hadoop version being used. One new option +has been introduced to provide the alias on the command line instead of the +actual password (--password-alias). The argument value of this option is +the alias in the credential store associated with the actual password. +Example usage is as follows:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username dbuser --password-alias mydb.password.alias</pre><p>Similarly, if the command line option is not preferred, the alias can be saved +in the file provided with the --password-file option. 
Along with this, the +Sqoop configuration parameter org.apache.sqoop.credentials.loader.class +should be set to the classname that provides the alias resolution: +<code class="literal">org.apache.sqoop.util.password.CredentialProviderPasswordLoader</code></p><p>Example usage is as follows (assuming .password.alias has the alias for +the real password) :</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username dbuser --password-file ${user.home}/.password-alias</pre><div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>The <code class="literal">--password</code> parameter is insecure, as other users may +be able to read your password from the command-line arguments via +the output of programs such as <code class="literal">ps</code>. The <span class="strong"><strong><code class="literal">-P</code></strong></span> argument is the preferred +method over using the <code class="literal">--password</code> argument. Credentials may still be +transferred between nodes of the MapReduce cluster using insecure means. +For example:</p></td></tr></table></div><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username aaron --password 12345</pre><p>Sqoop automatically supports several databases, including MySQL. Connect +strings beginning with <code class="literal">jdbc:mysql://</code> are handled automatically in Sqoop. (A +full list of databases with built-in support is provided in the "Supported +Databases" section. For some, you may need to install the JDBC driver +yourself.)</p><p>You can use Sqoop with any other +JDBC-compliant database. 
First, download the appropriate JDBC +driver for the type of database you want to import, and install the .jar +file in the <code class="literal">$SQOOP_HOME/lib</code> directory on your client machine. (This will +be <code class="literal">/usr/lib/sqoop/lib</code> if you installed from an RPM or Debian package.) +Each driver <code class="literal">.jar</code> file also has a specific driver class which defines +the entry-point to the driver. For example, MySQL’s Connector/J library has +a driver class of <code class="literal">com.mysql.jdbc.Driver</code>. Refer to your database +vendor-specific documentation to determine the main driver class. +This class must be provided as an argument to Sqoop with <code class="literal">--driver</code>.</p><p>For example, to connect to a SQLServer database, first download the driver from +microsoft.com and install it in your Sqoop lib path.</p><p>Then run Sqoop. For example:</p><pre class="screen">$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \ + --connect <connect-string> ...</pre><p>When connecting to a database using JDBC, you can optionally specify extra +JDBC parameters via a property file using the option +<code class="literal">--connection-param-file</code>. The contents of this file are parsed as standard +Java properties and passed into the driver while creating a connection.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The parameters specified via the optional property file are only +applicable to JDBC connections. Any fastpath connectors that use connections +other than JDBC will ignore these parameters.</p></td></tr></table></div><div class="table"><a name="idp3493216"></a><p class="title"><b>Table 2. 
Validation arguments <a class="link" href="#validation" title="11. validation">More Details</a></b></p><div class="table-contents"><table summary="Validation arguments More Details" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + Argument + </th><th style="border-bottom: 0.5pt solid ; " align="left"> + Description + </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--validate</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Enable validation of data copied, supports single table copy only. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--validator <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify validator class to use. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--validation-threshold <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify validation threshold class to use. + </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"> + <code class="literal">--validation-failurehandler <class-name></code> + </td><td style="" align="left"> + Specify validation failure handler class to use. + </td></tr></tbody></table></div></div><br class="table-break"><div class="table"><a name="idp3507040"></a><p class="title"><b>Table 3. 
Import control arguments:</b></p><div class="table-contents"><table summary="Import control arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + Argument + </th><th style="border-bottom: 0.5pt solid ; " align="left"> + Description + </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--append</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Append data to an existing dataset in HDFS + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-avrodatafile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data to Avro Data Files + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-sequencefile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data to SequenceFiles + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-textfile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data as plain text (default) + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-parquetfile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data to Parquet Files + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--boundary-query <statement></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Boundary query to use for creating splits + </td></tr><tr><td style="border-right: 0.5pt solid ; 
border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--columns <col,col,col…></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Columns to import from table + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--delete-target-dir</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Delete the import target directory if it exists + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--direct</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Use direct connector if exists for the database + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--fetch-size <n></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Number of entries to read from database at once. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--inline-lob-limit <n></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set the maximum size for an inline LOB + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-m,--num-mappers <n></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Use <span class="emphasis"><em>n</em></span> map tasks to import in parallel + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-e,--query <statement></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Import the results of <span class="emphasis"><em><code class="literal">statement</code></em></span>. 
+ </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--split-by <column-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Column of the table used to split work units. Cannot be used with <code class="literal">--autoreset-to-one-mapper</code> option. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--autoreset-to-one-mapper</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with <code class="literal">--split-by <col></code> option. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--table <table-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Table to read + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--target-dir <dir></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + HDFS destination dir + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--warehouse-dir <dir></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + HDFS parent for table destination + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--where <where clause></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + WHERE clause to use during import + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-z,--compress</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Enable compression + </td></tr><tr><td style="border-right: 0.5pt solid ; 
border-bottom: 0.5pt solid ; " align="left"> <code class="literal">--compression-codec <c></code> </td><td style="border-bottom: 0.5pt solid ; " align="left"> Use Hadoop codec (default gzip) </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> <code class="literal">--null-string <null-string></code> </td><td style="border-bottom: 0.5pt solid ; " align="left"> The string to be written for a null value for string columns </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"> <code class="literal">--null-non-string <null-string></code> </td><td style="" align="left"> The string to be written for a null value for non-string columns </td></tr></tbody></table></div></div><br class="table-break"><p>The <code class="literal">--null-string</code> and <code class="literal">--null-non-string</code> arguments are optional. If not specified, then the string "null" will be used.</p></div><div class="section" title="7.2.2. Selecting the Data to Import"><div class="titlepage"><div><div><h4 class="title"><a name="_selecting_the_data_to_import"></a>7.2.2. Selecting the Data to Import</h4></div></div></div><p>Sqoop typically imports data in a table-centric fashion. Use the <code class="literal">--table</code> argument to select the table to import. For example, <code class="literal">--table employees</code>. This argument can also identify a <code class="literal">VIEW</code> or other table-like entity in a database.</p><p>By default, all columns within a table are selected for import. Imported data is written to HDFS in its "natural order;" that is, a table containing columns A, B, and C results in an import of data such as:</p><pre class="screen">A1,B1,C1
A2,B2,C2
...</pre><p>You can select a subset of columns and control their ordering by using the <code class="literal">--columns</code> argument. This should include a comma-delimited list of columns to import. 
For example: <code class="literal">--columns "name,employee_id,jobtitle"</code>.</p><p>You can control which rows are imported by adding a SQL <code class="literal">WHERE</code> clause to the import statement. By default, Sqoop generates statements of the form <code class="literal">SELECT <column list> FROM <table name></code>. You can append a <code class="literal">WHERE</code> clause to this with the <code class="literal">--where</code> argument. For example: <code class="literal">--where "id > 400"</code>. Only rows where the <code class="literal">id</code> column has a value greater than 400 will be imported.</p><p>By default, Sqoop uses the query <code class="literal">select min(<split-by>), max(<split-by>) from <table name></code> to find boundaries for creating splits. In some cases this query is not optimal, so you can specify any arbitrary query that returns two numeric columns using the <code class="literal">--boundary-query</code> argument.</p></div><div class="section" title="7.2.3. Free-form Query Imports"><div class="titlepage"><div><div><h4 class="title"><a name="_free_form_query_imports"></a>7.2.3. Free-form Query Imports</h4></div></div></div><p>Sqoop can also import the result set of an arbitrary SQL query. Instead of using the <code class="literal">--table</code>, <code class="literal">--columns</code> and <code class="literal">--where</code> arguments, you can specify a SQL statement with the <code class="literal">--query</code> argument.</p><p>When importing a free-form query, you must specify a destination directory with <code class="literal">--target-dir</code>.</p><p>If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token <code class="literal">$CONDITIONS</code>, which each Sqoop process will replace with a unique condition expression. 
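To make the substitution concrete, the following sketch (a hypothetical illustration, not Sqoop's actual implementation) shows how each map task's copy of a free-form query might be derived. The table, column, and bounds are made up; in practice Sqoop infers the bounds from the splitting column:

```python
# Hypothetical illustration of how $CONDITIONS is replaced per map task.
# The bounds below are invented; Sqoop derives real ones from --split-by.
query = "SELECT * FROM employees WHERE $CONDITIONS"
bounds = [(0, 250), (250, 500)]          # two map tasks, illustrative only

per_task = [
    query.replace("$CONDITIONS", "id >= %d AND id < %d" % (lo, hi))
    for lo, hi in bounds
]
print(per_task[0])
# SELECT * FROM employees WHERE id >= 0 AND id < 250
```

Each task thus runs the same query over a disjoint slice of rows, and the union of the slices covers the full result set.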
You must also select a splitting column with <code class="literal">--split-by</code>.</p><p>For example:</p><pre class="screen">$ sqoop import \
    --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
    --split-by a.id --target-dir /user/foo/joinresults</pre><p>Alternately, the query can be executed once and imported serially, by specifying a single map task with <code class="literal">-m 1</code>:</p><pre class="screen">$ sqoop import \
    --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
    -m 1 --target-dir /user/foo/joinresults</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>If you issue the query wrapped in double quotes ("), you will have to use <code class="literal">\$CONDITIONS</code> instead of just <code class="literal">$CONDITIONS</code> to prevent your shell from treating it as a shell variable. For example, a double-quoted query may look like: <code class="literal">"SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"</code></p></td></tr></table></div><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The facility of using a free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no <code class="literal">OR</code> conditions in the <code class="literal">WHERE</code> clause. 
Use of complex queries, such as queries that have sub-queries or joins leading to ambiguous projections, can lead to unexpected results.</p></td></tr></table></div></div><div class="section" title="7.2.4. Controlling Parallelism"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_parallelism"></a>7.2.4. Controlling Parallelism</h4></div></div></div><p>Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) to use to perform the import by using the <code class="literal">-m</code> or <code class="literal">--num-mappers</code> argument. Each of these arguments takes an integer value which corresponds to the degree of parallelism to employ. By default, four tasks are used. Some databases may see improved performance by increasing this value to 8 or 16. Do not increase the degree of parallelism beyond that available within your MapReduce cluster; tasks will run serially and will likely increase the amount of time required to perform the import. Likewise, do not increase the degree of parallelism higher than that which your database can reasonably support. Connecting 100 concurrent clients to your database may increase the load on the database server to a point where performance suffers as a result.</p><p>When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a <span class="emphasis"><em>splitting column</em></span> to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range. 
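The even division of the total range described above can be sketched as follows. This is a simplification for illustration only (Sqoop's actual splitter handles additional cases such as non-integer types), but it reproduces the four ranges used in the example below:

```python
# Simplified sketch of even range splitting over [lo, hi]; the final
# bound is hi + 1 so that the maximum value itself is included.
def split_ranges(lo, hi, num_tasks):
    step = (hi - lo) / num_tasks
    bounds = [lo + round(i * step) for i in range(num_tasks)] + [hi + 1]
    return list(zip(bounds, bounds[1:]))

print(split_ranges(0, 1000, 4))
# [(0, 250), (250, 500), (500, 750), (750, 1001)]
```

Each (lo, hi) pair then parameterizes one map task's WHERE clause.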
For example, if you had a table with a primary key column of <code class="literal">id</code> whose minimum value was 0 and maximum value was 1000, and Sqoop was directed to use 4 tasks, Sqoop would run four processes which each execute SQL statements of the form <code class="literal">SELECT * FROM sometable WHERE id >= lo AND id < hi</code>, with <code class="literal">(lo, hi)</code> set to (0, 250), (250, 500), (500, 750), and (750, 1001) in the different tasks.</p><p>If the actual values for the primary key are not uniformly distributed across its range, then this can result in unbalanced tasks. You should explicitly choose a different column with the <code class="literal">--split-by</code> argument. For example, <code class="literal">--split-by employee_id</code>. Sqoop cannot currently split on multi-column indices. If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column.</p><p>If a table does not have a primary key defined and <code class="literal">--split-by <col></code> is not provided, then the import will fail unless the number of mappers is explicitly set to one with the <code class="literal">--num-mappers 1</code> option or the <code class="literal">--autoreset-to-one-mapper</code> option is used. The option <code class="literal">--autoreset-to-one-mapper</code> is typically used with the import-all-tables tool to automatically handle tables without a primary key in a schema.</p></div><div class="section" title="7.2.5. Controlling Distributed Cache"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_distributed_cache"></a>7.2.5. Controlling Distributed Cache</h4></div></div></div><p>Sqoop copies the jars in the $SQOOP_HOME/lib folder to the job cache every time a Sqoop job is started. When launched by Oozie this is unnecessary, since Oozie uses its own Sqoop share lib, which keeps the Sqoop dependencies in the distributed cache. 
Oozie will localize the Sqoop dependencies on each worker node only once, during the first Sqoop job, and reuse the jars on the worker nodes for subsequent jobs. Using the <code class="literal">--skip-dist-cache</code> option in the Sqoop command when launched by Oozie will skip the step in which Sqoop copies its dependencies to the job cache, saving massive I/O.</p></div><div class="section" title="7.2.6. Controlling the Import Process"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_the_import_process"></a>7.2.6. Controlling the Import Process</h4></div></div></div><p>By default, the import process will use JDBC, which provides a reasonable cross-vendor import channel. Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools. For example, MySQL provides the <code class="literal">mysqldump</code> tool, which can export data from MySQL to other systems very quickly. By supplying the <code class="literal">--direct</code> argument, you are specifying that Sqoop should attempt the direct import channel. This channel may be higher performance than using JDBC.</p><p>Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in <a class="xref" href="#connectors" title="25. Notes for specific connectors">Section 25, “Notes for specific connectors”</a>.</p><p>By default, Sqoop will import a table named <code class="literal">foo</code> to a directory named <code class="literal">foo</code> inside your home directory in HDFS. For example, if your username is <code class="literal">someuser</code>, then the import tool will write to <code class="literal">/user/someuser/foo/(files)</code>. You can adjust the parent directory of the import with the <code class="literal">--warehouse-dir</code> argument. 
For example:</p><pre class="screen">$ sqoop import --connect <connect-str> --table foo --warehouse-dir /shared \
    ...</pre><p>This command would write to a set of files in the <code class="literal">/shared/foo/</code> directory.</p><p>You can also explicitly choose the target directory, like so:</p><pre class="screen">$ sqoop import --connect <connect-str> --table foo --target-dir /dest \
    ...</pre><p>This will import the files into the <code class="literal">/dest</code> directory. <code class="literal">--target-dir</code> is incompatible with <code class="literal">--warehouse-dir</code>.</p><p>When using direct mode, you can specify additional arguments which should be passed to the underlying tool. If the argument <code class="literal">--</code> is given on the command-line, then subsequent arguments are sent directly to the underlying tool. For example, the following adjusts the character set used by <code class="literal">mysqldump</code>:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
    --direct -- --default-character-set=latin1</pre><p>By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory’s contents. If you use the <code class="literal">--append</code> argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.</p></div><div class="section" title="7.2.7. Controlling transaction isolation"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_transaction_isolation"></a>7.2.7. Controlling transaction isolation</h4></div></div></div><p>By default, Sqoop uses the read committed transaction isolation level in the mappers to import data. This may not be ideal in all ETL workflows, and it may be desirable to reduce the isolation guarantees. 
The <code class="literal">--relaxed-isolation</code> option can be used to instruct Sqoop to use the read uncommitted isolation level.</p><p>The <code class="literal">read-uncommitted</code> isolation level is not supported on all databases (for example, Oracle), so the <code class="literal">--relaxed-isolation</code> option may not work with every database.</p></div><div class="section" title="7.2.8. Controlling type mapping"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_type_mapping"></a>7.2.8. Controlling type mapping</h4></div></div></div><p>Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However, the default mapping might not be suitable for everyone and can be overridden with <code class="literal">--map-column-java</code> (for changing the mapping to Java) or <code class="literal">--map-column-hive</code> (for changing the Hive mapping).</p><div class="table"><a name="idp3579352"></a><p class="title"><b>Table 4. Parameters for overriding mapping</b></p><div class="table-contents"><table summary="Parameters for overriding mapping" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
[... 4007 lines stripped ...]
