Added: sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html
URL: http://svn.apache.org/viewvc/sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html?rev=1678546&view=auto
==============================================================================
--- sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html (added)
+++ sqoop/site/trunk/content/resources/docs/1.4.6/SqoopUserGuide.html Sun May 10 10:10:43 2015
@@ -0,0 +1,4607 @@
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Sqoop User Guide (v1.4.6)</title><link rel="stylesheet" href="docbook.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"></head><body><div style="clear:both; margin-bottom: 4px"></div><div align="center"><a href="index.html"><img src="images/home.png" alt="Documentation Home"></a></div><span class="breadcrumbs"><div class="breadcrumbs"><span class="breadcrumb-node">Sqoop User Guide (v1.4.6)</span></div></span><div lang="en" class="article" title="Sqoop User Guide (v1.4.6)"><div class="titlepage"><div><div><h2 class="title"><a name="idp120136"></a>Sqoop User Guide (v1.4.6)</h2></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#_introduction">1. Introduction</a></span></dt><dt><span class="section"><a href="#_supported_releases">2. Supported Releases</a></span></dt><dt><span class="section"><a href="#_sqoop_releases">3. Sqoop Releases</a></span></dt><dt><span class="section"><a href="#_prerequisites">4. Prerequisites</a></span></dt><dt><span class="section"><a href="#_basic_usage">5. Basic Usage</a></span></dt><dt><span class="section"><a href="#_sqoop_tools">6. Sqoop Tools</a></span></dt><dd><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2.
Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_literal">7. <code class="literal">sqoop-import</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.9. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.10. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.11. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.12.
Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.13. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.14. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.15. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_all_tables_literal">8. <code class="literal">sqoop-import-all-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_2">8.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_2">8.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_2">8.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_import_mainframe_literal">9. <code class="literal">sqoop-import-mainframe</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_3">9.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_3">9.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_mainframe">9.2.1. Connecting to a Mainframe</a></span></dt><dt><span class="section"><a href="#_selecting_the_files_to_import">9.2.2. Selecting the Files to Import</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism_2">9.2.3. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache_2">9.2.4. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process_2">9.2.5. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_file_formats_2">9.2.6.
File Formats</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive_2">9.2.7. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase_2">9.2.8. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo_2">9.2.9. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties_2">9.2.10. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations_3">9.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_export_literal">10. <code class="literal">sqoop-export</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_4">10.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_4">10.2. Syntax</a></span></dt><dt><span class="section"><a href="#_inserts_vs_updates">10.3. Inserts vs. Updates</a></span></dt><dt><span class="section"><a href="#_exports_and_transactions">10.4. Exports and Transactions</a></span></dt><dt><span class="section"><a href="#_failed_exports">10.5. Failed Exports</a></span></dt><dt><span class="section"><a href="#_example_invocations_4">10.6. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#validation">11. <code class="literal">validation</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_5">11.1. Purpose</a></span></dt><dt><span class="section"><a href="#_introduction_2">11.2. Introduction</a></span></dt><dt><span class="section"><a href="#_syntax_5">11.3. Syntax</a></span></dt><dt><span class="section"><a href="#_configuration">11.4. Configuration</a></span></dt><dt><span class="section"><a href="#_limitations">11.5. Limitations</a></span></dt><dt><span class="section"><a href="#_example_invocations_5">11.6.
Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_saved_jobs">12. Saved Jobs</a></span></dt><dt><span class="section"><a href="#_literal_sqoop_job_literal">13. <code class="literal">sqoop-job</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_6">13.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_6">13.2. Syntax</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_passwords">13.3. Saved jobs and passwords</a></span></dt><dt><span class="section"><a href="#_saved_jobs_and_incremental_imports">13.4. Saved jobs and incremental imports</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_metastore_literal">14. <code class="literal">sqoop-metastore</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_7">14.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_7">14.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_merge_literal">15. <code class="literal">sqoop-merge</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_8">15.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_8">15.2. Syntax</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_codegen_literal">16. <code class="literal">sqoop-codegen</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_9">16.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_9">16.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_6">16.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_create_hive_table_literal">17. <code class="literal">sqoop-create-hive-table</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_10">17.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_10">17.2.
Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_7">17.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_eval_literal">18. <code class="literal">sqoop-eval</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_11">18.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_11">18.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_8">18.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_databases_literal">19. <code class="literal">sqoop-list-databases</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_12">19.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_12">19.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_9">19.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_list_tables_literal">20. <code class="literal">sqoop-list-tables</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_13">20.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_13">20.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_10">20.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_help_literal">21. <code class="literal">sqoop-help</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_14">21.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_14">21.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_11">21.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_literal_sqoop_version_literal">22. <code class="literal">sqoop-version</code></a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_15">22.1.
Purpose</a></span></dt><dt><span class="section"><a href="#_syntax_15">22.2. Syntax</a></span></dt><dt><span class="section"><a href="#_example_invocations_12">22.3. Example Invocations</a></span></dt></dl></dd><dt><span class="section"><a href="#_sqoop_hcatalog_integration">23. Sqoop-HCatalog Integration</a></span></dt><dd><dl><dt><span class="section"><a href="#_hcatalog_background">23.1. HCatalog Background</a></span></dt><dt><span class="section"><a href="#_exposing_hcatalog_tables_to_sqoop">23.2. Exposing HCatalog Tables to Sqoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_new_command_line_options">23.2.1. New Command Line Options</a></span></dt><dt><span class="section"><a href="#_supported_sqoop_hive_options">23.2.2. Supported Sqoop Hive Options</a></span></dt><dt><span class="section"><a href="#_direct_mode_support">23.2.3. Direct Mode support</a></span></dt><dt><span class="section"><a href="#_unsupported_sqoop_options">23.2.4. Unsupported Sqoop Options</a></span></dt><dd><dl><dt><span class="section"><a href="#_unsupported_sqoop_hive_import_options">23.2.4.1. Unsupported Sqoop Hive Import Options</a></span></dt><dt><span class="section"><a href="#_unsupported_sqoop_export_and_import_options">23.2.4.2. Unsupported Sqoop Export and Import Options</a></span></dt></dl></dd><dt><span class="section"><a href="#_ignored_sqoop_options">23.2.5. Ignored Sqoop Options</a></span></dt></dl></dd><dt><span class="section"><a href="#_automatic_table_creation">23.3. Automatic Table Creation</a></span></dt><dt><span class="section"><a href="#_delimited_text_formats_and_field_and_line_delimiter_characters">23.4. Delimited Text Formats and Field and Line Delimiter Characters</a></span></dt><dt><span class="section"><a href="#_hcatalog_table_requirements">23.5. HCatalog Table Requirements</a></span></dt><dt><span class="section"><a href="#_support_for_partitioning">23.6.
Support for Partitioning</a></span></dt><dt><span class="section"><a href="#_schema_mapping">23.7. Schema Mapping</a></span></dt><dt><span class="section"><a href="#_support_for_hcatalog_data_types">23.8. Support for HCatalog Data Types</a></span></dt><dt><span class="section"><a href="#_providing_hive_and_hcatalog_libraries_for_the_sqoop_job">23.9. Providing Hive and HCatalog Libraries for the Sqoop Job</a></span></dt><dt><span class="section"><a href="#_examples">23.10. Examples</a></span></dt><dt><span class="section"><a href="#_import">23.11. Import</a></span></dt><dt><span class="section"><a href="#_export">23.12. Export</a></span></dt></dl></dd><dt><span class="section"><a href="#_compatibility_notes">24. Compatibility Notes</a></span></dt><dd><dl><dt><span class="section"><a href="#_supported_databases">24.1. Supported Databases</a></span></dt><dt><span class="section"><a href="#_mysql">24.2. MySQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_zerodatetimebehavior">24.2.1. zeroDateTimeBehavior</a></span></dt><dt><span class="section"><a href="#_literal_unsigned_literal_columns">24.2.2. <code class="literal">UNSIGNED</code> columns</a></span></dt><dt><span class="section"><a href="#_literal_blob_literal_and_literal_clob_literal_columns">24.2.3. <code class="literal">BLOB</code> and <code class="literal">CLOB</code> columns</a></span></dt><dt><span class="section"><a href="#_importing_views_in_direct_mode">24.2.4. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql">24.3. PostgreSQL</a></span></dt><dd><dl><dt><span class="section"><a href="#_importing_views_in_direct_mode_2">24.3.1. Importing views in direct mode</a></span></dt></dl></dd><dt><span class="section"><a href="#_oracle">24.4. Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_dates_and_times">24.4.1.
Dates and Times</a></span></dt></dl></dd><dt><span class="section"><a href="#_schema_definition_in_hive">24.5. Schema Definition in Hive</a></span></dt><dt><span class="section"><a href="#_cubrid">24.6. CUBRID</a></span></dt></dl></dd><dt><span class="section"><a href="#connectors">25. Notes for specific connectors</a></span></dt><dd><dl><dt><span class="section"><a href="#_mysql_jdbc_connector">25.1. MySQL JDBC Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_upsert_functionality">25.1.1. Upsert functionality</a></span></dt></dl></dd><dt><span class="section"><a href="#_mysql_direct_connector">25.2. MySQL Direct Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_requirements">25.2.1. Requirements</a></span></dt><dt><span class="section"><a href="#_limitations_2">25.2.2. Limitations</a></span></dt><dt><span class="section"><a href="#_direct_mode_transactions">25.2.3. Direct-mode Transactions</a></span></dt></dl></dd><dt><span class="section"><a href="#_microsoft_sql_connector">25.3. Microsoft SQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments">25.3.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_allow_identity_inserts">25.3.2. Allow identity inserts</a></span></dt><dt><span class="section"><a href="#_non_resilient_operations">25.3.3. Non-resilient operations</a></span></dt><dt><span class="section"><a href="#_schema_support">25.3.4. Schema support</a></span></dt><dt><span class="section"><a href="#_table_hints">25.3.5. Table hints</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_connector">25.4. PostgreSQL Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments_2">25.4.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_schema_support_2">25.4.2. Schema support</a></span></dt></dl></dd><dt><span class="section"><a href="#_postgresql_direct_connector">25.5.
PostgreSQL Direct Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_requirements_2">25.5.1. Requirements</a></span></dt><dt><span class="section"><a href="#_limitations_3">25.5.2. Limitations</a></span></dt></dl></dd><dt><span class="section"><a href="#_pg_bulkload_connector">25.6. pg_bulkload connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_purpose_16">25.6.1. Purpose</a></span></dt><dt><span class="section"><a href="#_requirements_3">25.6.2. Requirements</a></span></dt><dt><span class="section"><a href="#_syntax_16">25.6.3. Syntax</a></span></dt><dt><span class="section"><a href="#_data_staging">25.6.4. Data Staging</a></span></dt></dl></dd><dt><span class="section"><a href="#_netezza_connector">25.7. Netezza Connector</a></span></dt><dd><dl><dt><span class="section"><a href="#_extra_arguments_3">25.7.1. Extra arguments</a></span></dt><dt><span class="section"><a href="#_direct_mode">25.7.2. Direct Mode</a></span></dt><dt><span class="section"><a href="#_null_string_handling">25.7.3. Null string handling</a></span></dt></dl></dd><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop">25.8. Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_about">25.8.1. About</a></span></dt><dd><dl><dt><span class="section"><a href="#_jobs">25.8.1.1. Jobs</a></span></dt><dt><span class="section"><a href="#_how_the_standard_oracle_manager_works_for_imports">25.8.1.2. How The Standard Oracle Manager Works for Imports</a></span></dt><dt><span class="section"><a href="#_how_the_data_connector_for_oracle_and_hadoop_works_for_imports">25.8.1.3. How The Data Connector for Oracle and Hadoop Works for Imports</a></span></dt><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop_exports">25.8.1.4. Data Connector for Oracle and Hadoop Exports</a></span></dt></dl></dd><dt><span class="section"><a href="#_requirements_4">25.8.2.
Requirements</a></span></dt><dd><dl><dt><span class="section"><a href="#_ensure_the_oracle_database_jdbc_driver_is_setup_correctly">25.8.2.1. Ensure The Oracle Database JDBC Driver Is Setup Correctly</a></span></dt><dt><span class="section"><a href="#_oracle_roles_and_privileges">25.8.2.2. Oracle Roles and Privileges</a></span></dt><dt><span class="section"><a href="#_additional_oracle_roles_and_privileges_required_for_export">25.8.2.3. Additional Oracle Roles And Privileges Required for Export</a></span></dt><dt><span class="section"><a href="#_supported_data_types">25.8.2.4. Supported Data Types</a></span></dt></dl></dd><dt><span class="section"><a href="#_execute_sqoop_with_data_connector_for_oracle_and_hadoop">25.8.3. Execute Sqoop With Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_connect_to_oracle_oracle_rac">25.8.3.1. Connect to Oracle / Oracle RAC</a></span></dt><dt><span class="section"><a href="#_connect_to_an_oracle_database_instance">25.8.3.2. Connect to An Oracle Database Instance</a></span></dt><dt><span class="section"><a href="#_connect_to_an_oracle_rac">25.8.3.3. Connect to An Oracle RAC</a></span></dt><dt><span class="section"><a href="#_login_to_the_oracle_instance">25.8.3.4. Login to The Oracle Instance</a></span></dt><dt><span class="section"><a href="#_kill_data_connector_for_oracle_and_hadoop_jobs">25.8.3.5. Kill Data Connector for Oracle and Hadoop Jobs</a></span></dt></dl></dd><dt><span class="section"><a href="#_import_data_from_oracle">25.8.4. Import Data from Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_match_hadoop_files_to_oracle_table_partitions">25.8.4.1. Match Hadoop Files to Oracle Table Partitions</a></span></dt><dt><span class="section"><a href="#_specify_the_partitions_to_import">25.8.4.2.
Specify The Partitions To Import</a></span></dt><dt><span class="section"><a href="#_consistent_read_all_mappers_read_from_the_same_point_in_time">25.8.4.3. Consistent Read: All Mappers Read From The Same Point In Time</a></span></dt></dl></dd><dt><span class="section"><a href="#_export_data_into_oracle">25.8.5. Export Data into Oracle</a></span></dt><dd><dl><dt><span class="section"><a href="#_insert_export">25.8.5.1. Insert-Export</a></span></dt><dt><span class="section"><a href="#_update_export">25.8.5.2. Update-Export</a></span></dt><dt><span class="section"><a href="#_merge_export">25.8.5.3. Merge-Export</a></span></dt><dt><span class="section"><a href="#_create_oracle_tables">25.8.5.4. Create Oracle Tables</a></span></dt><dt><span class="section"><a href="#_nologging">25.8.5.5. NOLOGGING</a></span></dt><dt><span class="section"><a href="#_partitioning">25.8.5.6. Partitioning</a></span></dt><dt><span class="section"><a href="#_match_rows_via_multiple_columns">25.8.5.7. Match Rows Via Multiple Columns</a></span></dt><dt><span class="section"><a href="#_storage_clauses">25.8.5.8. Storage Clauses</a></span></dt></dl></dd><dt><span class="section"><a href="#_manage_date_and_timestamp_data_types">25.8.6. Manage Date And Timestamp Data Types</a></span></dt><dd><dl><dt><span class="section"><a href="#_import_date_and_timestamp_data_types_from_oracle">25.8.6.1. Import Date And Timestamp Data Types from Oracle</a></span></dt><dt><span class="section"><a href="#_the_data_connector_for_oracle_and_hadoop_does_not_apply_a_time_zone_to_date_timestamp_data_types">25.8.6.2. The Data Connector for Oracle and Hadoop Does Not Apply A Time Zone to DATE / TIMESTAMP Data Types</a></span></dt><dt><span class="section"><a href="#_the_data_connector_for_oracle_and_hadoop_retains_time_zone_information_in_timezone_data_types">25.8.6.3.
The Data Connector for Oracle and Hadoop Retains Time Zone Information in TIMEZONE Data Types</a></span></dt><dt><span class="section"><a href="#_data_connector_for_oracle_and_hadoop_explicitly_states_time_zone_for_local_timezone_data_types">25.8.6.4. Data Connector for Oracle and Hadoop Explicitly States Time Zone for LOCAL TIMEZONE Data Types</a></span></dt><dt><span class="section"><a href="#_java_sql_timestamp">25.8.6.5. java.sql.Timestamp</a></span></dt><dt><span class="section"><a href="#_export_date_and_timestamp_data_types_into_oracle">25.8.6.6. Export Date And Timestamp Data Types into Oracle</a></span></dt></dl></dd><dt><span class="section"><a href="#_configure_the_data_connector_for_oracle_and_hadoop">25.8.7. Configure The Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_oraoop_site_template_xml">25.8.7.1. oraoop-site-template.xml</a></span></dt><dt><span class="section"><a href="#_oraoop_oracle_session_initialization_statements">25.8.7.2. oraoop.oracle.session.initialization.statements</a></span></dt><dt><span class="section"><a href="#_oraoop_table_import_where_clause_location">25.8.7.3. oraoop.table.import.where.clause.location</a></span></dt><dt><span class="section"><a href="#_oracle_row_fetch_size">25.8.7.4. oracle.row.fetch.size</a></span></dt><dt><span class="section"><a href="#_oraoop_import_hint">25.8.7.5. oraoop.import.hint</a></span></dt><dt><span class="section"><a href="#_oraoop_oracle_append_values_hint_usage">25.8.7.6. oraoop.oracle.append.values.hint.usage</a></span></dt><dt><span class="section"><a href="#_mapred_map_tasks_speculative_execution">25.8.7.7. mapred.map.tasks.speculative.execution</a></span></dt><dt><span class="section"><a href="#_oraoop_block_allocation">25.8.7.8. oraoop.block.allocation</a></span></dt><dt><span class="section"><a href="#_oraoop_import_omit_lobs_and_long">25.8.7.9.
oraoop.import.omit.lobs.and.long</a></span></dt><dt><span class="section"><a href="#_oraoop_locations">25.8.7.10. oraoop.locations</a></span></dt><dt><span class="section"><a href="#_sqoop_connection_factories">25.8.7.11. sqoop.connection.factories</a></span></dt><dt><span class="section"><a href="#_expressions_in_oraoop_site_xml">25.8.7.12. Expressions in oraoop-site.xml</a></span></dt></dl></dd><dt><span class="section"><a href="#_troubleshooting_the_data_connector_for_oracle_and_hadoop">25.8.8. Troubleshooting The Data Connector for Oracle and Hadoop</a></span></dt><dd><dl><dt><span class="section"><a href="#_quote_oracle_owners_and_tables">25.8.8.1. Quote Oracle Owners And Tables</a></span></dt><dt><span class="section"><a href="#_quote_oracle_columns">25.8.8.2. Quote Oracle Columns</a></span></dt><dt><span class="section"><a href="#_confirm_the_data_connector_for_oracle_and_hadoop_can_initialize_the_oracle_session">25.8.8.3. Confirm The Data Connector for Oracle and Hadoop Can Initialize The Oracle Session</a></span></dt><dt><span class="section"><a href="#_check_the_sqoop_debug_logs_for_error_messages">25.8.8.4. Check The Sqoop Debug Logs for Error Messages</a></span></dt><dt><span class="section"><a href="#_export_check_tables_are_compatible">25.8.8.5. Export: Check Tables Are Compatible</a></span></dt><dt><span class="section"><a href="#_export_parallelization">25.8.8.6. Export: Parallelization</a></span></dt><dt><span class="section"><a href="#_export_check_oraoop_oracle_append_values_hint_usage">25.8.8.7. Export: Check oraoop.oracle.append.values.hint.usage</a></span></dt><dt><span class="section"><a href="#_turn_on_verbose">25.8.8.8. Turn On Verbose</a></span></dt></dl></dd></dl></dd></dl></dd><dt><span class="section"><a href="#_getting_support">26. Getting Support</a></span></dt><dt><span class="section"><a href="#_troubleshooting">27. Troubleshooting</a></span></dt><dd><dl><dt><span class="section"><a href="#_general_troubleshooting_process">27.1.
General Troubleshooting Process</a></span></dt><dt><span class="section"><a href="#_specific_troubleshooting_tips">27.2. Specific Troubleshooting Tips</a></span></dt><dd><dl><dt><span class="section"><a href="#_oracle_connection_reset_errors">27.2.1. Oracle: Connection Reset Errors</a></span></dt><dt><span class="section"><a href="#_oracle_case_sensitive_catalog_query_errors">27.2.2. Oracle: Case-Sensitive Catalog Query Errors</a></span></dt><dt><span class="section"><a href="#_mysql_connection_failure">27.2.3. MySQL: Connection Failure</a></span></dt><dt><span class="section"><a href="#_oracle_ora_00933_error_sql_command_not_properly_ended">27.2.4. Oracle: ORA-00933 error (SQL command not properly ended)</a></span></dt><dt><span class="section"><a href="#_mysql_import_of_tinyint_1_from_mysql_behaves_strangely">27.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely</a></span></dt></dl></dd></dl></dd></dl></div><pre class="screen"> Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License.</pre><div class="section" title="1. Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_introduction"></a>1.
Introduction</h2></div></div></div><p>Sqoop is a tool designed to transfer data between Hadoop and +relational databases or mainframes. You can use Sqoop to import data from a +relational database management system (RDBMS) such as MySQL or Oracle or a +mainframe into the Hadoop Distributed File System (HDFS), +transform the data in Hadoop MapReduce, and then export the data back +into an RDBMS.</p><p>Sqoop automates most of this process, relying on the database to +describe the schema for the data to be imported. Sqoop uses MapReduce +to import and export the data, which provides parallel operation as +well as fault tolerance.</p><p>This document describes how to get started using Sqoop to move data +between databases and Hadoop or mainframe to Hadoop and provides reference +information for the operation of the Sqoop command-line tool suite. This +document is intended for:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> +System and application programmers +</li><li class="listitem"> +System administrators +</li><li class="listitem"> +Database administrators +</li><li class="listitem"> +Data analysts +</li><li class="listitem"> +Data engineers +</li></ul></div></div><div class="section" title="2. Supported Releases"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_supported_releases"></a>2. Supported Releases</h2></div></div></div><p>This documentation applies to Sqoop v1.4.6.</p></div><div class="section" title="3. Sqoop Releases"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_releases"></a>3. 
Sqoop Releases</h2></div></div></div><p>Sqoop is an open source software product of the Apache Software Foundation.</p><p>Software development for Sqoop occurs at <a class="ulink" href="http://sqoop.apache.org" target="_top">http://sqoop.apache.org</a>. +At that site you can obtain:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> +New releases of Sqoop as well as its most recent source code +</li><li class="listitem"> +An issue tracker +</li><li class="listitem"> +A wiki that contains Sqoop documentation +</li></ul></div></div><div class="section" title="4. Prerequisites"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_prerequisites"></a>4. Prerequisites</h2></div></div></div><p>The following prerequisite knowledge is required for this product:</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"> +Basic computer technology and terminology +</li><li class="listitem"> +Familiarity with command-line interfaces such as <code class="literal">bash</code> +</li><li class="listitem"> +Relational database management systems +</li><li class="listitem"> +Basic familiarity with the purpose and operation of Hadoop +</li></ul></div><p>Before you can use Sqoop, a release of Hadoop must be installed and +configured. Sqoop currently supports four major Hadoop releases: 0.20, +0.23, 1.0, and 2.0.</p><p>This document assumes you are using a Linux or Linux-like environment. +If you are using Windows, you may be able to use Cygwin to accomplish +most of the following tasks. If you are using Mac OS X, you should see +few (if any) compatibility errors. Sqoop is predominantly operated and +tested on Linux.</p></div><div class="section" title="5. Basic Usage"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_basic_usage"></a>5.
Basic Usage</h2></div></div></div><p>With Sqoop, you can <span class="emphasis"><em>import</em></span> data from a relational database system or a +mainframe into HDFS. The input to the import process is either a database table +or a set of mainframe datasets. For databases, Sqoop will read the table row-by-row +into HDFS. For mainframe datasets, Sqoop will read records from each mainframe +dataset into HDFS. The output of this import process is a set of files +containing a copy of the imported table or datasets. +The import process is performed in parallel. For this reason, the +output will be in multiple files. These files may be delimited text +files (for example, with commas or tabs separating each field), or +binary Avro or SequenceFiles containing serialized record data.</p><p>A by-product of the import process is a generated Java class which +can encapsulate one row of the imported table. This class is used +during the import process by Sqoop itself. The Java source code for +this class is also provided to you, for use in subsequent MapReduce +processing of the data. This class can serialize and deserialize data +to and from the SequenceFile format. It can also parse the +delimited-text form of a record. These abilities allow you to quickly +develop MapReduce applications that use the HDFS-stored records in +your processing pipeline. You are also free to parse the delimited +record data yourself, using any other tools you prefer.</p><p>After manipulating the imported records (for example, with MapReduce +or Hive) you may have a result data set which you can then <span class="emphasis"><em>export</em></span> +back to the relational database. Sqoop’s export process will read +a set of delimited text files from HDFS in parallel, parse them into +records, and insert them as new rows in a target database table, for +consumption by external applications or users.</p><p>Sqoop includes some other commands which allow you to inspect the +database you are working with. 
For example, you can list the available +database schemas (with the <code class="literal">sqoop-list-databases</code> tool) and tables +within a schema (with the <code class="literal">sqoop-list-tables</code> tool). Sqoop also +includes a primitive SQL execution shell (the <code class="literal">sqoop-eval</code> tool).</p><p>Most aspects of the import, code generation, and export processes can +be customized. For databases, you can control the specific row range or +columns imported. You can specify particular delimiters and escape characters +for the file-based representation of the data, as well as the file format +used. You can also control the class or package names used in +generated code. Subsequent sections of this document explain how to +specify these and other arguments to Sqoop.</p></div><div class="section" title="6. Sqoop Tools"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_sqoop_tools"></a>6. Sqoop Tools</h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_using_command_aliases">6.1. Using Command Aliases</a></span></dt><dt><span class="section"><a href="#_controlling_the_hadoop_installation">6.2. Controlling the Hadoop Installation</a></span></dt><dt><span class="section"><a href="#_using_generic_and_specific_arguments">6.3. Using Generic and Specific Arguments</a></span></dt><dt><span class="section"><a href="#_using_options_files_to_pass_arguments">6.4. Using Options Files to Pass Arguments</a></span></dt><dt><span class="section"><a href="#_using_tools">6.5. Using Tools</a></span></dt></dl></div><p>Sqoop is a collection of related tools. To use Sqoop, you specify the +tool you want to use and the arguments that control the tool.</p><p>If Sqoop is compiled from its own source, you can run Sqoop without a formal +installation process by running the <code class="literal">bin/sqoop</code> program. 
Users +of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) +will see this program installed as <code class="literal">/usr/bin/sqoop</code>. The remainder of this +documentation will refer to this program as <code class="literal">sqoop</code>. For example:</p><pre class="screen">$ sqoop tool-name [tool-arguments]</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The following examples that begin with a <code class="literal">$</code> character indicate +that the commands must be entered at a terminal prompt (such as +<code class="literal">bash</code>). The <code class="literal">$</code> character represents the prompt itself; you should +not start these commands by typing a <code class="literal">$</code>. You can also enter commands +inline in the text of a paragraph; for example, <code class="literal">sqoop help</code>. These +examples do not show a <code class="literal">$</code> prefix, but you should enter them the same +way. Don’t confuse the <code class="literal">$</code> shell prompt in the examples with the <code class="literal">$</code> +that precedes an environment variable name. For example, the string +literal <code class="literal">$HADOOP_HOME</code> includes a "<code class="literal">$</code>".</p></td></tr></table></div><p>Sqoop ships with a help tool. 
To display a list of all available +tools, type the following command:</p><pre class="screen">$ sqoop help +usage: sqoop COMMAND [ARGS] + +Available commands: + codegen Generate code to interact with database records + create-hive-table Import a table definition into Hive + eval Evaluate a SQL statement and display the results + export Export an HDFS directory to a database table + help List available commands + import Import a table from a database to HDFS + import-all-tables Import tables from a database to HDFS + import-mainframe Import mainframe datasets to HDFS + list-databases List available databases on a server + list-tables List available tables in a database + version Display version information + +See 'sqoop help COMMAND' for information on a specific command.</pre><p>You can display help for a specific tool by entering: <code class="literal">sqoop help +(tool-name)</code>; for example, <code class="literal">sqoop help import</code>.</p><p>You can also add the <code class="literal">--help</code> argument to any command: <code class="literal">sqoop import +--help</code>.</p><div class="section" title="6.1. Using Command Aliases"><div class="titlepage"><div><div><h3 class="title"><a name="_using_command_aliases"></a>6.1. Using Command Aliases</h3></div></div></div><p>In addition to typing the <code class="literal">sqoop (toolname)</code> syntax, you can use alias +scripts that specify the <code class="literal">sqoop-(toolname)</code> syntax. For example, the +scripts <code class="literal">sqoop-import</code>, <code class="literal">sqoop-export</code>, etc. each select a specific +tool.</p></div><div class="section" title="6.2. Controlling the Hadoop Installation"><div class="titlepage"><div><div><h3 class="title"><a name="_controlling_the_hadoop_installation"></a>6.2. Controlling the Hadoop Installation</h3></div></div></div><p>You invoke Sqoop through the program launch capability provided by +Hadoop. 
The <code class="literal">sqoop</code> command-line program is a wrapper which runs the +<code class="literal">bin/hadoop</code> script shipped with Hadoop. If you have multiple +installations of Hadoop present on your machine, you can select the +Hadoop installation by setting the <code class="literal">$HADOOP_COMMON_HOME</code> and +<code class="literal">$HADOOP_MAPRED_HOME</code> environment variables.</p><p>For example:</p><pre class="screen">$ HADOOP_COMMON_HOME=/path/to/some/hadoop \ + HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \ + sqoop import --arguments...</pre><p>or:</p><pre class="screen">$ export HADOOP_COMMON_HOME=/some/path/to/hadoop +$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce +$ sqoop import --arguments...</pre><p>If either of these variables is not set, Sqoop will fall back to +<code class="literal">$HADOOP_HOME</code>. If it is not set either, Sqoop will use the default +installation locations for Apache Bigtop, <code class="literal">/usr/lib/hadoop</code> and +<code class="literal">/usr/lib/hadoop-mapreduce</code>, respectively.</p><p>The active Hadoop configuration is loaded from <code class="literal">$HADOOP_HOME/conf/</code>, +unless the <code class="literal">$HADOOP_CONF_DIR</code> environment variable is set.</p></div><div class="section" title="6.3. Using Generic and Specific Arguments"><div class="titlepage"><div><div><h3 class="title"><a name="_using_generic_and_specific_arguments"></a>6.3. 
Using Generic and Specific Arguments</h3></div></div></div><p>To control the operation of each Sqoop tool, you use generic and +specific arguments.</p><p>For example:</p><pre class="screen">$ sqoop help import +usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS] + +Common arguments: + --connect <jdbc-uri> Specify JDBC connect string + --connect-manager <class-name> Specify connection manager class to use + --driver <class-name> Manually specify JDBC driver class to use + --hadoop-mapred-home <dir> Override $HADOOP_MAPRED_HOME + --help Print usage instructions + --password-file Set path for file containing authentication password + -P Read password from console + --password <password> Set authentication password + --username <username> Set authentication username + --verbose Print more information while working + --hadoop-home <dir> Deprecated. Override $HADOOP_HOME + +[...] + +Generic Hadoop command-line arguments: +(must preceed any tool-specific arguments) +Generic options supported are +-conf <configuration file> specify an application configuration file +-D <property=value> use value for given property +-fs <local|namenode:port> specify a namenode +-jt <local|jobtracker:port> specify a job tracker +-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster +-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. +-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines. + +The general command line syntax is +bin/hadoop command [genericOptions] [commandOptions]</pre><p>You must supply the generic arguments <code class="literal">-conf</code>, <code class="literal">-D</code>, and so on after the +tool name but <span class="strong"><strong>before</strong></span> any tool-specific arguments (such as +<code class="literal">--connect</code>). 
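As a sketch of this ordering (the job name, connect string, and table below are invented examples, and echo stands in for an actual run so the assembled command line can be inspected without a Hadoop cluster):

```shell
# Generic Hadoop arguments (-D, -conf, ...) come immediately after the tool
# name; tool-specific arguments (--connect, --table, ...) follow them.
echo sqoop import \
    -D mapred.job.name=payroll-import \
    --connect jdbc:mysql://db.example.com/payroll \
    --table employees
```

Swapping the two groups (putting `-D` after `--connect`) would cause Hadoop's generic-option parser to miss the `-D` setting.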
Note that generic Hadoop arguments are preceded by a +single dash character (<code class="literal">-</code>), whereas tool-specific arguments start +with two dashes (<code class="literal">--</code>), unless they are single character arguments such as <code class="literal">-P</code>.</p><p>The <code class="literal">-conf</code>, <code class="literal">-D</code>, <code class="literal">-fs</code> and <code class="literal">-jt</code> arguments control the configuration +and Hadoop server settings. For example, <code class="literal">-D mapred.job.name=<job_name></code> can +be used to set the name of the MR job that Sqoop launches; if not specified, +the name defaults to the jar name for the job, which is derived from the +table name used.</p><p>The <code class="literal">-files</code>, <code class="literal">-libjars</code>, and <code class="literal">-archives</code> arguments are not typically used with +Sqoop, but they are included as part of Hadoop’s internal argument-parsing +system.</p></div><div class="section" title="6.4. Using Options Files to Pass Arguments"><div class="titlepage"><div><div><h3 class="title"><a name="_using_options_files_to_pass_arguments"></a>6.4. Using Options Files to Pass Arguments</h3></div></div></div><p>When using Sqoop, the command line options that do not change from +invocation to invocation can be put in an options file for convenience. +An options file is a text file where each line identifies an option in +the order that it appears otherwise on the command line. Option files +allow specifying a single option on multiple lines by using the +back-slash character at the end of intermediate lines. Also supported +are comments within option files that begin with the hash character. +Comments must be specified on a new line and may not be mixed with +option text. All comments and empty lines are ignored when option +files are expanded. Unless options appear as quoted strings, any +leading or trailing spaces are ignored. 
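As an illustrative sketch of these rules (the connect string, table, and WHERE clause are invented, not taken from this guide), an options file might combine comments, blank lines, and backslash continuation like this:

```text
# Options file sketch: comments sit on their own lines; blank lines are ignored.
import

--connect
jdbc:mysql://db.example.com/mydb

--table
employees

# A single option may continue across lines with a trailing backslash:
--where
start_date > '2010-01-01' \
  and active = 1
```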
Quoted strings, if used, must +not extend beyond the line on which they are specified.</p><p>Option files can be specified anywhere in the command line as long as +the options within them follow the otherwise prescribed rules of +options ordering. For instance, regardless of where the options are +loaded from, they must follow the ordering such that generic options +appear first, tool-specific options next, followed finally by options +that are intended to be passed to child programs.</p><p>To specify an options file, simply create an options file in a +convenient location and pass it to the command line via the +<code class="literal">--options-file</code> argument.</p><p>Whenever an options file is specified, it is expanded on the +command line before the tool is invoked. You can specify more than +one options file within the same invocation if needed.</p><p>For example, the following Sqoop import invocation can +alternatively be specified as shown below:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST + +$ sqoop --options-file /users/homer/work/import.txt --table TEST</pre><p>where the options file <code class="literal">/users/homer/work/import.txt</code> contains the following:</p><pre class="screen">import +--connect +jdbc:mysql://localhost/db +--username +foo</pre><p>The options file can have empty lines and comments for readability purposes. +So the above example would work exactly the same if the options file +<code class="literal">/users/homer/work/import.txt</code> contained the following:</p><pre class="screen"># +# Options file for Sqoop import +# + +# Specifies the tool being invoked +import + +# Connect parameter and value +--connect +jdbc:mysql://localhost/db + +# Username parameter and value +--username +foo + +# +# Remaining options should be specified in the command line. +#</pre></div><div class="section" title="6.5. 
Using Tools"><div class="titlepage"><div><div><h3 class="title"><a name="_using_tools"></a>6.5. Using Tools</h3></div></div></div><p>The following sections will describe each tool’s operation. The +tools are listed in the most likely order you will find them useful.</p></div></div><div class="section" title="7. sqoop-import"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="_literal_sqoop_import_literal"></a>7. <code class="literal">sqoop-import</code></h2></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_purpose">7.1. Purpose</a></span></dt><dt><span class="section"><a href="#_syntax">7.2. Syntax</a></span></dt><dd><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.9. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.10. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.11. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.12. 
Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.13. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.14. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.15. Additional Import Configuration Properties</a></span></dt></dl></dd><dt><span class="section"><a href="#_example_invocations">7.3. Example Invocations</a></span></dt></dl></div><div class="section" title="7.1. Purpose"><div class="titlepage"><div><div><h3 class="title"><a name="_purpose"></a>7.1. Purpose</h3></div></div></div><p>The <code class="literal">import</code> tool imports an individual table from an RDBMS to HDFS. +Each row from a table is represented as a separate record in HDFS. +Records can be stored as text files (one record per line), or in +binary representation as Avro or SequenceFiles.</p></div><div class="section" title="7.2. Syntax"><div class="titlepage"><div><div><h3 class="title"><a name="_syntax"></a>7.2. Syntax</h3></div></div></div><div class="toc"><dl><dt><span class="section"><a href="#_connecting_to_a_database_server">7.2.1. Connecting to a Database Server</a></span></dt><dt><span class="section"><a href="#_selecting_the_data_to_import">7.2.2. Selecting the Data to Import</a></span></dt><dt><span class="section"><a href="#_free_form_query_imports">7.2.3. Free-form Query Imports</a></span></dt><dt><span class="section"><a href="#_controlling_parallelism">7.2.4. Controlling Parallelism</a></span></dt><dt><span class="section"><a href="#_controlling_distributed_cache">7.2.5. Controlling Distributed Cache</a></span></dt><dt><span class="section"><a href="#_controlling_the_import_process">7.2.6. Controlling the Import Process</a></span></dt><dt><span class="section"><a href="#_controlling_transaction_isolation">7.2.7. 
Controlling transaction isolation</a></span></dt><dt><span class="section"><a href="#_controlling_type_mapping">7.2.8. Controlling type mapping</a></span></dt><dt><span class="section"><a href="#_incremental_imports">7.2.9. Incremental Imports</a></span></dt><dt><span class="section"><a href="#_file_formats">7.2.10. File Formats</a></span></dt><dt><span class="section"><a href="#_large_objects">7.2.11. Large Objects</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hive">7.2.12. Importing Data Into Hive</a></span></dt><dt><span class="section"><a href="#_importing_data_into_hbase">7.2.13. Importing Data Into HBase</a></span></dt><dt><span class="section"><a href="#_importing_data_into_accumulo">7.2.14. Importing Data Into Accumulo</a></span></dt><dt><span class="section"><a href="#_additional_import_configuration_properties">7.2.15. Additional Import Configuration Properties</a></span></dt></dl></div><pre class="screen">$ sqoop import (generic-args) (import-args) +$ sqoop-import (generic-args) (import-args)</pre><p>While the Hadoop generic arguments must precede any import arguments, +you can type the import arguments in any order with respect to one +another.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>In this document, arguments are grouped into collections +organized by function. Some collections are present in several tools +(for example, the "common" arguments). An extended description of their +functionality is given only on the first presentation in this +document.</p></td></tr></table></div><div class="table"><a name="idp29176"></a><p class="title"><b>Table 1. 
Common arguments</b></p><div class="table-contents"><table summary="Common arguments" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + Argument + </th><th style="border-bottom: 0.5pt solid ; " align="left"> + Description + </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--connect <jdbc-uri></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify JDBC connect string + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--connection-manager <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify connection manager class to use + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--driver <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Manually specify JDBC driver class to use + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--hadoop-mapred-home <dir></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Override $HADOOP_MAPRED_HOME + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--help</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Print usage instructions + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--password-file</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set path for a file containing the authentication password + </td></tr><tr><td style="border-right: 
0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-P</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Read password from console + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--password <password></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set authentication password + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--username <username></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set authentication username + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--verbose</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Print more information while working + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--connection-param-file <filename></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Optional properties file that provides connection parameters + </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"> + <code class="literal">--relaxed-isolation</code> + </td><td style="" align="left"> + Set connection transaction isolation to read uncommitted for the mappers. + </td></tr></tbody></table></div></div><br class="table-break"><div class="section" title="7.2.1. Connecting to a Database Server"><div class="titlepage"><div><div><h4 class="title"><a name="_connecting_to_a_database_server"></a>7.2.1. Connecting to a Database Server</h4></div></div></div><p>Sqoop is designed to import tables from a database into HDFS. To do +so, you must specify a <span class="emphasis"><em>connect string</em></span> that describes how to connect to the +database. 
The <span class="emphasis"><em>connect string</em></span> is similar to a URL, and is communicated to +Sqoop with the <code class="literal">--connect</code> argument. This describes the server and +database to connect to; it may also specify the port. For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees</pre><p>This string will connect to a MySQL database named <code class="literal">employees</code> on the +host <code class="literal">database.example.com</code>. It’s important that you <span class="strong"><strong>do not</strong></span> use the URL +<code class="literal">localhost</code> if you intend to use Sqoop with a distributed Hadoop +cluster. The connect string you supply will be used on TaskTracker nodes +throughout your MapReduce cluster; if you specify the +literal name <code class="literal">localhost</code>, each node will connect to a different +database (or more likely, no database at all). Instead, you should use +the full hostname or IP address of the database host that can be seen +by all your remote nodes.</p><p>You might need to authenticate against the database before you can +access it. You can use the <code class="literal">--username</code> argument to supply a username to the database. +Sqoop provides a couple of different ways to supply a password, +secure and non-secure, to the database, as detailed below.</p><p title="Secure way of supplying password to the database"><b>Secure way of supplying password to the database. </b>You should save the password in a file in the user's home directory with 400 +permissions and specify the path to that file using the <span class="strong"><strong><code class="literal">--password-file</code></strong></span> +argument; this is the preferred method of entering credentials. Sqoop will +then read the password from the file and pass it to the MapReduce cluster +using secure means without exposing the password in the job configuration. 
+The file containing the password can either be on the Local FS or HDFS. +For example:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username venkatesh --password-file ${user.home}/.password</pre><div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>Sqoop will read the entire content of the password file and use it as +the password. This will include any trailing white space characters such as +the newline characters that most text editors add by default. +You need to make sure that your password file contains only the characters +that belong to your password. On the command line you can use the +<code class="literal">echo</code> command with the <code class="literal">-n</code> switch to store the password without any trailing white space +characters. For example, to store the password <code class="literal">secret</code> you would call +<code class="literal">echo -n "secret" > password.file</code>.</p></td></tr></table></div><p>Another way of supplying passwords is using the <code class="literal">-P</code> argument which will +read a password from a console prompt.</p><p title="Protecting password from prying eyes"><b>Protecting password from prying eyes. </b>Hadoop 2.6.0 provides an API to separate password storage from applications. +This API is called the credential provider API and there is a new +<code class="literal">credential</code> command line tool to manage passwords and their aliases. +The passwords are stored with their aliases in a keystore that is password +protected. The keystore password can be provided at a password prompt +on the command line, via an environment variable, or defaulted to a +software-defined constant. 
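Returning to password files for a moment, the trailing-whitespace warning above can be checked directly. In this sketch (the file name and password are examples only), printf '%s' behaves like echo -n and writes no trailing newline, so the file's byte count equals the password's length:

```shell
printf '%s' 'secret' > password.file   # like `echo -n`: no trailing newline
wc -c < password.file                  # reports 6 bytes, one per password character
```

A plain `echo` would produce 7 bytes here, and Sqoop would treat the newline as part of the password.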
Please check the Hadoop documentation on the usage +of this facility.</p><p>Once the password is stored using the Credential Provider facility and +the Hadoop configuration has been suitably updated, all applications can +optionally use the alias in place of the actual password and at runtime +resolve the alias for the password to use.</p><p>Since the keystore or similar technology backing the credential +provider is shared across components, passwords for various applications and +various databases can be securely stored in it, and only +the alias needs to be exposed in configuration files, protecting the passwords +from being visible.</p><p>Sqoop has been enhanced to allow usage of this functionality if it is +available in the underlying Hadoop version being used. One new option +has been introduced to provide the alias on the command line instead of the +actual password (--password-alias). The argument value of this option is +the alias in the credential store associated with the actual password. +Example usage is as follows:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username dbuser --password-alias mydb.password.alias</pre><p>Similarly, if the command line option is not preferred, the alias can be saved +in the file provided with the --password-file option. 
Along with this, the +Sqoop configuration parameter org.apache.sqoop.credentials.loader.class +should be set to the classname that provides the alias resolution: +<code class="literal">org.apache.sqoop.util.password.CredentialProviderPasswordLoader</code></p><p>Example usage is as follows (assuming .password.alias has the alias for +the real password) :</p><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username dbuser --password-file ${user.home}/.password-alias</pre><div class="warning" title="Warning" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Warning"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Warning]" src="images/warning.png"></td><th align="left">Warning</th></tr><tr><td align="left" valign="top"><p>The <code class="literal">--password</code> parameter is insecure, as other users may +be able to read your password from the command-line arguments via +the output of programs such as <code class="literal">ps</code>. The <span class="strong"><strong><code class="literal">-P</code></strong></span> argument is the preferred +method over using the <code class="literal">--password</code> argument. Credentials may still be +transferred between nodes of the MapReduce cluster using insecure means. +For example:</p></td></tr></table></div><pre class="screen">$ sqoop import --connect jdbc:mysql://database.example.com/employees \ + --username aaron --password 12345</pre><p>Sqoop automatically supports several databases, including MySQL. Connect +strings beginning with <code class="literal">jdbc:mysql://</code> are handled automatically in Sqoop. (A +full list of databases with built-in support is provided in the "Supported +Databases" section. For some, you may need to install the JDBC driver +yourself.)</p><p>You can use Sqoop with any other +JDBC-compliant database. 
First, download the appropriate JDBC +driver for the type of database you want to import, and install the .jar +file in the <code class="literal">$SQOOP_HOME/lib</code> directory on your client machine. (This will +be <code class="literal">/usr/lib/sqoop/lib</code> if you installed from an RPM or Debian package.) +Each driver <code class="literal">.jar</code> file also has a specific driver class which defines +the entry-point to the driver. For example, MySQL’s Connector/J library has +a driver class of <code class="literal">com.mysql.jdbc.Driver</code>. Refer to your database +vendor-specific documentation to determine the main driver class. +This class must be provided as an argument to Sqoop with <code class="literal">--driver</code>.</p><p>For example, to connect to a SQLServer database, first download the driver from +microsoft.com and install it in your Sqoop lib path.</p><p>Then run Sqoop. For example:</p><pre class="screen">$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \ + --connect <connect-string> ...</pre><p>When connecting to a database using JDBC, you can optionally specify extra +JDBC parameters via a property file using the option +<code class="literal">--connection-param-file</code>. The contents of this file are parsed as standard +Java properties and passed into the driver while creating a connection.</p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The parameters specified via the optional property file are only +applicable to JDBC connections. Any fastpath connectors that use connections +other than JDBC will ignore these parameters.</p></td></tr></table></div><div class="table"><a name="idp3493216"></a><p class="title"><b>Table 2. 
Validation arguments <a class="link" href="#validation" title="11. validation">More Details</a></b></p><div class="table-contents"><table summary="Validation arguments More Details" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + Argument + </th><th style="border-bottom: 0.5pt solid ; " align="left"> + Description + </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--validate</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Enable validation of data copied, supports single table copy only. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--validator <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify validator class to use. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--validation-threshold <class-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Specify validation threshold class to use. + </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"> + <code class="literal">--validation-failurehandler <class-name></code> + </td><td style="" align="left"> + Specify validation failure handler class to use. + </td></tr></tbody></table></div></div><br class="table-break"><div class="table"><a name="idp3507040"></a><p class="title"><b>Table 3. 
Import control arguments:</b></p><div class="table-contents"><table summary="Import control arguments:" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + Argument + </th><th style="border-bottom: 0.5pt solid ; " align="left"> + Description + </th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--append</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Append data to an existing dataset in HDFS + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-avrodatafile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data to Avro Data Files + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-sequencefile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data to SequenceFiles + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-textfile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data as plain text (default) + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--as-parquetfile</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Imports data to Parquet Files + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--boundary-query <statement></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Boundary query to use for creating splits + </td></tr><tr><td style="border-right: 0.5pt solid ; 
border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--columns <col,col,col…></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Columns to import from table + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--delete-target-dir</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Delete the import target directory if it exists + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--direct</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Use direct connector if exists for the database + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--fetch-size <n></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Number of entries to read from database at once. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--inline-lob-limit <n></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Set the maximum size for an inline LOB + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-m,--num-mappers <n></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Use <span class="emphasis"><em>n</em></span> map tasks to import in parallel + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-e,--query <statement></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Import the results of <span class="emphasis"><em><code class="literal">statement</code></em></span>. 
+ </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--split-by <column-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Column of the table used to split work units. Cannot be used with <code class="literal">--autoreset-to-one-mapper</code> option. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--autoreset-to-one-mapper</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with <code class="literal">--split-by <col></code> option. + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--table <table-name></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Table to read + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--target-dir <dir></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + HDFS destination dir + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--warehouse-dir <dir></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + HDFS parent for table destination + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">--where <where clause></code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + WHERE clause to use during import + </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> + <code class="literal">-z,--compress</code> + </td><td style="border-bottom: 0.5pt solid ; " align="left"> + Enable compression + </td></tr><tr><td style="border-right: 0.5pt solid ; 
border-bottom: 0.5pt solid ; " align="left"> <code class="literal">--compression-codec <c></code> </td><td style="border-bottom: 0.5pt solid ; " align="left"> Use Hadoop codec (default gzip) </td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left"> <code class="literal">--null-string <null-string></code> </td><td style="border-bottom: 0.5pt solid ; " align="left"> The string to be written for a null value for string columns </td></tr><tr><td style="border-right: 0.5pt solid ; " align="left"> <code class="literal">--null-non-string <null-string></code> </td><td style="" align="left"> The string to be written for a null value for non-string columns </td></tr></tbody></table></div></div><br class="table-break"><p>The <code class="literal">--null-string</code> and <code class="literal">--null-non-string</code> arguments are optional. If not specified, then the string "null" will be used.</p></div><div class="section" title="7.2.2. Selecting the Data to Import"><div class="titlepage"><div><div><h4 class="title"><a name="_selecting_the_data_to_import"></a>7.2.2. Selecting the Data to Import</h4></div></div></div><p>Sqoop typically imports data in a table-centric fashion. Use the <code class="literal">--table</code> argument to select the table to import. For example, <code class="literal">--table employees</code>. This argument can also identify a <code class="literal">VIEW</code> or other table-like entity in a database.</p><p>By default, all columns within a table are selected for import. Imported data is written to HDFS in its "natural order;" that is, a table containing columns A, B, and C results in an import of data such as:</p><pre class="screen">A1,B1,C1
A2,B2,C2
...</pre><p>You can select a subset of columns and control their ordering by using the <code class="literal">--columns</code> argument. This should include a comma-delimited list of columns to import. 
For example: <code class="literal">--columns "name,employee_id,jobtitle"</code>.</p><p>You can control which rows are imported by adding a SQL <code class="literal">WHERE</code> clause to the import statement. By default, Sqoop generates statements of the form <code class="literal">SELECT <column list> FROM <table name></code>. You can append a <code class="literal">WHERE</code> clause to this with the <code class="literal">--where</code> argument. For example: <code class="literal">--where "id > 400"</code>. Only rows where the <code class="literal">id</code> column has a value greater than 400 will be imported.</p><p>By default, Sqoop uses the query <code class="literal">select min(<split-by>), max(<split-by>) from <table name></code> to find boundaries for creating splits. In some cases this query is not optimal, so you can specify any arbitrary query that returns two numeric columns using the <code class="literal">--boundary-query</code> argument.</p></div><div class="section" title="7.2.3. Free-form Query Imports"><div class="titlepage"><div><div><h4 class="title"><a name="_free_form_query_imports"></a>7.2.3. Free-form Query Imports</h4></div></div></div><p>Sqoop can also import the result set of an arbitrary SQL query. Instead of using the <code class="literal">--table</code>, <code class="literal">--columns</code> and <code class="literal">--where</code> arguments, you can specify a SQL statement with the <code class="literal">--query</code> argument.</p><p>When importing a free-form query, you must specify a destination directory with <code class="literal">--target-dir</code>.</p><p>If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token <code class="literal">$CONDITIONS</code>, which each Sqoop process will replace with a unique condition expression. 
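To make the substitution concrete, the following sketch (a hypothetical illustration, not Sqoop's actual implementation) shows how each map task's copy of a free-form query might be derived. The table, column, and bounds are made up; in practice Sqoop infers the bounds from the splitting column:

```python
# Hypothetical illustration of how $CONDITIONS is replaced per map task.
# The bounds below are invented; Sqoop derives real ones from --split-by.
query = "SELECT * FROM employees WHERE $CONDITIONS"
bounds = [(0, 250), (250, 500)]          # two map tasks, illustrative only

per_task = [
    query.replace("$CONDITIONS", "id >= %d AND id < %d" % (lo, hi))
    for lo, hi in bounds
]
print(per_task[0])
# SELECT * FROM employees WHERE id >= 0 AND id < 250
```

Each task thus runs the same query over a disjoint slice of rows, and the union of the slices covers the full result set.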
You must also select a splitting column with <code class="literal">--split-by</code>.</p><p>For example:</p><pre class="screen">$ sqoop import \
    --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
    --split-by a.id --target-dir /user/foo/joinresults</pre><p>Alternately, the query can be executed once and imported serially, by specifying a single map task with <code class="literal">-m 1</code>:</p><pre class="screen">$ sqoop import \
    --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
    -m 1 --target-dir /user/foo/joinresults</pre><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>If you issue the query wrapped in double quotes ("), you will have to use <code class="literal">\$CONDITIONS</code> instead of just <code class="literal">$CONDITIONS</code> to prevent your shell from treating it as a shell variable. For example, a double-quoted query may look like: <code class="literal">"SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"</code></p></td></tr></table></div><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p>The facility of using a free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no <code class="literal">OR</code> conditions in the <code class="literal">WHERE</code> clause. 
Use of complex queries, such as queries that have sub-queries or joins leading to ambiguous projections, can lead to unexpected results.</p></td></tr></table></div></div><div class="section" title="7.2.4. Controlling Parallelism"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_parallelism"></a>7.2.4. Controlling Parallelism</h4></div></div></div><p>Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) to use to perform the import by using the <code class="literal">-m</code> or <code class="literal">--num-mappers</code> argument. Each of these arguments takes an integer value which corresponds to the degree of parallelism to employ. By default, four tasks are used. Some databases may see improved performance by increasing this value to 8 or 16. Do not increase the degree of parallelism beyond that available within your MapReduce cluster; tasks will run serially and will likely increase the amount of time required to perform the import. Likewise, do not increase the degree of parallelism higher than that which your database can reasonably support. Connecting 100 concurrent clients to your database may increase the load on the database server to a point where performance suffers as a result.</p><p>When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a <span class="emphasis"><em>splitting column</em></span> to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range. 
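The even division of the total range described above can be sketched as follows. This is a simplification for illustration only (Sqoop's actual splitter handles additional cases such as non-integer types), but it reproduces the four ranges used in the example below:

```python
# Simplified sketch of even range splitting over [lo, hi]; the final
# bound is hi + 1 so that the maximum value itself is included.
def split_ranges(lo, hi, num_tasks):
    step = (hi - lo) / num_tasks
    bounds = [lo + round(i * step) for i in range(num_tasks)] + [hi + 1]
    return list(zip(bounds, bounds[1:]))

print(split_ranges(0, 1000, 4))
# [(0, 250), (250, 500), (500, 750), (750, 1001)]
```

Each (lo, hi) pair then parameterizes one map task's WHERE clause.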
For example, if you had a table with a primary key column of <code class="literal">id</code> whose minimum value was 0 and maximum value was 1000, and Sqoop was directed to use 4 tasks, Sqoop would run four processes which each execute SQL statements of the form <code class="literal">SELECT * FROM sometable WHERE id >= lo AND id < hi</code>, with <code class="literal">(lo, hi)</code> set to (0, 250), (250, 500), (500, 750), and (750, 1001) in the different tasks.</p><p>If the actual values for the primary key are not uniformly distributed across its range, then this can result in unbalanced tasks. You should explicitly choose a different column with the <code class="literal">--split-by</code> argument. For example, <code class="literal">--split-by employee_id</code>. Sqoop cannot currently split on multi-column indices. If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column.</p><p>If a table does not have a primary key defined and <code class="literal">--split-by <col></code> is not provided, then the import will fail unless the number of mappers is explicitly set to one with the <code class="literal">--num-mappers 1</code> option or the <code class="literal">--autoreset-to-one-mapper</code> option is used. The option <code class="literal">--autoreset-to-one-mapper</code> is typically used with the import-all-tables tool to automatically handle tables without a primary key in a schema.</p></div><div class="section" title="7.2.5. Controlling Distributed Cache"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_distributed_cache"></a>7.2.5. Controlling Distributed Cache</h4></div></div></div><p>Sqoop copies the jars in the $SQOOP_HOME/lib folder to the job cache every time a Sqoop job is started. When launched by Oozie this is unnecessary, since Oozie uses its own Sqoop share lib, which keeps the Sqoop dependencies in the distributed cache. 
Oozie will localize the Sqoop dependencies on each worker node only once, during the first Sqoop job, and reuse the jars on the worker nodes for subsequent jobs. Using the <code class="literal">--skip-dist-cache</code> option in the Sqoop command when launched by Oozie will skip the step in which Sqoop copies its dependencies to the job cache, saving massive I/O.</p></div><div class="section" title="7.2.6. Controlling the Import Process"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_the_import_process"></a>7.2.6. Controlling the Import Process</h4></div></div></div><p>By default, the import process will use JDBC, which provides a reasonable cross-vendor import channel. Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools. For example, MySQL provides the <code class="literal">mysqldump</code> tool, which can export data from MySQL to other systems very quickly. By supplying the <code class="literal">--direct</code> argument, you are specifying that Sqoop should attempt the direct import channel. This channel may be higher performance than using JDBC.</p><p>Details about use of direct mode with each specific RDBMS, installation requirements, available options and limitations can be found in <a class="xref" href="#connectors" title="25. Notes for specific connectors">Section 25, “Notes for specific connectors”</a>.</p><p>By default, Sqoop will import a table named <code class="literal">foo</code> to a directory named <code class="literal">foo</code> inside your home directory in HDFS. For example, if your username is <code class="literal">someuser</code>, then the import tool will write to <code class="literal">/user/someuser/foo/(files)</code>. You can adjust the parent directory of the import with the <code class="literal">--warehouse-dir</code> argument. 
For example:</p><pre class="screen">$ sqoop import --connect <connect-str> --table foo --warehouse-dir /shared \
    ...</pre><p>This command would write to a set of files in the <code class="literal">/shared/foo/</code> directory.</p><p>You can also explicitly choose the target directory, like so:</p><pre class="screen">$ sqoop import --connect <connect-str> --table foo --target-dir /dest \
    ...</pre><p>This will import the files into the <code class="literal">/dest</code> directory. <code class="literal">--target-dir</code> is incompatible with <code class="literal">--warehouse-dir</code>.</p><p>When using direct mode, you can specify additional arguments which should be passed to the underlying tool. If the argument <code class="literal">--</code> is given on the command-line, then subsequent arguments are sent directly to the underlying tool. For example, the following adjusts the character set used by <code class="literal">mysqldump</code>:</p><pre class="screen">$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
    --direct -- --default-character-set=latin1</pre><p>By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory’s contents. If you use the <code class="literal">--append</code> argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.</p></div><div class="section" title="7.2.7. Controlling transaction isolation"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_transaction_isolation"></a>7.2.7. Controlling transaction isolation</h4></div></div></div><p>By default, Sqoop uses the read committed transaction isolation level in the mappers to import data. This may not be ideal in all ETL workflows, and it may be desirable to reduce the isolation guarantees. 
The <code class="literal">--relaxed-isolation</code> option can be used to instruct Sqoop to use the read uncommitted isolation level.</p><p>The <code class="literal">read-uncommitted</code> isolation level is not supported on all databases (for example, Oracle), so the <code class="literal">--relaxed-isolation</code> option may not work with every database.</p></div><div class="section" title="7.2.8. Controlling type mapping"><div class="titlepage"><div><div><h4 class="title"><a name="_controlling_type_mapping"></a>7.2.8. Controlling type mapping</h4></div></div></div><p>Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However, the default mapping might not be suitable for everyone and can be overridden with <code class="literal">--map-column-java</code> (for changing the mapping to Java) or <code class="literal">--map-column-hive</code> (for changing the Hive mapping).</p><div class="table"><a name="idp3579352"></a><p class="title"><b>Table 4. Parameters for overriding mapping</b></p><div class="table-contents"><table summary="Parameters for overriding mapping" style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; "><colgroup><col align="left"><col align="left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; " align="left">
[... 4007 lines stripped ...]
