Repository: sqoop
Updated Branches:
  refs/heads/trunk a7f5e0d29 -> d57f9fb06
SQOOP-3293: Document SQOOP-2976

(Fero Szabo via Szabolcs Vasas)

Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/d57f9fb0
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/d57f9fb0
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/d57f9fb0

Branch: refs/heads/trunk
Commit: d57f9fb06b55650adc75cd1972df0024d7e4dba1
Parents: a7f5e0d
Author: Szabolcs Vasas <va...@apache.org>
Authored: Wed Mar 21 15:51:02 2018 +0100
Committer: Szabolcs Vasas <va...@apache.org>
Committed: Wed Mar 21 15:51:46 2018 +0100

----------------------------------------------------------------------
 COMPILING.txt            | 10 ++++++++++
 src/docs/user/import.txt | 34 ++++++++++++++++++++++++++++++++--
 2 files changed, 42 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/d57f9fb0/COMPILING.txt
----------------------------------------------------------------------
diff --git a/COMPILING.txt b/COMPILING.txt
index 86be509..3b82250 100644
--- a/COMPILING.txt
+++ b/COMPILING.txt
@@ -411,3 +411,13 @@ To switch back to the previous version of Hadoop 0.20, for example, run:
 ++++
 ant test -Dhadoopversion=20
 ++++
+
+== Building the documentation
+
+Building the documentation requires that you have xmlto installed.
+You also need to set the XML_CATALOG_FILES environment variable.
+
+++++
+export XML_CATALOG_FILES=/usr/local/etc/xml/catalog
+ant docs
+++++

http://git-wip-us.apache.org/repos/asf/sqoop/blob/d57f9fb0/src/docs/user/import.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/import.txt b/src/docs/user/import.txt
index 330d544..e91a5a8 100644
--- a/src/docs/user/import.txt
+++ b/src/docs/user/import.txt
@@ -257,7 +257,7 @@ username is +someuser+, then the import tool will write to
 the import with the +\--warehouse-dir+ argument.
 For example:

 ----
-$ sqoop import --connnect <connect-str> --table foo --warehouse-dir /shared \
+$ sqoop import --connect <connect-str> --table foo --warehouse-dir /shared \
     ...
 ----

@@ -266,7 +266,7 @@ This command would write to a set of files in the +/shared/foo/+ directory.

 You can also explicitly choose the target directory, like so:

 ----
-$ sqoop import --connnect <connect-str> --table foo --target-dir /dest \
+$ sqoop import --connect <connect-str> --table foo --target-dir /dest \
     ...
 ----

@@ -444,6 +444,27 @@ argument, or specify any Hadoop compression codec using the
 +\--compression-codec+ argument. This applies to SequenceFile, text,
 and Avro files.

+Enabling Logical Types in Avro and Parquet import for numbers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To enable the use of logical types in Sqoop's Avro schema generation,
+which is used during both Avro and Parquet imports, use the
+sqoop.avro.logical_types.decimal.enable flag. This is necessary if you
+want to store values as decimals in the Avro file format.
+
+Padding number types in Avro import
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Certain databases, such as Oracle and Postgres, store number and decimal
+values without padding. For example, 1.5 in a column declared
+as NUMBER(20,5) is stored as is in Oracle, while the equivalent
+DECIMAL(20,5) is stored as 1.50000 in a SQL Server instance.
+This leads to a scale mismatch during Avro import.
+
+To avoid this error, use the sqoop.avro.decimal_padding.enable flag
+to turn on padding with 0s. This flag has to be used together with the
+sqoop.avro.logical_types.decimal.enable flag set to true.
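The rescaling that padding performs can be illustrated outside Sqoop with plain Python decimals (an illustrative sketch of the behaviour, not Sqoop's actual implementation; the function name is made up):

```python
from decimal import Decimal

def pad_to_scale(value: Decimal, scale: int) -> Decimal:
    # Rescale so the value carries exactly `scale` fractional digits,
    # mimicking what padding with 0s achieves for a DECIMAL(20, 5) column.
    return value.quantize(Decimal(1).scaleb(-scale))

# An Oracle-style unpadded value vs. the padded form a fixed
# decimal scale of 5 expects:
print(pad_to_scale(Decimal("1.5"), 5))  # 1.50000
```

With padding disabled, the unpadded value 1.5 does not match the declared scale of 5, which is the mismatch described above.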
+
 Large Objects
 ^^^^^^^^^^^^^

@@ -777,3 +798,12 @@ rows copied into HDFS:

 ----
 $ sqoop import --connect jdbc:mysql://db.foo.com/corp \
     --table EMPLOYEES --validate
 ----
+
+Enabling logical types in Avro import and also turning on padding with 0s:
+
+----
+$ sqoop import -Dsqoop.avro.decimal_padding.enable=true -Dsqoop.avro.logical_types.decimal.enable=true \
+    --connect $CON --username $USER --password $PASS --query "select * from table_name where \$CONDITIONS" \
+    --target-dir hdfs://nameservice1//etl/target_path --as-avrodatafile --verbose -m 1
+
+----
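For reference, with the logical-type flag enabled, a DECIMAL(20,5) column maps to an Avro field along these lines (a sketch based on the Avro specification's decimal logical type; the field name "price" is illustrative, not taken from Sqoop's output):

```python
import json

# Avro's decimal logical type annotates a bytes field, per the Avro spec;
# precision/scale here mirror the DECIMAL(20, 5) column discussed above.
decimal_field = {
    "name": "price",  # illustrative column name
    "type": {"type": "bytes", "logicalType": "decimal",
             "precision": 20, "scale": 5},
}
print(json.dumps(decimal_field, indent=2))
```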