Ruslan Dautkhanov created SQOOP-2921:
----------------------------------------
Summary: support for nested data types
Key: SQOOP-2921
URL: https://issues.apache.org/jira/browse/SQOOP-2921
Project: Sqoop
Issue Type: New Feature
Components: build, codegen, connectors
Affects Versions: 1.4.5, 2.0.0, 1.99.7
Reporter: Ruslan Dautkhanov
It would be great if sqoop export and sqoop import supported exporting and
importing nested collections natively.
Oracle, for example, supports nested data types directly:
http://www.orafaq.com/wiki/NESTED_TABLE
Hive/Impala/Spark also support nested collection data types, e.g. in Hive:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ComplexTypeConstructors
Currently we have to export the base table, then create a staging table in
Hive for each nested collection, and then sqoop each of them separately.
A)
- Ideally, sqoop could export the base table and all of its nested
collections at once;
B)
- At a minimum, it would be awesome if sqoop could export one given nested
collection plus a few columns from the base table, e.g.:
Let's say we have the following table:
{quote}TABLE client_transactions
(
...
, client_int int4
, first_name text
, transactions array<struct<trans_date:timestamp,trans_amount:int4>>
, web_visits array<struct<page:text,visits:int4>>
, ...
)
stored as parquet;{quote}
For the "B" functionality (minimal support for nested structures), we could
then call sqoop as, e.g.:
{quote}sqoop export ... \
--nested-collection transactions \
--columns client_int
{quote}
so that it would flatten the nested collection "transactions" into the columns
client_int, trans_date and trans_amount, and sqoop the result as a regular
table's dataset.
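To make the proposed "B" semantics concrete, here is a minimal Python sketch of the flattening step, applied to in-memory rows shaped like client_transactions. It is not Sqoop code: the function name flatten_nested_collection is hypothetical, and it assumes that each array element produces one output row with the selected base columns repeated, and that rows whose collection is empty or NULL produce no output (mirroring Hive's LATERAL VIEW explode() without OUTER).

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def flatten_nested_collection(
    rows: Iterable[Row], nested_collection: str, columns: List[str]
) -> Iterator[Row]:
    """Hypothetical sketch of --nested-collection / --columns flattening.

    Yields one flat row per struct element of row[nested_collection],
    combining the selected base-table columns with the struct's fields.
    Assumption: empty or None collections yield no rows (like Hive's
    LATERAL VIEW explode() without OUTER).
    """
    for row in rows:
        # Base-table columns requested via --columns, repeated per element.
        base = {col: row[col] for col in columns}
        for element in row.get(nested_collection) or []:
            flat = dict(base)
            # Struct fields become top-level columns; a field with the same
            # name as a base column would overwrite it in this sketch.
            flat.update(element)
            yield flat


if __name__ == "__main__":
    client_transactions = [
        {
            "client_int": 1,
            "first_name": "Ann",
            "transactions": [
                {"trans_date": datetime(2016, 5, 1), "trans_amount": 100},
                {"trans_date": datetime(2016, 5, 2), "trans_amount": 250},
            ],
            "web_visits": [{"page": "/home", "visits": 3}],
        },
        {"client_int": 2, "first_name": "Bob", "transactions": [], "web_visits": []},
    ]
    # Equivalent of: --nested-collection transactions --columns client_int
    for flat_row in flatten_nested_collection(
        client_transactions, "transactions", ["client_int"]
    ):
        print(flat_row)
```

Each flat row has the columns client_int, trans_date and trans_amount, so it could then be exported like any regular table row. Whether empty collections should instead emit a row with NULL struct fields (OUTER semantics) is a design choice the feature would need to settle.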
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)