Ruslan Dautkhanov created SQOOP-2921:
----------------------------------------

             Summary: support for nested data types
                 Key: SQOOP-2921
                 URL: https://issues.apache.org/jira/browse/SQOOP-2921
             Project: Sqoop
          Issue Type: New Feature
          Components: build, codegen, connectors
    Affects Versions: 1.4.5, 2.0.0, 1.99.7
            Reporter: Ruslan Dautkhanov


It would be great if sqoop export and sqoop import supported exporting and 
importing nested collections natively.

For example, Oracle supports nested data types directly:
http://www.orafaq.com/wiki/NESTED_TABLE

Hive, Impala, and Spark also support nested collection data types; for Hive, see:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-ComplexTypeConstructors

We currently have to export the base table, create a staging table in Hive for 
each nested collection, and then sqoop each of those tables separately.

A)
- Ideally, sqoop could export the base table and all of its nested collections 
at once;

B)
- At a minimum, sqoop could export a single given nested collection plus a few 
columns from the base table, e.g.:

Let's say we have the following table:

{quote}TABLE client_transactions
(
...
, client_int int4
, first_name text
, transactions array<struct<trans_date:timestamp,trans_amount:int4>>
, web_visits array<struct<page:text,visits:int4>>
, ...
)
stored as parquet;{quote}

For the "B" functionality (minimal support for nested structures), we could 
then call sqoop as, e.g.:

{quote}sqoop export ... \
   --nested-collection transactions \
   --columns client_int
{quote}
so that it would flatten the nested collection "transactions" into the following 
columns: client_int, trans_date, trans_amount, and export the resulting dataset 
as if it came from a regular table.
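To make the requested "B" semantics concrete, here is a minimal Python sketch of the flattening step, under stated assumptions. The names `Transaction`, `ClientRow`, and `flatten_transactions` are hypothetical and exist only for illustration; `--nested-collection` is the option proposed above, not an existing Sqoop flag. The sketch also assumes that a base row with an empty collection produces no output rows (similar to a Hive LATERAL VIEW without OUTER), which this request does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Transaction:
    # Mirrors struct<trans_date:timestamp,trans_amount:int4> from the example table.
    trans_date: datetime
    trans_amount: int


@dataclass
class ClientRow:
    # Hypothetical subset of client_transactions; the real table has more columns.
    client_int: int
    first_name: str
    transactions: list[Transaction]


def flatten_transactions(
    rows: Iterable[ClientRow],
) -> Iterator[tuple[int, datetime, int]]:
    """Yield one flat (client_int, trans_date, trans_amount) tuple per element.

    Assumed semantics for '--nested-collection transactions --columns client_int':
    the selected base column is repeated for every element of the collection,
    and a row whose collection is empty yields nothing (assumption).
    """
    for row in rows:
        for tx in row.transactions:
            yield (row.client_int, tx.trans_date, tx.trans_amount)


if __name__ == "__main__":
    sample = [
        ClientRow(1, "Ann", [Transaction(datetime(2016, 5, 1), 100),
                             Transaction(datetime(2016, 5, 2), 250)]),
        ClientRow(2, "Bob", []),  # empty collection: no flat rows emitted
    ]
    for flat_row in flatten_transactions(sample):
        print(flat_row)
```

Each flat tuple corresponds to one row of the "regular table" that sqoop would then export; first_name is dropped because it was not listed in --columns.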



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
