[
https://issues.apache.org/jira/browse/SQOOP-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qian Xu updated SQOOP-1588:
---------------------------
Description:
Create a basic Kite connector that can write data (e.g. from a JDBC connection)
to HDFS.
The scope is defined as follows:
- Destination: HDFS
- File Format: Avro, Parquet and CSV.
- Compression Codec: Use default
- Partitioner Strategy: Not supported
- Column Mapping: Not supported
Exposed Configuration:
- [Link] File Format (Enum)
- [To] Dataset URI (String, has a validation check)
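The validation check on the Dataset URI is not spelled out here. A minimal sketch of what such a check could look like follows. The class name {{DatasetUriCheck}}, the method {{isValid}} and the regular expression are hypothetical and are not the connector's actual validator. The sketch assumes HDFS dataset URIs of the form {{dataset:hdfs:}} followed by an optional {{//host:port}} authority and an absolute path.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Sqoop or Kite.
final class DatasetUriCheck {
    // Assumed shape: "dataset:hdfs:", then an optional //host[:port] authority,
    // then an absolute path with at least one segment.
    private static final Pattern HDFS_DATASET_URI =
        Pattern.compile("^dataset:hdfs:(//[^/:]+(:\\d+)?)?(/[^/\\s]+)+/?$");

    // Returns true only if the URI is non-null and matches the assumed shape.
    static boolean isValid(String uri) {
        return uri != null && HDFS_DATASET_URI.matcher(uri).matches();
    }
}
```

Under these assumptions, {{dataset:hdfs:/data/sales/orders}} would be accepted, while a plain {{hdfs:/data/sales/orders}} would be rejected before the job starts.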
Workflow:
- Create a link to Kite Connector
- Create a job with valid configuration (see above)
- Start a job
- {{KiteToInitializer}} will check dataset existence
- Sqoop will create N {{KiteLoader}} instances.
- Kite requires an Avro schema for data manipulation, so {{KiteLoader}} will
create one from the Sqoop schema provided by {{LoaderContext}}. As Sqoop
schema types are not identical to Avro types, some types will be mapped. The
original Sqoop type information will be kept as {{SqoopType}} in the schema
field, which can be used for a reverse type mapping.
- {{KiteLoader}} will create a temporary dataset and write data records into
it. If any error occurs, the dataset will be deleted.
- {{KiteToDestroy}} will merge all temporary datasets into one dataset.
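The schema step in the workflow above can be sketched as follows. This is only an illustration: the {{SqoopType}} enum below is a small stand-in rather than the real Sqoop type enum, the mapping table is an assumption, and the schema is built as plain JSON text instead of through the Avro library. It relies on Avro allowing extra attributes on a field as metadata.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; the real KiteLoader may map types differently.
final class SqoopToAvroSketch {
    // Stand-in for a few Sqoop column types (not the actual Sqoop enum).
    enum SqoopType { TEXT, FIXED_POINT, FLOATING_POINT, BIT, BINARY, DATE_TIME, DECIMAL }

    // Assumed mapping from a Sqoop type to an Avro primitive type name.
    static String avroType(SqoopType type) {
        switch (type) {
            case TEXT:           return "string";
            case FIXED_POINT:    return "long";
            case FLOATING_POINT: return "double";
            case BIT:            return "boolean";
            case BINARY:         return "bytes";
            case DATE_TIME:      return "long";   // assumption: stored as a timestamp
            case DECIMAL:        return "string"; // assumption: avoids precision loss
            default: throw new IllegalArgumentException("Unmapped type: " + type);
        }
    }

    // Builds an Avro record schema as JSON. Each field carries the original type
    // in a "SqoopType" attribute. Names are assumed to be plain identifiers,
    // so no JSON escaping is done here.
    static String toAvroSchemaJson(String recordName, Map<String, SqoopType> columns) {
        StringJoiner fields = new StringJoiner(",");
        for (Map.Entry<String, SqoopType> column : columns.entrySet()) {
            fields.add(String.format(
                "{\"name\":\"%s\",\"type\":\"%s\",\"SqoopType\":\"%s\"}",
                column.getKey(), avroType(column.getValue()), column.getValue().name()));
        }
        return String.format(
            "{\"type\":\"record\",\"name\":\"%s\",\"fields\":[%s]}", recordName, fields);
    }

    // Reverse mapping: recover the Sqoop type from the stored attribute value.
    static SqoopType fromAttribute(String sqoopTypeAttribute) {
        return SqoopType.valueOf(sqoopTypeAttribute);
    }
}
```

In this sketch both {{FIXED_POINT}} and {{DATE_TIME}} end up as Avro {{long}}, so the Avro type alone cannot be mapped back. The {{SqoopType}} attribute is what makes the reverse mapping unambiguous.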
Further features will be implemented in follow-up JIRAs.
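The temporary-dataset steps of the workflow (write, delete on error, merge) can be sketched as below. This is not Kite API code: plain directories on a local filesystem stand in for Kite datasets on HDFS, and the class name, method names and file names are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; directories stand in for Kite datasets.
final class TemporaryDatasetSketch {
    // Loader step: write records into a fresh temporary directory.
    // If writing fails, the temporary directory is deleted before rethrowing.
    static Path writeTemporary(Path parent, List<String> records) throws IOException {
        Path temp = Files.createTempDirectory(parent, "temp_");
        try {
            Files.write(temp.resolve("part.csv"), records);
            return temp;
        } catch (IOException | RuntimeException e) {
            deleteRecursively(temp);
            throw e;
        }
    }

    // Destroyer step: move every file of every temporary directory into the
    // target directory, then remove the emptied temporary directories.
    static void merge(List<Path> temporaries, Path target) throws IOException {
        Files.createDirectories(target);
        int part = 0;
        for (Path temp : temporaries) {
            List<Path> files;
            try (Stream<Path> listing = Files.list(temp)) {
                files = listing.sorted().collect(Collectors.toList());
            }
            for (Path file : files) {
                Files.move(file, target.resolve("part-" + part + ".csv"));
                part++;
            }
            Files.delete(temp);
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }
}
```

Keeping each loader's output separate until the final merge means a failed loader only has to clean up its own temporary data.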
was:
Create a basic Kite connector that can write data (i.e. from a jdbc connection)
to HDFS.
The scope is defined as follows:
- Destination: HDFS
- File Format: Avro Parquet and CSV.
- Compression Codec: Use default
- Partitioner Strategy: Not supported
- Column Mapping: Not supported
Exposed Configuration:
- [Link] File Format (Enum)
- [To] Dataset URI (String, has a validation check)
Workflow:
- Create a link to Kite Connector
- Create a job with valid configuration (see above)
- Start a job
- {{KiteToInitializer}} will check dataset existence
- Sqoop will create N {{KiteLoader}} instances.
- {{KiteLoader}} will create an Avro schema regarding the FROM-schema (sorry,
at runtime). As Schema types are not identical to Avro types, a type mapping
will happen in place. Original Sqoop type will be described in the Avro schema,
which can be used for reversed type mapping for data export.
- {{KiteLoader}} will create a temporary dataset and writes allocated data
records to it. In case of any error, the dataset will be deleted.
- {{KiteToDestroy}} will merge all temporary datasets to be one dataset.
Further features will be implemented in follow-up JIRAs.
> TO-side: Write data to HDFS
> ---------------------------
>
> Key: SQOOP-1588
> URL: https://issues.apache.org/jira/browse/SQOOP-1588
> Project: Sqoop
> Issue Type: Sub-task
> Components: connectors
> Reporter: Qian Xu
> Assignee: Qian Xu
>
> Create a basic Kite connector that can write data (e.g. from a JDBC
> connection) to HDFS.
> The scope is defined as follows:
> - Destination: HDFS
> - File Format: Avro, Parquet and CSV.
> - Compression Codec: Use default
> - Partitioner Strategy: Not supported
> - Column Mapping: Not supported
> Exposed Configuration:
> - [Link] File Format (Enum)
> - [To] Dataset URI (String, has a validation check)
> Workflow:
> - Create a link to Kite Connector
> - Create a job with valid configuration (see above)
> - Start a job
> - {{KiteToInitializer}} will check dataset existence
> - Sqoop will create N {{KiteLoader}} instances.
> - Kite requires an Avro schema for data manipulation, so {{KiteLoader}} will
> create one from the Sqoop schema provided by {{LoaderContext}}. As
> Sqoop schema types are not identical to Avro types, some types will be
> mapped. The original Sqoop type information will be kept as {{SqoopType}} in
> the schema field, which can be used for a reverse type mapping.
> - {{KiteLoader}} will create a temporary dataset and write data records into
> it. If any error occurs, the dataset will be deleted.
> - {{KiteToDestroy}} will merge all temporary datasets into one dataset.
> Further features will be implemented in follow-up JIRAs.