[
https://issues.apache.org/jira/browse/GOBBLIN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296746#comment-16296746
]
Tilak Patidar commented on GOBBLIN-249:
---------------------------------------
Here is the PR:
https://github.com/apache/incubator-gobblin/pull/2221
> Documenting source schema specification
> ---------------------------------------
>
> Key: GOBBLIN-249
> URL: https://issues.apache.org/jira/browse/GOBBLIN-249
> Project: Apache Gobblin
> Issue Type: Wish
> Components: gobblin-core
> Reporter: Tilak Patidar
> Assignee: Abhishek Tiwari
>
> Various converters use the source.schema value to convert source records
> into their respective data formats, with support for both primitive and
> complex data types. We should write down a specification for defining a
> source.schema. The specification should include instructions on:
> * Converters and their use cases <Source, Target>.
> * Converters and the data types supported by them.
> * List of data types and their properties.
> * Examples of writing schemas, both simple and nested.
> * List of configuration values used by converters.
> * List of various options available for defining the schema of a field
> (size, nullable, etc.).
> This source.schema would act as an abstraction over the underlying schemas
> and data types of different formats such as Avro, Parquet, and ORC. The user
> defines the source.schema according to our specification and can then convert
> and write to different data formats without worrying about the target
> format's schema.
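To make the idea concrete, here is a rough sketch of what a source.schema with simple and nested fields might look like, plus a small helper that flattens it for inspection. The key names (columnName, isNullable, dataType, values) and the field names are assumptions chosen for illustration, not the agreed specification.

```python
import json

# Hypothetical source.schema; key names and layout are assumptions for
# illustration only, not the final specification.
SOURCE_SCHEMA = json.loads("""
[
  {"columnName": "id", "isNullable": false, "dataType": {"type": "long"}},
  {"columnName": "name", "isNullable": true, "dataType": {"type": "string"}},
  {"columnName": "attributes", "isNullable": true,
   "dataType": {"type": "map", "values": "string"}},
  {"columnName": "address", "isNullable": true,
   "dataType": {"type": "record", "name": "address", "values": [
     {"columnName": "city", "isNullable": false, "dataType": {"type": "string"}},
     {"columnName": "zip", "isNullable": true, "dataType": {"type": "string"}}
   ]}}
]
""")


def describe(fields, prefix=""):
    """Flatten a schema into (path, type, nullable) tuples, recursing into records."""
    rows = []
    for field in fields:
        path = prefix + field["columnName"]
        data_type = field["dataType"]
        rows.append((path, data_type["type"], field.get("isNullable", False)))
        if data_type["type"] == "record":
            # Nested fields are listed under "values" in this sketch.
            rows.extend(describe(data_type["values"], prefix=path + "."))
    return rows


if __name__ == "__main__":
    for row in describe(SOURCE_SCHEMA):
        print(row)
```

A flattened view like this could also serve as the basis for the per-converter tables of supported data types.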
> Data type abstraction
> For example, Parquet does not have a MAP type, but a map can be built by
> using a repeated group in Parquet. If the user defines a MAP in the source
> schema, we can do the necessary conversion and provide them with a MAP-like
> structure in Parquet. In this way, the user is freed from concerns about type
> conversion and the target schema. The converters could perhaps also become a
> separate module that acts as a conversion library for different data formats.
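As a sketch of the MAP case described above: the function below takes a hypothetical map field from a source.schema and renders a Parquet message-type fragment that uses a repeated key_value group. The input layout (columnName, isNullable, dataType.values), the function name, and the primitive type table are assumptions for illustration; this is not the converter's actual code.

```python
# Assumed mapping from source.schema primitive names to Parquet declarations.
PRIMITIVES = {
    "string": "binary {name} (UTF8)",
    "int": "int32 {name}",
    "long": "int64 {name}",
    "float": "float {name}",
    "double": "double {name}",
    "boolean": "boolean {name}",
}


def map_to_parquet_group(field):
    """Render a map field as a Parquet group wrapping a repeated key_value group."""
    name = field["columnName"]
    value_type = field["dataType"]["values"]
    repetition = "optional" if field.get("isNullable", False) else "required"
    value_decl = PRIMITIVES[value_type].format(name="value")
    return "\n".join([
        f"{repetition} group {name} (MAP) {{",
        "  repeated group key_value {",
        "    required binary key (UTF8);",  # keys are assumed to be strings
        f"    optional {value_decl};",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    field = {
        "columnName": "attributes",
        "isNullable": True,
        "dataType": {"type": "map", "values": "string"},
    }
    print(map_to_parquet_group(field))
```

The user only writes "type": "map"; the repeated-group layout stays an implementation detail of the Parquet converter.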