[ 
https://issues.apache.org/jira/browse/GOBBLIN-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296746#comment-16296746
 ] 

Tilak Patidar commented on GOBBLIN-249:
---------------------------------------

Here is the PR:
https://github.com/apache/incubator-gobblin/pull/2221

> Documenting source schema specification
> ---------------------------------------
>
>                 Key: GOBBLIN-249
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-249
>             Project: Apache Gobblin
>          Issue Type: Wish
>          Components: gobblin-core
>            Reporter: Tilak Patidar
>            Assignee: Abhishek Tiwari
>
> Various converters use the source.schema value to convert source records 
> into their respective data formats, with support for both primitive and 
> complex data types. We should write down a specification for defining a 
> source.schema. The specification should include instructions on:
> * Converters and their use case <Source, Target>.
> * Converters and the data types supported by them.
> * List of data types and their properties.
> * Examples of writing schema both nested and simple.
> * List of configuration values used by converters.
> * List of the options available for defining the schema of a field 
> (size, nullable, etc.).
> This source.schema would act as an abstraction over the underlying schemas 
> and data types of different formats such as Avro, Parquet, and ORC. The user 
> defines the source.schema according to our specification and can then convert 
> and write to different data formats without worrying about the target 
> format's schema.
> Data type abstraction
> For example, Parquet does not have a MAP type, but a map can be created by 
> using a repeated group in Parquet. If the user defines a MAP in the source 
> schema, we can do the necessary conversion and provide a MAP-like structure 
> in Parquet. In this way, the user is freed from the concerns of type 
> conversion and target schema. The converters could also be made a separate 
> module acting as a conversion library for different data formats.
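
To make the MAP example above concrete, here is a minimal sketch in Python of how a map field declared in an abstract schema could be rendered as a Parquet group wrapping a repeated key_value group. It is only an illustration of the idea, not the converter from the PR. The field keys (columnName, dataType, isNullable), the PARQUET_PRIMITIVES table and both function names are assumptions made up for this sketch and may not match the eventual specification. The group layout follows the MAP convention described in the Parquet format documentation.

```python
# Hypothetical mapping from abstract type names to Parquet primitive
# declarations. The names on the left are assumptions for this sketch.
PARQUET_PRIMITIVES = {
    "string": "binary {name} (UTF8)",
    "int": "int32 {name}",
    "long": "int64 {name}",
    "double": "double {name}",
    "boolean": "boolean {name}",
}


def parquet_field(field):
    """Return the Parquet schema lines for one abstract field definition."""
    name = field["columnName"]
    data_type = field["dataType"]
    repetition = "optional" if field.get("isNullable", False) else "required"
    kind = data_type["type"]
    if kind == "map":
        # A map becomes a group that wraps a repeated key/value group.
        key = PARQUET_PRIMITIVES["string"].format(name="key")
        value = PARQUET_PRIMITIVES[data_type["values"]].format(name="value")
        return [
            f"{repetition} group {name} (MAP) {{",
            "  repeated group key_value {",
            f"    required {key};",
            f"    optional {value};",
            "  }",
            "}",
        ]
    return [f"{repetition} {PARQUET_PRIMITIVES[kind].format(name=name)};"]


def parquet_message(message_name, fields):
    """Render a list of abstract fields as a Parquet message schema string."""
    lines = [f"message {message_name} {{"]
    for field in fields:
        lines.extend("  " + line for line in parquet_field(field))
    lines.append("}")
    return "\n".join(lines)


# Example abstract schema: one primitive field and one map field.
SOURCE_SCHEMA = [
    {"columnName": "id", "dataType": {"type": "long"}, "isNullable": False},
    {
        "columnName": "tags",
        "dataType": {"type": "map", "values": "string"},
        "isNullable": True,
    },
]

print(parquet_message("record", SOURCE_SCHEMA))
```

The user only writes the two entries in SOURCE_SCHEMA. The nested group structure is produced by the conversion step, which is the kind of abstraction the issue asks the specification to describe.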



