[
https://issues.apache.org/jira/browse/SQOOP-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075452#comment-14075452
]
Gwen Shapira commented on SQOOP-1378:
-------------------------------------
Current thoughts are:
* Every connector needs to support getSchema and return a schema object with a
collection of columns. These can be defined in any way that makes sense to
those writing the connector (e.g. an HDFS schema can be a single column
representing a record).
* Users should also be able to define a job with a fromSchema, a toSchema and a
transformation (as part of the connector and framework forms). User schemas can
be defined in JSON (we already support loading schema from JSON), and the
transformations are triples:
{ toColumn: column name,
fromColumn: XPATH expression describing how to get the value of the column
from the fromSchema,
cast: optional explicit data type casting
}
* Users either supply both schemas and a transformation, or nothing at all, in
which case we'll use defaults (matching by column names? dumping the entire row
as text to a single record? We need to figure out sensible defaults and who
controls them).
* If users supply schemas and transformations we should be able to run some
validation and confirm they make sense. This should happen when a job is first
defined.
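
To make the proposal above concrete, here is a minimal sketch of the
transformation triples, a name-matching default, and job-definition-time
validation. It is written in Python for brevity and is not Sqoop code: the
names Column, Transformation, default_transformations and validate are
hypothetical, column types are plain strings, and fromColumn is simplified to a
plain column name rather than an XPath expression.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    # Hypothetical column model: a name plus a type label such as "int".
    name: str
    type: str


@dataclass(frozen=True)
class Transformation:
    # Mirrors the proposed triple: toColumn, fromColumn, optional cast.
    to_column: str
    from_column: str  # assumption: a plain column name, not an XPath expression
    cast: Optional[str] = None


def default_transformations(from_schema: List[Column],
                            to_schema: List[Column]) -> List[Transformation]:
    """One possible default: map columns that share a name in both schemas."""
    from_names = {c.name for c in from_schema}
    return [Transformation(c.name, c.name)
            for c in to_schema if c.name in from_names]


def validate(from_schema: List[Column],
             to_schema: List[Column],
             transformations: List[Transformation]) -> List[str]:
    """Return a list of problems; an empty list means the job looks consistent."""
    from_cols = {c.name: c for c in from_schema}
    to_cols = {c.name: c for c in to_schema}
    errors = []
    for t in transformations:
        if t.to_column not in to_cols:
            errors.append(f"unknown toColumn: {t.to_column}")
            continue
        if t.from_column not in from_cols:
            errors.append(f"unknown fromColumn: {t.from_column}")
            continue
        # The type delivered is the cast if given, else the source column type.
        delivered = t.cast or from_cols[t.from_column].type
        expected = to_cols[t.to_column].type
        if delivered != expected:
            errors.append(
                f"type mismatch for {t.to_column}: {delivered} vs {expected}")
    # Every target column should be populated by some transformation.
    mapped = {t.to_column for t in transformations}
    for name in to_cols:
        if name not in mapped:
            errors.append(f"toColumn not populated: {name}")
    return errors
```

Running validate when the job is first defined would surface a missing cast or
an unmapped target column before any data moves. Whether the framework or the
connector owns default_transformations is exactly the open question above.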
> Sqoop2: From/To: Refactor schema
> --------------------------------
>
> Key: SQOOP-1378
> URL: https://issues.apache.org/jira/browse/SQOOP-1378
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Abraham Elmahrek
> Assignee: Gwen Shapira
>
> Relational database systems, hierarchical databases, etc. tend to have a well
> defined schema. Key-value DBs, BigTable clones, etc. tend to have weakly
> defined schemas. In fact, a key-value datastore may not have any kind of
> schema (other than the fact that it is key-value).
> Schemas seem like they are local to the connector and should not be needed by
> the framework. Or, there should be a common Schema format that every
> connector knows how to decipher.