[ 
https://issues.apache.org/jira/browse/SQOOP-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075452#comment-14075452
 ] 

Gwen Shapira commented on SQOOP-1378:
-------------------------------------

Current thoughts are:

* Every connector needs to support getSchema and return a schema object with a 
collection of columns. These can be defined in any way that makes sense to 
those writing the connector (e.g. an HDFS schema can be a single column 
representing a record).

* Users should also be able to define a job with a fromSchema, toSchema and a 
transformation (as part of the connector and framework forms). User schemas can 
be defined in JSON (we already support loading schema from JSON), and the 
transformations are triples:
{ toColumn: column name,
  fromColumn: XPATH expression describing how to get the value of the column 
from the fromSchema,
  cast: optional explicit data type casting 
}

* Users either supply both schemas and a transformation, or nothing at all, in 
which case we'll use defaults (matching by column names? dumping the entire row 
as text to a single record? We need to figure out sensible defaults and who 
controls them).

* If users supply schemas and transformations we should be able to run some 
validation and confirm they make sense. This should happen when a job is first 
defined.
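
To make the points above concrete, here is a rough sketch of how a schema, the 
toColumn/fromColumn/cast entries, a name-matching default and job-definition-time 
validation could fit together. Everything in it is an assumption made for 
illustration: the class and function names (Column, Schema, Mapping, validate, 
default_mappings, apply_mappings) and the cast names are hypothetical and are 
not Sqoop API. It also simplifies fromColumn to a plain column name instead of 
evaluating an XPATH expression.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical types for illustration only; none of this is Sqoop API.


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # e.g. "text" or "integer"


@dataclass(frozen=True)
class Schema:
    name: str
    columns: tuple  # tuple of Column

    def column_names(self):
        return [c.name for c in self.columns]


@dataclass(frozen=True)
class Mapping:
    to_column: str
    from_column: str  # simplified: a plain column name, not an XPATH expression
    cast: Optional[str] = None


# Assumed set of supported casts.
CASTS = {"integer": int, "float": float, "text": str}


def validate(from_schema, to_schema, mappings):
    """Return a list of problems found at job-definition time (empty if none)."""
    errors = []
    from_names = set(from_schema.column_names())
    to_names = set(to_schema.column_names())
    mapped = set()
    for m in mappings:
        if m.from_column not in from_names:
            errors.append(f"unknown fromColumn: {m.from_column}")
        if m.to_column not in to_names:
            errors.append(f"unknown toColumn: {m.to_column}")
        elif m.to_column in mapped:
            errors.append(f"toColumn mapped twice: {m.to_column}")
        mapped.add(m.to_column)
        if m.cast is not None and m.cast not in CASTS:
            errors.append(f"unsupported cast: {m.cast}")
    for name in sorted(to_names - mapped):
        errors.append(f"toColumn never populated: {name}")
    return errors


def default_mappings(from_schema, to_schema):
    """One candidate default: match columns that share the same name."""
    from_names = set(from_schema.column_names())
    return [Mapping(n, n) for n in to_schema.column_names() if n in from_names]


def apply_mappings(record, mappings):
    """Build a to-side record (a dict) from a from-side record (a dict)."""
    out = {}
    for m in mappings:
        value = record[m.from_column]
        if m.cast is not None:
            value = CASTS[m.cast](value)
        out[m.to_column] = value
    return out


# Example job definition with both schemas and an explicit transformation.
from_schema = Schema("hdfs_record", (Column("id", "text"), Column("name", "text")))
to_schema = Schema("users", (Column("user_id", "integer"), Column("name", "text")))
mappings = [Mapping("user_id", "id", cast="integer"), Mapping("name", "name")]
```

With these definitions, validate(from_schema, to_schema, mappings) returns an 
empty list, while the name-matching default leaves user_id unpopulated and is 
reported as a problem. That is one reason the choice of defaults, and who 
controls them, still needs discussion.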

> Sqoop2: From/To: Refactor schema
> --------------------------------
>
>                 Key: SQOOP-1378
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1378
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Abraham Elmahrek
>            Assignee: Gwen Shapira
>
> Relational database systems, hierarchical databases, etc. tend to have a well 
> defined schema. Key-value DBs, BigTable clones, etc. tend to have weakly 
> defined schemas. In fact, a key-value datastore may not have any kind of 
> schema (other than the fact that it is key-value).
> Schemas seem like they are local to the connector and should not be needed by 
> the framework. Or, there should be a common Schema format that every 
> connector knows how to decipher.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
