[ 
https://issues.apache.org/jira/browse/SQOOP-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209410#comment-14209410
 ] 

Qian Xu commented on SQOOP-1719:
--------------------------------

As the name implies, {{Matcher}} is actually used to determine which schema 
should be used for the input.

Good point, [~vybs]. There should be a real validator to check schema 
compatibility for incremental import.

> Schema Validation Rules between From and To Schema
> --------------------------------------------------
>
>                 Key: SQOOP-1719
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1719
>             Project: Sqoop
>          Issue Type: New Feature
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> Today we have Matcher code that checks for the existence of at least one schema.
> {code}
> public Matcher(Schema fromSchema, Schema toSchema) {
>     if (fromSchema.isEmpty() && toSchema.isEmpty()) {
>       throw new SqoopException(MatcherError.MATCHER_0000,
>           "Neither a FROM or TO schemas been provided.");
>     } else if (toSchema.isEmpty()) {
>       this.fromSchema = fromSchema;
>       this.toSchema = fromSchema;
>     } else if (fromSchema.isEmpty()) {
>       this.fromSchema = toSchema;
>       this.toSchema = toSchema;
>     } else {
>       this.fromSchema = fromSchema;
>       this.toSchema = toSchema;
>     }
>   }
> {code}
> If both exist, then in addition to this we need to validate that they are 
> compatible.
> For instance, if we have a From schema with one column of type String and a 
> To schema with one column of type INTEGER, then we should warn or fail before 
> even starting the job, since such a transfer might not be recommended. These 
> validation rules are not documented in Sqoop and, if implemented, should be 
> configurable externally per job if possible.
> Second, such validation should happen before the job is submitted. But for 
> that we need to get the schemas, so it may not be possible to avoid 
> starting the job.
> NOTE: In 1.99.5 we do not yet support transformations, hence the schemas for 
> the FROM and TO are static, i.e. there is no way during job execution for 
> the TO source to tell that it would like to store the varchar data as binary. 
> If the FROM source has the "varchar" type, we will validate that the "TO" 
> data source stores it as varchar or a suitable compatible type as well.
> If we allow a transformation layer in between post 1.99.5, then the schema 
> can potentially change during the job execution phase, i.e. we can have 
> dynamic schemas created per job.
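
A minimal sketch of the kind of compatibility check the description asks for, 
assuming static schemas as in 1.99.5. Everything here is hypothetical and is 
not Sqoop's actual API: the ColumnType and Policy enums, the compatibility 
rules, and the method names are illustrative assumptions only. It compares 
columns by position and lets a per-job policy decide whether a mismatch only 
warns or blocks the job.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types are Sqoop's real classes.
final class SchemaCompatibilitySketch {

    // Simplified stand-in for schema column types.
    enum ColumnType { TEXT, INTEGER, BINARY }

    // Assumed per-job setting: report mismatches only, or refuse to start.
    enum Policy { WARN, FAIL }

    // Assumed rules: which TO types may receive a given FROM type.
    static Set<ColumnType> compatibleTargets(ColumnType from) {
        switch (from) {
            case INTEGER:
                // An integer can be stored as an integer or rendered as text.
                return EnumSet.of(ColumnType.INTEGER, ColumnType.TEXT);
            case TEXT:
                // Without a transformation layer, text stays text.
                return EnumSet.of(ColumnType.TEXT);
            default:
                return EnumSet.of(ColumnType.BINARY);
        }
    }

    // Returns one message per incompatibility; an empty list means compatible.
    static List<String> validate(List<ColumnType> from, List<ColumnType> to) {
        List<String> problems = new ArrayList<>();
        if (from.size() != to.size()) {
            problems.add("column count differs: " + from.size() + " vs " + to.size());
            return problems;
        }
        for (int i = 0; i < from.size(); i++) {
            if (!compatibleTargets(from.get(i)).contains(to.get(i))) {
                problems.add("column " + i + ": " + from.get(i) + " -> " + to.get(i));
            }
        }
        return problems;
    }

    // The job may start if nothing is wrong or the policy only warns.
    static boolean mayStart(List<String> problems, Policy policy) {
        return problems.isEmpty() || policy == Policy.WARN;
    }
}
```

With these assumed rules, a FROM schema with one TEXT column and a TO schema 
with one INTEGER column yields a single problem, and mayStart returns false 
under Policy.FAIL but true under Policy.WARN.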



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
