[
https://issues.apache.org/jira/browse/SQOOP-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Veena Basavaraj updated SQOOP-1719:
-----------------------------------
Fix Version/s: 2.0.0 (was: 1.99.5)
> Schema Validation Rules between From and To Schema
> --------------------------------------------------
>
> Key: SQOOP-1719
> URL: https://issues.apache.org/jira/browse/SQOOP-1719
> Project: Sqoop
> Issue Type: New Feature
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 2.0.0
>
>
> Today the Matcher code checks for the existence of at least one schema.
> {code}
> public Matcher(Schema fromSchema, Schema toSchema) {
>   if (fromSchema.isEmpty() && toSchema.isEmpty()) {
>     throw new SqoopException(MatcherError.MATCHER_0000,
>         "Neither a FROM or TO schemas been provided.");
>   } else if (toSchema.isEmpty()) {
>     // Only the FROM schema is known: use it for both sides.
>     this.fromSchema = fromSchema;
>     this.toSchema = fromSchema;
>   } else if (fromSchema.isEmpty()) {
>     // Only the TO schema is known: use it for both sides.
>     this.fromSchema = toSchema;
>     this.toSchema = toSchema;
>   } else {
>     this.fromSchema = fromSchema;
>     this.toSchema = toSchema;
>   }
> }
> {code}
> If both exist, then in addition to this we need to validate that the two
> schemas are compatible.
> Today we also have some logic that picks the matcher based on the presence or
> absence of the FROM and TO schemas:
> {code}
> public class MatcherFactory {
>   public static Matcher getMatcher(Schema fromSchema, Schema toSchema) {
>     if (toSchema.isEmpty() || fromSchema.isEmpty()) {
>       return new LocationMatcher(fromSchema, toSchema);
>     } else {
>       return new NameMatcher(fromSchema, toSchema);
>     }
>   }
> }
> {code}
> But the above can be extended to further elaborate the rules and the order in
> which these rules will and should be applied. Having this in Sqoop internals
> means we had better have a good story on how schema matching works.
> For instance, if we have a FROM schema with one column of type String and
> a TO schema with one column of type INTEGER, then we should warn or fail
> before even starting the job, since such a transfer might not be recommended.
> These validation rules are not documented in Sqoop and, if implemented,
> should be configurable externally per job if possible.
> Second, such validation should happen before the job is submitted. But for
> that we need to get the schemas, so it may not be possible to avoid
> starting the job.
> NOTE: In 1.99.5 we do not yet support transformations, hence the schemas for
> the FROM and TO are static, i.e. there is no way during the job execution for
> the TO source to tell that it would like to store the varchar data as binary.
> If the FROM source has the "varchar" type, we will validate that the "TO"
> source stores it as varchar or a suitable compatible type.
> If we allow a transformation layer in between post 1.99.5, then we
> potentially can have the schema change during the job execution phase, i.e.
> have dynamic schemas created per job.
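
To make the String-to-INTEGER case from the description concrete, here is a minimal sketch of what a per-job compatibility check might look like. Everything in it is hypothetical and not part of Sqoop: the class SchemaCompatibilitySketch, the ColumnType and Policy enums, the validate method, and the compatibility table itself (which assumes numeric values may be stored as text, but not the reverse). Columns are compared by position only, and column lists stand in for real Schema objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Sqoop.
public class SchemaCompatibilitySketch {

    // Simplified stand-ins for column types.
    enum ColumnType { TEXT, BINARY, FIXED_POINT, FLOATING_POINT }

    // What to do on a mismatch; intended to be configurable per job.
    enum Policy { WARN, FAIL }

    // FROM type -> TO types assumed acceptable. Illustrative assumption only.
    static final Map<ColumnType, Set<ColumnType>> COMPATIBLE =
            new EnumMap<>(ColumnType.class);
    static {
        COMPATIBLE.put(ColumnType.TEXT, EnumSet.of(ColumnType.TEXT));
        COMPATIBLE.put(ColumnType.BINARY, EnumSet.of(ColumnType.BINARY));
        COMPATIBLE.put(ColumnType.FIXED_POINT,
                EnumSet.of(ColumnType.FIXED_POINT, ColumnType.TEXT));
        COMPATIBLE.put(ColumnType.FLOATING_POINT,
                EnumSet.of(ColumnType.FLOATING_POINT, ColumnType.TEXT));
    }

    // Compares columns position by position and returns one message per
    // mismatch. Under Policy.FAIL any mismatch throws, so the caller can
    // refuse to submit the job; under Policy.WARN the messages are returned.
    static List<String> validate(List<ColumnType> from, List<ColumnType> to,
                                 Policy policy) {
        List<String> problems = new ArrayList<>();
        if (from.size() != to.size()) {
            problems.add("column count differs: " + from.size()
                    + " vs " + to.size());
        }
        int n = Math.min(from.size(), to.size());
        for (int i = 0; i < n; i++) {
            if (!COMPATIBLE.get(from.get(i)).contains(to.get(i))) {
                problems.add("column " + i + ": " + from.get(i)
                        + " -> " + to.get(i));
            }
        }
        if (!problems.isEmpty() && policy == Policy.FAIL) {
            throw new IllegalStateException("Incompatible schemas: " + problems);
        }
        return problems;
    }

    public static void main(String[] args) {
        // The example from the description: String column into INTEGER column.
        List<String> warnings = validate(
                Arrays.asList(ColumnType.TEXT),
                Arrays.asList(ColumnType.FIXED_POINT),
                Policy.WARN);
        System.out.println(warnings);
    }
}
```

With Policy.WARN the job could still be started and the mismatch logged. With Policy.FAIL the same input throws before submission, which only helps if both schemas can be obtained without starting the job.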