[
https://issues.apache.org/jira/browse/SQOOP-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210081#comment-14210081
]
Veena Basavaraj edited comment on SQOOP-1719 at 11/13/14 6:05 PM:
------------------------------------------------------------------
Validation is also required without the incremental import; otherwise we don't
warn users about the data loss that can occur with the type conversion between
FROM and TO.
Matcher as a name is not the point of contention. The rules for which matcher to
use do not cover all the use cases we can encounter; this certainly
needs more thought.
Also, more importantly, we should think about how early this matching /
validation process can happen. Do we need to do it in the SqoopMapper after
starting the job? I would rather have another step in this chain where schema
negotiation / validation is done, so that custom matchers can be supplied by
connector developers as well.
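To make the "separate step" idea concrete, here is a minimal sketch of a validation step that runs a list of pluggable rules before the job starts. Everything in it (SchemaNegotiation, SimpleSchema, SchemaRule, the two built-in rules) is hypothetical and made up for illustration; none of it is existing Sqoop API, and the real Schema class carries much more than column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Sqoop.
final class SchemaNegotiation {

    /** Stand-in for a schema: just an ordered list of column names. */
    static final class SimpleSchema {
        final List<String> columns;

        SimpleSchema(String... columns) {
            this.columns = Arrays.asList(columns);
        }

        boolean isEmpty() {
            return columns.isEmpty();
        }
    }

    /** A rule that a connector developer could supply; returns the problems it found. */
    interface SchemaRule {
        List<String> check(SimpleSchema from, SimpleSchema to);
    }

    /** Assumed built-in rule: at least one of the two schemas must be present. */
    static final SchemaRule AT_LEAST_ONE = (from, to) -> {
        List<String> problems = new ArrayList<>();
        if (from.isEmpty() && to.isEmpty()) {
            problems.add("Neither a FROM nor a TO schema was provided");
        }
        return problems;
    };

    /** Assumed built-in rule: if both schemas exist, every FROM column must exist in TO. */
    static final SchemaRule NAMES_MATCH = (from, to) -> {
        List<String> problems = new ArrayList<>();
        if (from.isEmpty() || to.isEmpty()) {
            return problems; // nothing to compare
        }
        for (String column : from.columns) {
            if (!to.columns.contains(column)) {
                problems.add("FROM column '" + column + "' has no match in TO");
            }
        }
        return problems;
    };

    /** Runs every rule in order and collects all problems, without starting any job. */
    static List<String> validate(SimpleSchema from, SimpleSchema to, List<SchemaRule> rules) {
        List<String> problems = new ArrayList<>();
        for (SchemaRule rule : rules) {
            problems.addAll(rule.check(from, to));
        }
        return problems;
    }
}
```

A caller would pass the built-in rules plus any connector-specific ones to validate() and refuse to submit the job if the returned list is not empty. Whether the schemas can actually be obtained that early is exactly the open question above.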
> Schema Validation Rules between From and To Schema
> --------------------------------------------------
>
> Key: SQOOP-1719
> URL: https://issues.apache.org/jira/browse/SQOOP-1719
> Project: Sqoop
> Issue Type: New Feature
> Components: sqoop2-framework
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Today we have Matcher code that checks for the existence of at least one schema.
> {code}
> public Matcher(Schema fromSchema, Schema toSchema) {
>   if (fromSchema.isEmpty() && toSchema.isEmpty()) {
>     throw new SqoopException(MatcherError.MATCHER_0000,
>         "Neither a FROM or TO schemas been provided.");
>   } else if (toSchema.isEmpty()) {
>     this.fromSchema = fromSchema;
>     this.toSchema = fromSchema;
>   } else if (fromSchema.isEmpty()) {
>     this.fromSchema = toSchema;
>     this.toSchema = toSchema;
>   } else {
>     this.fromSchema = fromSchema;
>     this.toSchema = toSchema;
>   }
> }
> {code}
> If both exist, then in addition to this we need to validate that they
> are compatible.
> Today we have some logic that chooses the matcher to use based on the presence
> or absence of the FROM and TO schemas:
> {code}
> public class MatcherFactory {
>   public static Matcher getMatcher(Schema fromSchema, Schema toSchema) {
>     if (toSchema.isEmpty() || fromSchema.isEmpty()) {
>       return new LocationMatcher(fromSchema, toSchema);
>     } else {
>       return new NameMatcher(fromSchema, toSchema);
>     }
>   }
> }
> {code}
> But the above can be extended to further elaborate the rules and the order in
> which these rules will and should be applied. Having this in Sqoop internals
> means we had better have a good story on how schema matching works.
> For instance, if we have a FROM schema with one column of type String and
> a TO schema with one column of type INTEGER, then we should warn or fail
> before even starting the job, since such a conversion might not be
> recommended. These validation rules are not documented in Sqoop and, if
> implemented, should be configurable externally per job if possible.
> Second, such validation should happen before the job is submitted. But for
> that we need to get the schemas, so it may not be possible to avoid
> starting the job.
> NOTE: In 1.99.5 we do not yet support transformations, hence the schemas for
> the FROM and TO are static, i.e. there is no way during the job execution for
> the TO source to tell that it would like to store the varchar data as binary.
> If the FROM source has the "varchar" type, we will validate that the "TO"
> source stores this as varchar or a suitable compatible type.
> If we allow a transformation layer in between post 1.99.5, then we potentially
> can have the schema change during the job execution phase, i.e. have dynamic
> schemas created per job.
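As a rough illustration of the String-to-INTEGER example in the description above, the sketch below checks one FROM column type against one TO column type and lets a per-job policy decide between warning and failing. The ColumnType, Policy and Outcome enums and the contents of the compatibility table are assumptions made up for this sketch; they are not Sqoop's actual type system or documented rules.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: the types and the compatibility table are assumptions.
final class TypeCompatibility {

    enum ColumnType { TEXT, INTEGER, BINARY }

    /** What the job configuration asks for when a conversion looks unsafe. */
    enum Policy { WARN, FAIL }

    enum Outcome { OK, WARN, FAIL }

    /** For each FROM type, the TO types assumed to be storable without data loss. */
    private static final Map<ColumnType, Set<ColumnType>> SAFE_TARGETS =
        new EnumMap<>(ColumnType.class);

    static {
        SAFE_TARGETS.put(ColumnType.TEXT, EnumSet.of(ColumnType.TEXT));
        // An integer can always be written out as text.
        SAFE_TARGETS.put(ColumnType.INTEGER, EnumSet.of(ColumnType.INTEGER, ColumnType.TEXT));
        SAFE_TARGETS.put(ColumnType.BINARY, EnumSet.of(ColumnType.BINARY));
    }

    /** Returns OK for a safe pair, otherwise WARN or FAIL depending on the job's policy. */
    static Outcome check(ColumnType from, ColumnType to, Policy policy) {
        if (SAFE_TARGETS.get(from).contains(to)) {
            return Outcome.OK;
        }
        return policy == Policy.FAIL ? Outcome.FAIL : Outcome.WARN;
    }
}
```

With this table, check(ColumnType.TEXT, ColumnType.INTEGER, Policy.FAIL) yields FAIL, while the reverse direction is accepted. Keeping the table and the policy as data rather than code is one way the rules could be made configurable per job.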