[ 
https://issues.apache.org/jira/browse/SQOOP-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277389#comment-14277389
 ] 

Veena Basavaraj commented on SQOOP-1168:
----------------------------------------

For Question 3:
I have explained this before, but I will restate the same information, this 
time with an example.

In the case of a JDBC connector (one of our favorite connectors), it is up to 
the connector to say how it wants to ask which rows to read. Here are 2 ways, 
as I stated in my wiki.

{code}
// In FromJobConfiguration: option 1, separate column/operator/value inputs
@ConfigClass(validators = {
    @Validator(DeltaFetchConfig1.DeltaFetchConfig1Validator.class)})
public class DeltaFetchConfig1 {

  @Input(size = 255) public String column;
  // validate supported operators, can provide a default value etc...
  @Input public String operator;
  @Input public String value;
}

// or, option 2: a single free-form query input
@ConfigClass(validators = {
    @Validator(DeltaFetchConfig2.DeltaFetchConfig2Validator.class)})
public class DeltaFetchConfig2 {
  @Input(size = 255) public String deltaFetchQuery;
}
{code}

When the connector processes the records, it wants to store state. At this 
point the state can be called last_value, row_key_last_processed, 
current_end_point, or whatever is relevant to the connector. Note that the type 
of the input was a query, while the type of the state is a long value or maybe 
a timestamp; it can be anything.
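
To make the point about state types concrete, here is a minimal sketch of a 
state holder a connector could keep while processing records. The class 
DeltaFetchState and its methods are hypothetical names for illustration only; 
they are not part of the Sqoop connector API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical state holder, NOT a Sqoop class: keys are chosen by the
// connector (last_value, row_key_last_processed, ...) and values can be of
// any type (long, timestamp, ...), independent of the type of the input.
final class DeltaFetchState {
  private final Map<String, Object> values = new HashMap<>();

  // Store an arbitrary state value under a connector-chosen key.
  void put(String key, Object value) {
    values.put(key, value);
  }

  // Read a state value back; returns null if the key was never written.
  Object get(String key) {
    return values.get(key);
  }

  // Keep the largest long seen so far under the given key, e.g. the highest
  // value of the delta column among the records processed in this run.
  void recordMax(String key, long candidate) {
    Object current = values.get(key);
    if (!(current instanceof Long) || candidate > (Long) current) {
      values.put(key, candidate);
    }
  }
}
```

A connector following this sketch would call recordMax("last_value", ...) once 
per record and hand the final map back to be persisted at the end of the run.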

So, next question: what happens when the user wants to do the next run?

The user will issue the command start job --j 1. At this point there is no 
room for him/her to reset the config or the state.
But there are cases where we want to provide this ability, so we provide show 
config --jid 1 --type from or show state --jid 1 --type from. We can then also 
have update config --jid  --type from and display all the configs that they can 
edit.

Independently, whether an input is editable or not can be marked with an 
annotation; that is an enhancement we can do as part of the SQOOP-1648 ticket 
for sure. The same goes for state.

Next question: do we want to correlate an input and state? I do not think so, 
because the relationship is one to many; a config input in a connector might 
result in multiple state values. But if we want to provide a means to associate 
them, we can do it, so in the above case the state key/name prev_value will be 
associated with the input query.
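
If we did want such an association, a rough sketch of the one-to-many mapping 
could look like the following. InputStateAssociations is a hypothetical helper 
made up for this example, not an existing Sqoop class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT a Sqoop class: records which state keys were
// derived from which config input. One input may map to many state keys.
final class InputStateAssociations {
  private final Map<String, List<String>> stateKeysByInput = new LinkedHashMap<>();

  // Associate one more state key with the given input name.
  void associate(String inputName, String stateKey) {
    stateKeysByInput.computeIfAbsent(inputName, k -> new ArrayList<>()).add(stateKey);
  }

  // All state keys associated with the input, in insertion order;
  // an empty list if the input has no associated state.
  List<String> stateKeysFor(String inputName) {
    return Collections.unmodifiableList(
        stateKeysByInput.getOrDefault(inputName, Collections.emptyList()));
  }
}
```

For the example above, the connector would call 
associate("deltaFetchQuery", "prev_value"), and could add further state keys 
for the same input later.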

So with this, we have not tampered with user inputs; we let them edit the 
inputs if they want to.

Lastly, the most important question: how do connectors decide whether to use 
the input or the state? I would say it is obvious that connectors have to give 
priority to the latest input value (if the override/update flag was yes) over 
the previous run's value. But a connector may choose not to do so; it is the 
custom logic of the connector at that point. In the submission history, 
however, the connector should be able to tell the user what value was used for 
the query (another piece of information it can write back so the user can see 
what was done). It is that flexible.
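
As an illustration of the default precedence described above, here is a minimal 
sketch. DeltaValueResolver, Source and Resolved are hypothetical names invented 
for this example and are not part of the Sqoop connector API; a real connector 
is free to apply different logic.

```java
// Hypothetical helper, NOT a Sqoop class: picks the value to use for the
// delta fetch and remembers where it came from, so that the choice can be
// reported back to the user in the submission history.
final class DeltaValueResolver {

  // Which source supplied the value that was used.
  enum Source { USER_INPUT, PREVIOUS_STATE, NONE }

  // The chosen value together with its source.
  static final class Resolved {
    final String value;
    final Source source;

    Resolved(String value, Source source) {
      this.value = value;
      this.source = source;
    }
  }

  // Assumed precedence: an explicitly overridden input wins, otherwise the
  // state from the previous run, otherwise the original input (first run).
  static Resolved resolve(String inputValue, boolean overrideFlag, String stateValue) {
    if (overrideFlag && inputValue != null) {
      return new Resolved(inputValue, Source.USER_INPUT);
    }
    if (stateValue != null) {
      return new Resolved(stateValue, Source.PREVIOUS_STATE);
    }
    if (inputValue != null) {
      return new Resolved(inputValue, Source.USER_INPUT);
    }
    return new Resolved(null, Source.NONE);
  }
}
```

The source field is what the connector could write back to the submission 
history so the user can see which value was actually used.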




https://issues.apache.org/jira/browse/SQOOP-1804 has more details if we want 
to have the state variable explicitly exposed by the connector in the config 
class as well.

> Sqoop2: Delta Fetch/ Merge ( formerly called Incremental Import )
> -----------------------------------------------------------------
>
>                 Key: SQOOP-1168
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1168
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.6
>
>
> The formal design wiki is here 
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+and+Merge+Design



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
