[ 
https://issues.apache.org/jira/browse/SQOOP-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085527#comment-14085527
 ] 

Gwen Shapira commented on SQOOP-1168:
-------------------------------------

Here's the approach I'm thinking of taking:


Incremental will be supported at the Connector level.
I.e., connectors can decide whether or not to support incremental. For HDFS, for 
example, incremental doesn't make sense. For MongoDB, incremental, if supported, 
will look very different than for JDBC.

Incremental will be supported for the Extract part of the job only.

To support incremental queries in the JDBC connector we need a few new values in 
the ImportTableForm (part of the ImportJobConfiguration):
* isIncremental (Y/N): not sure it's actually needed; it may be enough to check 
whether incrementalColumn is set.
* incrementalColumn: hope to support expressions / functions as well as actual 
columns.
* lastValue: on the first run it can be given by the user, or we can have a 
default (get everything? get nothing?). On later runs it should be captured 
from the output of the previous run.

There's obviously a number of verifications and display conditions we can 
implement here.
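
As a rough illustration of the form fields and one possible verification, here is 
a minimal Python sketch. The class is a hypothetical stand-in, not the actual 
Sqoop2 ImportTableForm; the snake_case field names, the table_name field, and the 
validate() helper are assumptions made for the example. It treats isIncremental 
as derived from incrementalColumn rather than stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImportTableForm:
    """Hypothetical stand-in for the JDBC connector's import form."""
    table_name: str
    incremental_column: Optional[str] = None  # a column name or an expression
    last_value: Optional[str] = None          # None means "no previous run"

    @property
    def is_incremental(self) -> bool:
        # Derived, not stored: a non-blank incrementalColumn implies incremental.
        return bool(self.incremental_column and self.incremental_column.strip())

    def validate(self) -> list:
        """Return a list of human-readable errors (empty when the form is OK)."""
        errors = []
        if self.last_value is not None and not self.is_incremental:
            errors.append("lastValue is set but incrementalColumn is missing")
        return errors
```

With this shape a separate Y/N flag is redundant, at the cost of making "turn 
incremental off but remember the column" impossible to express.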

The change should include:

On connector side:
- If job is incremental (or perhaps if incrementalColumn is not null):
     - the maximum value of incrementalColumn should be captured from the DB 
before the execution starts (select max(incrementalColumn) from table where 
1=1) and stored in the repository for reuse in the next execution.
     - the extract query should have an "incrementalColumn > lastValue and 
incrementalColumn <= maxValue" condition, where maxValue is the maximum 
captured above. The upper bound has to be inclusive; otherwise rows equal to 
the maximum are skipped by this run and by the next one.
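
The connector-side steps can be sketched as follows. This is an illustration in 
Python against an in-memory SQLite database, not Sqoop code; the function names, 
the events table, and the "get everything" default for the first run are all 
assumptions. The sketch uses an inclusive upper bound so that a row whose value 
equals the captured maximum is not lost between runs.

```python
import sqlite3


def capture_max(conn, table, column):
    # Identifiers are interpolated directly for brevity; real code would
    # have to validate or quote them.
    row = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    return row[0]


def extract_increment(conn, table, column, last_value):
    """Run one incremental extract and return (rows, new_last_value)."""
    upper = capture_max(conn, table, column)   # captured before extracting
    if upper is None:                          # empty table: nothing to do
        return [], last_value
    if last_value is None:                     # first run, assumed default: everything
        sql = f"SELECT * FROM {table} WHERE {column} <= ? ORDER BY {column}"
        params = (upper,)
    else:
        sql = (f"SELECT * FROM {table} "
               f"WHERE {column} > ? AND {column} <= ? ORDER BY {column}")
        params = (last_value, upper)
    return conn.execute(sql, params).fetchall(), upper


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

rows, last_value = extract_increment(conn, "events", "id", None)    # first run
conn.execute("INSERT INTO events VALUES (4, 'd')")                   # new data arrives
rows2, last_value2 = extract_increment(conn, "events", "id", last_value)
```

Capturing the maximum first means rows inserted while the extract is running fall 
above the upper bound and are picked up by the next run instead of being split 
across two runs.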

lastValue can be stored in the job (as part of the form, or in the submission, 
if we give a job a way to get the last submission).
We don't want it to be a fixed field, since who knows what else connectors will 
need.

I think the best option is for connectors to modify the job-connector form 
and update the lastValue field, which it looks like they can do.
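
To make the idea concrete, here is a small Python sketch in which the connector 
owns a free-form form on the job and writes lastValue back after each run. The 
Job class, the dict-based form, and run_incremental are hypothetical and do not 
reflect the real Sqoop2 model classes; the source is a plain list of rows so the 
example stays self-contained.

```python
class Job:
    """Hypothetical job whose connector form is a free-form dict."""

    def __init__(self, connector_form):
        self.connector_form = dict(connector_form)


def run_incremental(job, source_rows):
    """Extract rows newer than the form's lastValue, then store the new bound."""
    form = job.connector_form
    column = form["incrementalColumn"]
    last = form.get("lastValue")               # absent on the first run
    values = [row[column] for row in source_rows]
    if not values:
        return []
    upper = max(values)
    extracted = [row for row in source_rows
                 if (last is None or row[column] > last) and row[column] <= upper]
    form["lastValue"] = upper                  # the connector updates its own field
    return extracted


job = Job({"incrementalColumn": "id"})
source = [{"id": 1}, {"id": 2}]
first = run_incremental(job, source)
source.append({"id": 3})
second = run_incremental(job, source)
```

Because the form is just connector-defined keys, another connector could keep a 
different kind of bookmark there without any change to the job model.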

The main downside of this approach is that each connector can have slightly 
different parameter names for a feature that does basically the same thing, 
which will be pretty confusing for the users. But we already have this issue 
for common terms like "table"...

> Sqoop2: Incremental Import
> --------------------------
>
>                 Key: SQOOP-1168
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1168
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>
> Initial plan is to follow roughly the same design as Sqoop 1, except provide 
> pluggability to start this through a REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
