[ 
https://issues.apache.org/jira/browse/SQOOP-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218035#comment-14218035
 ] 

Veena Basavaraj commented on SQOOP-1168:
----------------------------------------

[~vinothchandar] 

Some thoughts; I will have a design wiki up by the end of this week.

Both last_primary_key (referred to as append mode in Sqoop 1) and 
last_modified will be supported. The latter is more work when writing to an 
HDFS-like data source, since we have to scan all records that have been 
written before and then modify them. The former is as simple as it gets and 
more performant.

The latter is probably more useful than the former, since I am assuming most 
use cases will have mutable FROM data sources, and it is wise to update any 
modified record incrementally.
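
A rough sketch of why the two modes differ in cost on the write side. This is only an in-memory illustration: the class, the Row shape, and the method names are hypothetical and not part of Sqoop, and a real HDFS-like target would have to rewrite files rather than update a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IncrementalWriteSketch {

    /** Hypothetical row: a primary key, a modification timestamp, and a payload. */
    public static final class Row {
        public final long id;
        public final long lastModified;
        public final String value;

        public Row(long id, long lastModified, String value) {
            this.id = id;
            this.lastModified = lastModified;
            this.value = value;
        }
    }

    /** Append mode: the delta is added at the end; existing rows are never read. */
    public static List<Row> append(List<Row> existing, List<Row> delta) {
        List<Row> out = new ArrayList<>(existing);
        out.addAll(delta);
        return out;
    }

    /** Last-modified mode: existing rows are scanned and replaced when the delta has a newer version. */
    public static List<Row> merge(List<Row> existing, List<Row> delta) {
        Map<Long, Row> byId = new LinkedHashMap<>();
        for (Row r : existing) {
            byId.put(r.id, r);
        }
        for (Row r : delta) {
            Row old = byId.get(r.id);
            if (old == null || r.lastModified >= old.lastModified) {
                byId.put(r.id, r); // new row, or newer version of an existing row
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

The append path touches only the delta, while the merge path has to index everything written so far, which is the extra work described above.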

Second, as for how we provide incremental reading from the FROM source:
1. We can specify the incremental attributes and the type (since_primary_key, 
since_last_modified) in the {code}FromJobConfiguration{code}
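
As an illustration only, the attributes could be modelled along these lines. IncrementalConfig, checkColumn, and buildQuery are hypothetical names, not the real FromJobConfiguration API, and the marker is assumed to be a long (a key value or an epoch timestamp).

```java
public class IncrementalReadSketch {

    /** The two incremental types named above. */
    public enum IncrementalType { SINCE_PRIMARY_KEY, SINCE_LAST_MODIFIED }

    /** Hypothetical holder for the incremental attributes of a job. */
    public static final class IncrementalConfig {
        public final IncrementalType type;
        public final String checkColumn; // assumed: the key or last-modified column

        public IncrementalConfig(IncrementalType type, String checkColumn) {
            this.type = type;
            this.checkColumn = checkColumn;
        }
    }

    /**
     * Builds the read query for one run. A null marker means no earlier run,
     * so the whole table is read. In this sketch the type does not change the
     * predicate; it is assumed to matter only on the TO side.
     */
    public static String buildQuery(String table, IncrementalConfig cfg, Long lastValue) {
        String base = "SELECT * FROM " + table;
        if (lastValue == null) {
            return base;
        }
        return base + " WHERE " + cfg.checkColumn + " > " + lastValue;
    }
}
```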

As far as storing state:

We already store state across runs to some extent in the submissions table in 
the Sqoop Repository, so it should be fairly easy to extend it to store this 
"last" or "since" marker. We could also support more complex markers in the 
future, so that a marker can even be a query that scans for only certain 
records in that run.
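
A minimal sketch of carrying the marker from one run to the next. An in-memory map stands in for the repository here; MarkerStoreSketch and its methods are hypothetical and do not reflect the actual submissions table schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

public class MarkerStoreSketch {

    // Stand-in for the repository: job id -> highest check-column value seen so far.
    private final Map<Long, Long> markerByJob = new HashMap<>();

    /** Returns the marker left by earlier runs, or empty if the job has never run. */
    public OptionalLong lastMarker(long jobId) {
        Long marker = markerByJob.get(jobId);
        return marker == null ? OptionalLong.empty() : OptionalLong.of(marker);
    }

    /** Records a finished run by advancing the marker to the largest value it read. */
    public void recordRun(long jobId, long[] checkColumnValues) {
        for (long value : checkColumnValues) {
            markerByJob.merge(jobId, value, Long::max);
        }
    }
}
```

The next run would read lastMarker for its job and import only records above it; the marker never moves backwards, even if a run reads only older values.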

I do think the FromStateObject/ToState is pretty neat to have in the repo as 
well, so that we have more visibility into what went on in each run. Submission 
today represents the end result of the Sqoop job and is geared more towards the 
execution engine stats. But we churn out more details of the From/To state 
objects.

> Sqoop2: Incremental Import
> --------------------------
>
>                 Key: SQOOP-1168
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1168
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> Initial plan is to follow roughly the same design as Sqoop 1, except provide 
> pluggability to start this through a REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
