[
https://issues.apache.org/jira/browse/SQOOP-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218035#comment-14218035
]
Veena Basavaraj commented on SQOOP-1168:
----------------------------------------
[~vinothchandar]
Some thoughts; I will have a design wiki up by the end of this week.
last_primary_key (which is referred to as append mode in Sqoop1) and
last_modified will both be supported. The latter is more work when writing to
an "HDFS"-like data source, since we have to scan all records that have been
written before and then modify them. The former is as simple as it gets and
more performant.
The latter is probably more useful than the former, since I am assuming most
use cases will have mutable FROM data sources, and it is wise to update any
modified record incrementally.
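
To make the difference between the two modes concrete, here is a rough sketch
(not Sqoop code; the Row class, the method names and the in-memory collections
are all made up for illustration). Append mode only adds rows past the last
primary key, while last_modified mode has to find and replace rows that were
already written. The map lookup below stands in for what would be a scan and
rewrite on an HDFS-like target.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Sqoop.
class IncrementalModesSketch {
    static final class Row {
        final long id;
        final String value;
        final long lastModified;

        Row(long id, String value, long lastModified) {
            this.id = id;
            this.value = value;
            this.lastModified = lastModified;
        }
    }

    // Append mode: rows with id > lastPrimaryKey are new, so just add them.
    static int appendSince(List<Row> target, List<Row> source, long lastPrimaryKey) {
        int added = 0;
        for (Row r : source) {
            if (r.id > lastPrimaryKey) {
                target.add(r);
                added++;
            }
        }
        return added;
    }

    // last_modified mode: a changed row may already exist in the target,
    // so it is replaced by key (an assumption: the target is keyed by id).
    static int mergeSince(Map<Long, Row> target, List<Row> source, long lastModified) {
        int changed = 0;
        for (Row r : source) {
            if (r.lastModified > lastModified) {
                target.put(r.id, r);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Row> source = new ArrayList<>();
        source.add(new Row(1, "a-updated", 200));
        source.add(new Row(2, "b", 100));
        source.add(new Row(3, "c", 210));

        List<Row> appended = new ArrayList<>();
        System.out.println("appended: " + appendSince(appended, source, 2));

        Map<Long, Row> merged = new TreeMap<>();
        merged.put(1L, new Row(1, "a", 100));
        merged.put(2L, new Row(2, "b", 100));
        System.out.println("merged: " + mergeSince(merged, source, 100));
    }
}
```

Note that append mode never sees the update to row 1, which is the reason the
last_modified mode matters for mutable sources.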
Second, on how we provide incremental reading from the FROM source:
1. We can specify the incremental attributes and the type (since_primary_key,
since_last_modified) in the {code}FromJobConfiguration{code}.
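
As a rough idea of what those attributes might look like, here is a
self-contained sketch. The class, enum and field names are assumptions made up
for this example, and it deliberately does not use the real Sqoop2
configuration annotations or the actual FromJobConfiguration class.

```java
import java.sql.Timestamp;

// Hypothetical shape for the incremental attributes; not the real Sqoop2 config.
class IncrementalConfigSketch {
    enum IncrementalType { SINCE_PRIMARY_KEY, SINCE_LAST_MODIFIED }

    static final class IncrementalConfig {
        final IncrementalType type;
        final String checkColumn;
        final String lastValue; // null means first run, so read everything

        IncrementalConfig(IncrementalType type, String checkColumn, String lastValue) {
            this.type = type;
            this.checkColumn = checkColumn;
            this.lastValue = lastValue;
        }
    }

    // Predicate to add to the FROM query, with a placeholder for the marker.
    static String whereClause(IncrementalConfig c) {
        return c.lastValue == null ? "" : c.checkColumn + " > ?";
    }

    // Value to bind to the placeholder, typed according to the incremental type.
    static Object bindValue(IncrementalConfig c) {
        if (c.lastValue == null) {
            return null;
        }
        switch (c.type) {
            case SINCE_PRIMARY_KEY:
                return Long.valueOf(c.lastValue);
            case SINCE_LAST_MODIFIED:
                // Expects the JDBC escape format yyyy-mm-dd hh:mm:ss
                return Timestamp.valueOf(c.lastValue);
            default:
                throw new IllegalStateException("unknown type " + c.type);
        }
    }

    public static void main(String[] args) {
        IncrementalConfig byKey =
            new IncrementalConfig(IncrementalType.SINCE_PRIMARY_KEY, "id", "42");
        System.out.println(whereClause(byKey) + " bound to " + bindValue(byKey));
    }
}
```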
As far as storing state:
We already store state across runs to some extent in the submissions table in
the Sqoop Repository, so it should be fairly easy to extend that to store this
"last" or "since" marker. We could also support more complex markers in the
future, for example a query that scans for only certain records in that run.
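
A minimal sketch of the marker idea, assuming a simple numeric marker per job.
The in-memory map here is only a stand-in for the repository, and the class
and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for repository-backed marker storage; not Sqoop code.
class MarkerStoreSketch {
    // job id -> last marker seen by a completed run
    private final Map<Long, Long> markers = new HashMap<>();

    long lastMarker(long jobId) {
        return markers.getOrDefault(jobId, Long.MIN_VALUE);
    }

    // Reads only keys past the stored marker, then saves the new high-water mark.
    int run(long jobId, long[] sourceKeys) {
        long since = lastMarker(jobId);
        long max = since;
        int read = 0;
        for (long key : sourceKeys) {
            if (key > since) {
                read++;
                max = Math.max(max, key);
            }
        }
        // Stored only after the loop finishes, so a failed run keeps the old marker.
        markers.put(jobId, max);
        return read;
    }

    public static void main(String[] args) {
        MarkerStoreSketch store = new MarkerStoreSketch();
        System.out.println("first run read: " + store.run(1L, new long[] {1, 2, 3}));
        System.out.println("second run read: " + store.run(1L, new long[] {1, 2, 3, 4, 5}));
    }
}
```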
I do think the FromStateObject/ToState is pretty neat to have in the repo as
well, so that we have more visibility into what went on in each run. Submission
today represents the end result of the Sqoop job and is geared more towards the
execution engine stats. But we churn out more details of the From/To state
objects.
> Sqoop2: Incremental Import
> --------------------------
>
> Key: SQOOP-1168
> URL: https://issues.apache.org/jira/browse/SQOOP-1168
> Project: Sqoop
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Initial plan is to follow roughly the same design as Sqoop 1, except provide
> pluggability to start this through a REST API.