[
https://issues.apache.org/jira/browse/SQOOP-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257632#comment-14257632
]
Veena Basavaraj commented on SQOOP-1804:
----------------------------------------
Also, while you may see it as #1, #2 and #3, I see it as one thing: the ability to
support the output from the job execution. There are two parts to the current
sqoop job, reading and writing, and they are called by various names (Extractor/
Loader, From/To, etc.), but overall it is pretty much these two things. A
sqoop job is considered a success when both of these succeed, but the two parts
operate independently.
Second, doing delta fetch and merge means there is some STATE information we
need to store across job runs. If "output" is misleading to you, STATE is
also something I originally considered, as [~vinothchandar] mentioned in
his first comment.
Third, how we expose this state/output information is separate from how we
store it. If I want to use the FROM state in the TO part in the future, it is a
matter of querying for that info and adding it to the Loader context, so there is
nothing limiting on that front. Right now I did not see a requirement for shoving
everything everywhere, hence the direction field in the TABLE_SQ_JOB_OUTPUT
(TABLE_SQ_JOB_STATE) table, so that we send only what we need.
Hope this answers your question; I have tried to be elaborate.
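To make that concrete, here is a minimal sketch of querying the stored FROM output
and handing it to the Loader context; the findJobOutput() repository call used here
is an assumption for illustration, not an existing API:
{code}
// Illustrative sketch only: findJobOutput() is a hypothetical repository call;
// JobOutput, Direction and MutableContext are the types discussed on this ticket.
void propagateFromStateToLoader(Repository repository, long submissionId,
    MutableContext loaderContext) {
  // Read the FROM-side output stored for the previous submission...
  for (JobOutput output : repository.findJobOutput(submissionId, Direction.FROM)) {
    // ...and expose each key/value to the TO side through the Loader context.
    loaderContext.setString(output.key, String.valueOf(output.value));
  }
}
{code}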
> Repository Structure + API: Storing/Retrieving the From/To state of the
> incremental read/write
> ------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1804
> URL: https://issues.apache.org/jira/browse/SQOOP-1804
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Details of this proposal are in the wiki.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Wheretostoretheoutputinsqoop?
> Update: The above highlights the pros and cons of each approach.
> #4 is chosen, since it is less intrusive, cleaner, and easily allows
> updating/editing each value in the output.
> Will use this ticket for more detailed discussion on storage options for the
> output from connectors
> 1.
> {code}
> // Will have an FK to the submission table
> public static final String QUERY_CREATE_TABLE_SQ_JOB_OUTPUT_SUBMISSION =
>     "CREATE TABLE " + TABLE_SQ_JOB_OUTPUT + " ("
>     + COLUMN_SQ_JOB_OUT_ID + " BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1), "
>     + COLUMN_SQ_JOB_OUT_KEY + " VARCHAR(32), "
>     + COLUMN_SQ_JOB_OUT_VALUE + " LONG VARCHAR, "
>     + COLUMN_SQ_JOB_OUT_TYPE + " VARCHAR(32), "
>     // FK to the direction table; distinguishes output from the FROM/TO part of the job
>     + COLUMN_SQD_ID + " VARCHAR(32), "
>     + COLUMN_SQRS_SUBMISSION + " BIGINT, "
>     + "CONSTRAINT " + CONSTRAINT_SQRS_SQS + " "
>     + "FOREIGN KEY (" + COLUMN_SQRS_SUBMISSION + ") "
>     + "REFERENCES " + TABLE_SQ_SUBMISSION + "(" + COLUMN_SQS_ID + ") ON DELETE CASCADE"
>     + ")";
> {code}
> 2.
> At the code level, we will define MOutputType; one of the types can be BLOB
> as well, if a connector decides to store the value as a BLOB.
> {code}
> class JobOutput {
>   String key;        // name of the output/state entry
>   Object value;      // the stored value
>   MOutputType type;  // how the value is represented (e.g. as a string or a BLOB)
> }
> {code}
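> As a sketch (the exact set of constants is an assumption, not final), MOutputType
> could simply enumerate the supported value representations:
> {code}
> // Hypothetical sketch of the enum; constant names are illustrative only.
> public enum MOutputType {
>   STRING,  // value kept in the LONG VARCHAR column
>   BLOB     // value kept as binary data, if a connector chooses to
> }
> {code}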
> 3.
> At the repository API level, add a new API to get the job output for a particular
> submission id and allow updates on the values.
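> A rough sketch of what such an API could look like (method names and signatures
> here are illustrative, not final):
> {code}
> // Hypothetical repository API sketch; not the final signatures.
> public interface JobOutputRepository {
>   // Fetch the output/state rows written by a given submission for a given direction.
>   List<JobOutput> findJobOutput(long submissionId, Direction direction);
>
>   // Update the value of an existing output entry for that submission/direction.
>   void updateJobOutput(long submissionId, Direction direction, JobOutput output);
> }
> {code}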
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)