[
https://issues.apache.org/jira/browse/SQOOP-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258388#comment-14258388
]
Gwen Shapira commented on SQOOP-1804:
-------------------------------------
To clarify what I meant when I said "declare keys in advance":
If we decide to support arbitrary state stored by connectors, I'd like
connectors to create their state keys at registration time, before any
submissions happen. This will allow us to enforce consistency of the state
between job executions.
I don't see this requirement as limiting the connectors, since the person
writing the connector will know in advance what information they need to store
in the state (after all, they will need to collect it and later use it). It's
just a matter of registering the keys the way configs are registered.
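A minimal sketch of what "declare keys in advance" could look like, assuming a hypothetical StateKeyRegistry that a connector populates when it is registered; none of these names exist in Sqoop. Any state write that uses an undeclared key is rejected, so state written by different executions of a job can only use keys from the same declared set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry: a connector declares its state keys once, at
// registration time, before any job submission happens.
class StateKeyRegistry {
  private final Map<String, Set<String>> keysByConnector = new HashMap<>();

  // Called when the connector is registered; re-registering replaces the key set.
  void register(String connectorName, Set<String> stateKeys) {
    keysByConnector.put(connectorName,
        Collections.unmodifiableSet(new LinkedHashSet<>(stateKeys)));
  }

  // Keys declared by the connector (empty if it declared none).
  Set<String> declaredKeys(String connectorName) {
    return keysByConnector.getOrDefault(connectorName, Collections.<String>emptySet());
  }

  // Rejects any state write that uses a key the connector did not declare.
  void checkWrite(String connectorName, Map<String, String> state) {
    Set<String> declared = declaredKeys(connectorName);
    for (String key : state.keySet()) {
      if (!declared.contains(key)) {
        throw new IllegalArgumentException(
            "Undeclared state key '" + key + "' for connector " + connectorName);
      }
    }
  }
}
```

The check runs on every write, so a connector that tries to introduce a new key halfway through a job's lifetime fails fast instead of silently producing state that earlier executions never had.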
> Repository Structure + API: Storing/Retrieving the From/To state of the incremental read/write
> -----------------------------------------------------------------------------------------------
>
> Key: SQOOP-1804
> URL: https://issues.apache.org/jira/browse/SQOOP-1804
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Details of this proposal are in the wiki:
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Wheretostoretheoutputinsqoop?
> Update: The wiki page above lists the pros and cons of each approach.
> Option #4 was chosen because it is less intrusive and cleaner, and it makes it
> easy to update/edit each value in the output.
> This ticket will be used for a more detailed discussion of the storage options
> for the output from connectors.
> 1.
> {code}
> // Output table; each row has an FK to the submission that produced it
> public static final String QUERY_CREATE_TABLE_SQ_JOB_OUTPUT_SUBMISSION =
> "CREATE TABLE " + TABLE_SQ_JOB_OUTPUT + " ("
> + COLUMN_SQ_JOB_OUT_ID + " BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1), "
> + COLUMN_SQ_JOB_OUT_KEY + " VARCHAR(32), "
> + COLUMN_SQ_JOB_OUT_VALUE + " LONG VARCHAR, "
> + COLUMN_SQ_JOB_OUT_TYPE + " VARCHAR(32), "
> // FK to the direction table; distinguishes output from the FROM and TO parts of the job
> + COLUMN_SQD_ID + " VARCHAR(32), "
> + COLUMN_SQRS_SUBMISSION + " BIGINT, "
> + "CONSTRAINT " + CONSTRAINT_SQRS_SQS + " "
> + "FOREIGN KEY (" + COLUMN_SQRS_SUBMISSION + ") "
> + "REFERENCES " + TABLE_SQ_SUBMISSION + "(" + COLUMN_SQS_ID + ") ON DELETE CASCADE"
> + ")";
> {code}
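Because each row carries both the submission FK and the direction column, the output written by one side (FROM or TO) of one submission can be read back with a single parameterized query. A sketch, assuming hypothetical string values for the schema constants named in the proposal (the real values would come from the repository's schema class); the direction is bound as a string because the proposed column is VARCHAR(32).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class JobOutputQueries {
  // Hypothetical values for the constants used in the proposal.
  static final String TABLE_SQ_JOB_OUTPUT = "SQ_JOB_OUTPUT";
  static final String COLUMN_SQ_JOB_OUT_KEY = "SQ_JOB_OUT_KEY";
  static final String COLUMN_SQ_JOB_OUT_VALUE = "SQ_JOB_OUT_VALUE";
  static final String COLUMN_SQ_JOB_OUT_TYPE = "SQ_JOB_OUT_TYPE";
  static final String COLUMN_SQD_ID = "SQ_JOB_OUT_SQD";
  static final String COLUMN_SQRS_SUBMISSION = "SQ_JOB_OUT_SUBMISSION";

  // Output written by one direction of one submission.
  static final String QUERY_SELECT_JOB_OUTPUT_FOR_SUBMISSION =
      "SELECT " + COLUMN_SQ_JOB_OUT_KEY + ", " + COLUMN_SQ_JOB_OUT_VALUE
      + ", " + COLUMN_SQ_JOB_OUT_TYPE
      + " FROM " + TABLE_SQ_JOB_OUTPUT
      + " WHERE " + COLUMN_SQRS_SUBMISSION + " = ? AND " + COLUMN_SQD_ID + " = ?";

  // Returns {key, value, type} triples for the given submission and direction.
  static List<String[]> fetch(Connection conn, long submissionId, String directionId)
      throws SQLException {
    List<String[]> rows = new ArrayList<>();
    try (PreparedStatement stmt = conn.prepareStatement(QUERY_SELECT_JOB_OUTPUT_FOR_SUBMISSION)) {
      stmt.setLong(1, submissionId);
      stmt.setString(2, directionId);
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          rows.add(new String[] {rs.getString(1), rs.getString(2), rs.getString(3)});
        }
      }
    }
    return rows;
  }
}
```

With ON DELETE CASCADE on the submission FK, deleting a submission also removes its output rows, so no separate cleanup query is needed.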
> 2.
> At the code level, we will define MOutputType; one of the types can be BLOB,
> in case a connector decides to store the value as a BLOB.
> {code}
> class JobOutput {
> String key;         // state key written by the connector
> Object value;       // the stored value
> MOutputType type;   // type of the value, e.g. BLOB
> }
> {code}
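One way a typed value could be mapped onto the LONG VARCHAR value column is sketched below. The constants STRING and LONG, and the choice to Base64-encode BLOB values into the text column, are assumptions for illustration; the proposal only says that BLOB can be one of the types.

```java
import java.util.Base64;

// Hypothetical set of output types; only BLOB is named in the proposal.
enum MOutputType { STRING, LONG, BLOB }

class JobOutput {
  final String key;
  final Object value;
  final MOutputType type;

  JobOutput(String key, Object value, MOutputType type) {
    this.key = key;
    this.value = value;
    this.type = type;
  }

  // Serializes the value for the text value column; BLOB bytes are
  // Base64-encoded (an assumption, not part of the proposal).
  String encodeValue() {
    switch (type) {
      case STRING: return (String) value;
      case LONG: return Long.toString((Long) value);
      case BLOB: return Base64.getEncoder().encodeToString((byte[]) value);
      default: throw new IllegalStateException("Unknown type " + type);
    }
  }

  // Rebuilds a JobOutput from a stored key, encoded value and type name.
  static JobOutput decode(String key, String encoded, String typeName) {
    MOutputType type = MOutputType.valueOf(typeName);
    switch (type) {
      case STRING: return new JobOutput(key, encoded, type);
      case LONG: return new JobOutput(key, Long.parseLong(encoded), type);
      case BLOB: return new JobOutput(key, Base64.getDecoder().decode(encoded), type);
      default: throw new IllegalStateException("Unknown type " + type);
    }
  }
}
```

Storing the type name alongside the value lets the repository rebuild the original Java type on read without the connector having to parse strings itself.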
> 3.
> At the repository level, add a new API to get the job output for a particular
> submission ID and to allow updates to its values.
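A minimal in-memory sketch of such an API. The names JobOutputRepository, getJobOutput, updateJobOutputValue and putJobOutput are hypothetical and not part of Sqoop's Repository class; a real implementation would issue the corresponding SELECT and UPDATE statements against the output table.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical repository API for connector output, keyed by submission id.
interface JobOutputRepository {
  // Returns all key/value pairs stored for the submission (empty if none).
  Map<String, String> getJobOutput(long submissionId);

  // Updates the value of an existing key; returns false if the key is absent.
  boolean updateJobOutputValue(long submissionId, String key, String newValue);
}

// In-memory stand-in used only to illustrate the contract.
class InMemoryJobOutputRepository implements JobOutputRepository {
  private final Map<Long, Map<String, String>> store = new HashMap<>();

  // Stores connector output for a submission (e.g. when the submission finishes).
  void putJobOutput(long submissionId, String key, String value) {
    store.computeIfAbsent(submissionId, id -> new LinkedHashMap<>()).put(key, value);
  }

  @Override
  public Map<String, String> getJobOutput(long submissionId) {
    Map<String, String> output = store.get(submissionId);
    return output == null
        ? Collections.<String, String>emptyMap()
        : Collections.unmodifiableMap(output);
  }

  @Override
  public boolean updateJobOutputValue(long submissionId, String key, String newValue) {
    Map<String, String> output = store.get(submissionId);
    if (output == null || !output.containsKey(key)) {
      return false;
    }
    output.put(key, newValue);
    return true;
  }
}
```

Restricting updates to keys that already exist keeps the update path from introducing new keys, which fits the idea of connectors declaring their state keys up front.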
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)