[
https://issues.apache.org/jira/browse/SQOOP-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257595#comment-14257595
]
Gwen Shapira commented on SQOOP-1804:
-------------------------------------
OK, I get the connector-data-in-context design.
I agree that the exact look of submission is not important at this stage.
The case where I see TO needing to know about data from FROM is validation.
For example, if FROM says "last_value is N" but TO didn't actually manage to
write up to N, I can see us wanting to let TO update "last_value" and
essentially roll back the job execution.
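The validation case above can be illustrated with a minimal Java sketch. It assumes a hypothetical `SharedJobOutput` object that both FROM and TO can see. FROM records the `last_value` it read up to. TO lowers that value if it wrote less, so the next incremental run re-reads the gap. The class and method names are illustrative only and are not Sqoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical output shared by FROM and TO under a well-known key (illustrative, not a Sqoop API). */
class SharedJobOutput {
    static final String LAST_VALUE = "last_value"; // key both sides agree on
    private final Map<String, Long> values = new HashMap<>();

    /** FROM records the highest incremental value it read. */
    void recordRead(long lastRead) {
        values.put(LAST_VALUE, lastRead);
    }

    /**
     * TO reports the highest value it actually wrote. If TO fell short of what
     * FROM read, last_value is rolled back to what was really written.
     */
    void reconcileWrite(long lastWritten) {
        Long lastRead = values.get(LAST_VALUE);
        if (lastRead != null && lastWritten < lastRead) {
            values.put(LAST_VALUE, lastWritten);
        }
    }

    /** Returns the current last_value, or null if FROM never recorded one. */
    Long lastValue() {
        return values.get(LAST_VALUE);
    }
}
```

This only works if TO is allowed to read and change a value that FROM wrote. That requirement is exactly what separates the designs listed next.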
More generally, if there are a few keys that FROM, TO and our REST API all
know exist and can look up specifically (last_value being the current
example), we can expose a bit more functionality.
I'd say there are 3 different designs possible here:
1. Support only things we absolutely know we need (just last_value)
2. Be more generic and allow connectors to store whatever they want (at the
cost of not being able to use these values between connectors)
3. Be more generic in a different direction and support future use of these
values in multiple connectors and also by user-facing apps
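Option #2 can be sketched in Java as a store namespaced by the connector that wrote each value. This is a minimal sketch under that assumption. The class name and the use of plain connector-name strings as namespaces are hypothetical. Under option #3, the lookup would not be scoped by connector, so other connectors and user-facing apps could read the same keys.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical store for design option #2: outputs are namespaced by the
 * connector that wrote them, and a connector can only read its own entries.
 */
class ConnectorScopedOutputStore {
    private final Map<String, Map<String, String>> byConnector = new HashMap<>();

    /** Records a value in the writing connector's private namespace. */
    void put(String connector, String key, String value) {
        byConnector.computeIfAbsent(connector, c -> new HashMap<>()).put(key, value);
    }

    /** Looks up a key in the calling connector's own namespace only. */
    Optional<String> get(String connector, String key) {
        Map<String, String> own = byConnector.getOrDefault(connector, Collections.emptyMap());
        return Optional.ofNullable(own.get(key));
    }
}
```

In this sketch, a value that one connector stored under `last_value` is invisible to the other connector. That is the "cannot use these values between connectors" cost noted for option #2.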
It sounds like you chose #2 (did I get that right?), and I'm interested in
more details on why you prefer this design.
Also, the design doc is not very explicit about this point ("connectors can
only access and use outputs that they put there themselves"), which is why I
got confused (my first comments assumed option #3). Perhaps you want to make
the choice and its trade-offs more explicit in the wiki itself.
> Repository Structure + API: Storing/Retrieving the From/To state of the
> incremental read/write
> ------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1804
> URL: https://issues.apache.org/jira/browse/SQOOP-1804
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Details of this proposal are in the wiki.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Wheretostoretheoutputinsqoop?
> Update: The wiki section above lists the pros and cons of each approach.
> Option #4 was chosen, since it is less intrusive and cleaner, and it makes it
> easy to update/edit each value in the output individually.
> This ticket will be used for more detailed discussion of the storage options
> for connector output.
> 1.
> {code}
> // Will have an FK to the submission table
> public static final String QUERY_CREATE_TABLE_SQ_JOB_OUTPUT_SUBMISSION =
>     "CREATE TABLE " + TABLE_SQ_JOB_OUTPUT + " ("
>     + COLUMN_SQ_JOB_OUT_ID + " BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1), "
>     + COLUMN_SQ_JOB_OUT_KEY + " VARCHAR(32), "
>     + COLUMN_SQ_JOB_OUT_VALUE + " LONG VARCHAR, "
>     + COLUMN_SQ_JOB_OUT_TYPE + " VARCHAR(32), "
>     // FK to the direction table, so output from the FROM and TO parts of the job can be told apart
>     + COLUMN_SQD_ID + " VARCHAR(32), "
>     + COLUMN_SQRS_SUBMISSION + " BIGINT, "
>     + "CONSTRAINT " + CONSTRAINT_SQRS_SQS + " "
>     + "FOREIGN KEY (" + COLUMN_SQRS_SUBMISSION + ") "
>     + "REFERENCES " + TABLE_SQ_SUBMISSION + "(" + COLUMN_SQS_ID + ") ON DELETE CASCADE"
>     + ")";
> {code}
> 2.
> At the code level, we will define MOutputType; one of its types can be BLOB,
> for connectors that decide to store a value as a BLOB.
> {code}
> class JobOutput {
>   String key;        // output name, e.g. last_value
>   Object value;      // the stored value
>   MOutputType type;  // how the value is stored, e.g. BLOB
> }
> {code}
> 3.
> In the repository API, add a new method to get the job output for a particular
> submission ID and to allow updates to its values.
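Items 2 and 3 of the proposal can be sketched together in Java. This is a minimal in-memory illustration, not the Sqoop repository. Several parts are assumptions:

- The MOutputType constants other than BLOB.
- The Java class each type maps to.
- The method names `addJobOutput`, `getJobOutput`, `updateJobOutputValue` and `deleteSubmission`.
- The in-memory storage standing in for the SQ_JOB_OUTPUT table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical output types; only BLOB is named in the proposal, the others are illustrative. */
enum MOutputType {
    STRING(String.class),
    LONG(Long.class),
    BLOB(byte[].class);

    private final Class<?> javaType;

    MOutputType(Class<?> javaType) {
        this.javaType = javaType;
    }

    /** True if the value can be stored under this type (null is rejected). */
    boolean accepts(Object value) {
        return javaType.isInstance(value);
    }
}

/** One output row: key, value and declared type, checked for consistency. */
final class JobOutput {
    final String key;
    final Object value;
    final MOutputType type;

    JobOutput(String key, Object value, MOutputType type) {
        this.key = Objects.requireNonNull(key, "key");
        this.type = Objects.requireNonNull(type, "type");
        if (!type.accepts(value)) {
            throw new IllegalArgumentException("value for " + key + " is not a " + type);
        }
        this.value = value;
    }
}

/** Hypothetical in-memory stand-in for the proposed repository calls. */
final class InMemoryJobOutputRepository {
    private final Map<Long, List<JobOutput>> bySubmission = new HashMap<>();

    /** Stores an output row for a submission, as a connector would after a run. */
    void addJobOutput(long submissionId, JobOutput output) {
        bySubmission.computeIfAbsent(submissionId, id -> new ArrayList<>()).add(output);
    }

    /** Returns all output rows of a submission, or an empty list if there are none. */
    List<JobOutput> getJobOutput(long submissionId) {
        List<JobOutput> rows = bySubmission.get(submissionId);
        if (rows == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(rows);
    }

    /** Replaces the value of one key; returns false if the key is not present. */
    boolean updateJobOutputValue(long submissionId, String key, Object newValue) {
        List<JobOutput> rows = bySubmission.getOrDefault(submissionId, Collections.emptyList());
        for (int i = 0; i < rows.size(); i++) {
            JobOutput old = rows.get(i);
            if (old.key.equals(key)) {
                // Keep the declared type; the constructor rejects a mismatched value.
                rows.set(i, new JobOutput(key, newValue, old.type));
                return true;
            }
        }
        return false;
    }

    /** Mirrors ON DELETE CASCADE: deleting a submission drops its output rows. */
    void deleteSubmission(long submissionId) {
        bySubmission.remove(submissionId);
    }
}
```

Because each value is its own row, updating one key touches only that row. This matches the "update/edit per value" argument given for the chosen option.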