[
https://issues.apache.org/jira/browse/SQOOP-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257632#comment-14257632
]
Veena Basavaraj commented on SQOOP-1804:
----------------------------------------
Also, while you may see it as #1, #2 and #3, I see it as one thing: the ability to
support the output from the job execution. There are two parts to the current
sqoop job, reading and writing, and they are called by various names (Extractor/
Loader, From/To, etc.), but overall it is pretty much these two things. A
sqoop job is considered a success when both of these succeed, but the two parts
operate independently.
Second, doing delta fetch and merge means there is some STATE information we
need to store across job runs. If "output" is misleading to you, STATE is
also something I originally considered, as [~vinothchandar] mentioned in
his first comment.
Third, how we expose this state/output information is separate from how we
store it. If I want to use the FROM state in the TO part in the future, it is a
matter of querying for that info and adding it to the Loader context, so there is
nothing limiting on that front. Right now I did not see a requirement for shoving
everything everywhere, hence the direction field in the TABLE_SQ_JOB_OUTPUT
(TABLE_SQ_JOB_STATE) table, so that we send only what we need.
Hope this answers your question; I have tried to be elaborate.
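To make that concrete, here is a minimal sketch of querying the stored FROM output
and handing it to the Loader context; the findJobOutput() repository call used here
is an assumption for illustration, not an existing API:
{code}
// Illustrative sketch only: findJobOutput() is a hypothetical repository call;
// JobOutput, Direction and MutableContext are the types discussed on this ticket.
void propagateFromStateToLoader(Repository repository, long submissionId,
    MutableContext loaderContext) {
  // Read the FROM-side output stored for the previous submission...
  for (JobOutput output : repository.findJobOutput(submissionId, Direction.FROM)) {
    // ...and expose each key/value to the TO side through the Loader context.
    loaderContext.setString(output.key, String.valueOf(output.value));
  }
}
{code}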
> Repository Structure + API: Storing/Retrieving the From/To state of the
> incremental read/write
> ------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1804
> URL: https://issues.apache.org/jira/browse/SQOOP-1804
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Details of this proposal are in the wiki.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Wheretostoretheoutputinsqoop?
> Update: The above highlights the pros and cons of each approach.
> #4 is chosen, since it is less intrusive, cleaner, and easily allows
> updating/editing each value in the output.
> Will use this ticket for more detailed discussion on storage options for the
> output from connectors
> 1.
> {code}
> // Will have an FK to the submission table
> public static final String QUERY_CREATE_TABLE_SQ_JOB_OUTPUT_SUBMISSION =
>     "CREATE TABLE " + TABLE_SQ_JOB_OUTPUT + " ("
>     + COLUMN_SQ_JOB_OUT_ID + " BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1), "
>     + COLUMN_SQ_JOB_OUT_KEY + " VARCHAR(32), "
>     + COLUMN_SQ_JOB_OUT_VALUE + " LONG VARCHAR, "
>     + COLUMN_SQ_JOB_OUT_TYPE + " VARCHAR(32), "
>     // FK to the direction table; distinguishes output from the FROM/TO part of the job
>     + COLUMN_SQD_ID + " VARCHAR(32), "
>     + COLUMN_SQRS_SUBMISSION + " BIGINT, "
>     + "CONSTRAINT " + CONSTRAINT_SQRS_SQS + " "
>     + "FOREIGN KEY (" + COLUMN_SQRS_SUBMISSION + ") "
>     + "REFERENCES " + TABLE_SQ_SUBMISSION + "(" + COLUMN_SQS_ID + ") ON DELETE CASCADE"
>     + ")";
> {code}
> 2.
> At the code level, we will define MOutputType; one of the types can be BLOB
> as well, if a connector decides to store the value as a BLOB.
> {code}
> class JobOutput {
>   String key;        // name of the output/state entry
>   Object value;      // the stored value
>   MOutputType type;  // how the value is represented (e.g. as a string or a BLOB)
> }
> {code}
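> As a sketch (the exact set of constants is an assumption, not final), MOutputType
> could simply enumerate the supported value representations:
> {code}
> // Hypothetical sketch of the enum; constant names are illustrative only.
> public enum MOutputType {
>   STRING,  // value kept in the LONG VARCHAR column
>   BLOB     // value kept as binary data, if a connector chooses to
> }
> {code}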
> 3.
> At the repository API level, add a new API to get the job output for a particular
> submission id and allow updates on the values.
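> A rough sketch of what such an API could look like (method names and signatures
> here are illustrative, not final):
> {code}
> // Hypothetical repository API sketch; not the final signatures.
> public interface JobOutputRepository {
>   // Fetch the output/state rows written by a given submission for a given direction.
>   List<JobOutput> findJobOutput(long submissionId, Direction direction);
>
>   // Update the value of an existing output entry for that submission/direction.
>   void updateJobOutput(long submissionId, Direction direction, JobOutput output);
> }
> {code}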
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)