Veena Basavaraj created SQOOP-2025:
--------------------------------------

             Summary: Input/State history per job run / submission
                 Key: SQOOP-2025
                 URL: https://issues.apache.org/jira/browse/SQOOP-2025
             Project: Sqoop
          Issue Type: Sub-task
            Reporter: Veena Basavaraj


As per SQOOP-1804, we will store both the config inputs and the intermediate 
state generated during a job run in the config object.

Currently, the config object is stored in the repository in the 
{code}SQ_CONFIG{code} table, scoped per SQ_CONFIGURABLE.

The inputs within the Config class and their attributes are stored in the 
{code}SQ_INPUT{code} table.

i.e. the columns in SQ_INPUT map to the attributes of the config's @Input 
annotation:
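{code}
import org.apache.sqoop.model.ConfigClass;
import org.apache.sqoop.model.Input;

// Example config class: each @Input field is described by a row in SQ_INPUT
@ConfigClass
public class FromJobConfig {
  @Input(size = 50)
  public String schemaName;

  @Input(size = 50)
  public String tableName;
}
{code}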

The actual values of the SQ_INPUT keys for each Sqoop job and link are stored 
in SQ_JOB_INPUT and SQ_LINK_INPUT, respectively.

This means the config input values are overwritten for every job run. Let's 
take an example.

If a job is started with the config value for key "test" set to "foo", then for 
the first job run SQ_JOB_INPUT will hold the value "foo". If the value is 
modified to "bar" before the second run, SQ_JOB_INPUT will hold "bar" instead. 
A user who queries the config values by job ID will only see the last modified 
value; nothing tells the user which value was in effect when a given job run / 
submission started, or which value it had when that run ended.
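The overwrite behaviour described above can be sketched with a simplified in-memory model of SQ_JOB_INPUT. This is a hypothetical illustration, not the repository code: `CurrentJobInputStore` and its methods are made-up names, and the table is reduced to a map keyed by job ID and input name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of SQ_JOB_INPUT today:
// one value per (job, input) pair, with no notion of which run used it
class CurrentJobInputStore {
  private final Map<String, String> values = new HashMap<>();

  private static String key(long jobId, String inputName) {
    return jobId + ":" + inputName;
  }

  // Editing a job config replaces the stored value in place
  void update(long jobId, String inputName, String value) {
    values.put(key(jobId, inputName), value);
  }

  // A query by job ID can only return the latest value
  String lookup(long jobId, String inputName) {
    return values.get(key(jobId, inputName));
  }
}
```

Calling `update(1, "test", "foo")` before the first run and `update(1, "test", "bar")` before the second leaves only "bar" visible to `lookup`; nothing records which value the first run used.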

The proposal is to provide this history so that the user can track the config 
input values for each job run.

A simple proposal is to add a submission_id column to the SQ_JOB_INPUT and 
SQ_LINK_INPUT tables.
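A minimal sketch of the submission_id proposal, assuming each stored value row simply gains a submission ID. `JobInputRow`, `JobInputHistory`, and their methods are hypothetical names, and an in-memory list stands in for the SQ_JOB_INPUT table; the same idea would apply to SQ_LINK_INPUT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SQ_JOB_INPUT row extended with a submission_id column
record JobInputRow(long jobId, long submissionId, String inputName, String value) {}

// In-memory stand-in for SQ_JOB_INPUT with one set of rows per submission
class JobInputHistory {
  private final List<JobInputRow> rows = new ArrayList<>();

  // Append the value an input had for one submission (job run); earlier rows are kept
  void record(long jobId, long submissionId, String inputName, String value) {
    rows.add(new JobInputRow(jobId, submissionId, inputName, value));
  }

  // Look up the value a specific run used; returns null if nothing was recorded
  String valueFor(long jobId, long submissionId, String inputName) {
    for (JobInputRow r : rows) {
      if (r.jobId() == jobId && r.submissionId() == submissionId
          && r.inputName().equals(inputName)) {
        return r.value();
      }
    }
    return null;
  }
}
```

Because rows are appended per submission rather than updated in place, a lookup by (job, submission, input) returns the value that a specific run used.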

[~anandriyer] also suggested that we store the BEFORE/AFTER config state if 
possible.

To record the BEFORE/AFTER config history:
1. For every job run, we will create a new set of values for each config input 
based on the previous state. If the user edits the configs while the previous 
job run is still in progress, we will create new values with a null 
submission_id and associate them with the submission ID once the job run 
starts. Once the job run finishes, we will write the config values again.

2. We will store the BEFORE/AFTER indicator in an additional column.

3. Only the config input values of the latest run will be editable, and only 
while that job run has not yet started.
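The three steps above can be sketched as a small in-memory model of one job's config history. This is an assumption-laden illustration, not a proposed implementation: `Phase`, `InputSnapshot`, `JobConfigHistory`, and their methods are hypothetical names, a null `submissionId` marks values staged for a run that has not started yet, and `runFinished` receives whatever end-of-run state the job produced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical BEFORE/AFTER indicator column (step 2)
enum Phase { BEFORE, AFTER }

// Hypothetical history row; a null submissionId marks values not yet tied to a run
record InputSnapshot(long jobId, Long submissionId, Phase phase, String inputName, String value) {}

class JobConfigHistory {
  private final long jobId;
  private final List<InputSnapshot> history = new ArrayList<>();
  private final Map<String, String> lastState = new HashMap<>(); // state left by the latest run
  private final Map<String, String> pending = new HashMap<>();   // edits not yet used by any run

  JobConfigHistory(long jobId, Map<String, String> initialValues) {
    this.jobId = jobId;
    lastState.putAll(initialValues);
  }

  // Step 3: edits only touch pending values; rows already tied to a submission stay unchanged
  void edit(String inputName, String value) {
    pending.put(inputName, value);
  }

  // Pending edits viewed as rows with a null submissionId (step 1)
  List<InputSnapshot> pendingRows() {
    List<InputSnapshot> rows = new ArrayList<>();
    pending.forEach((name, value) ->
        rows.add(new InputSnapshot(jobId, null, Phase.BEFORE, name, value)));
    return rows;
  }

  // Step 1: a run starts from the previous state plus pending edits, recorded as BEFORE rows
  void runStarted(long submissionId) {
    lastState.putAll(pending);
    pending.clear();
    lastState.forEach((name, value) ->
        history.add(new InputSnapshot(jobId, submissionId, Phase.BEFORE, name, value)));
  }

  // When the run finishes, the values (plus any intermediate state) are written again as AFTER rows
  void runFinished(long submissionId, Map<String, String> endState) {
    lastState.putAll(endState);
    lastState.forEach((name, value) ->
        history.add(new InputSnapshot(jobId, submissionId, Phase.AFTER, name, value)));
  }

  // Value an input had at the start (BEFORE) or end (AFTER) of a given run; null if absent
  String valueFor(long submissionId, Phase phase, String inputName) {
    for (InputSnapshot r : history) {
      if (r.submissionId() != null && r.submissionId() == submissionId
          && r.phase() == phase && r.inputName().equals(inputName)) {
        return r.value();
      }
    }
    return null;
  }
}
```

Keeping pending edits apart from the history rows means an edit made while a run is in progress is picked up only by the next run, and rows already tied to a submission are never rewritten.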

 

Pros:
We have a history of the config input values for every job run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
