[jira] [Commented] (SQOOP-1086) Running multiple incremental sqoop jobs in parallel resets the first sqoop job's --last-value

Boglarka Egyed (JIRA) Tue, 26 Jul 2016 06:07:19 -0700

    [ 
https://issues.apache.org/jira/browse/SQOOP-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393756#comment-15393756
 ]


Boglarka Egyed commented on SQOOP-1086:
---------------------------------------

We have reviewed the situation with Attila Szabo and Szabolcs Vasas and 
agreed that there is no code fix for this. The problem occurs because, when 
the built-in metastore is used, Sqoop writes the INSERT/UPDATE-related 
information into /var/lib/hadoop-hdfs/.sqoop/metastore.db.script, an HSQL 
dump-like file, so parallel job execution cannot be handled properly. We 
suggest opening a Documentation JIRA ticket instead, with a recommendation 
to use a shared metastore.
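
For illustration only, here is a minimal sketch of what running the two saved 
jobs against a shared metastore could look like. It assumes a metastore 
server has already been started with "sqoop metastore" on a hypothetical host 
metastore.example.com listening on the default port 16000, and that both jobs 
were created with the same --meta-connect URL. The SQOOP variable and the 
exec_job helper are made up for this example; by default the script only 
prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# To run for real, set SQOOP="$ENV_SQOOP_HOME/bin/sqoop" beforehand.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical host name; 16000 is the default Sqoop metastore port.
META_URL="${META_URL:-jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop}"

# On the metastore host, the shared metastore server is started with:
#   sqoop metastore

# Execute a saved job against the shared metastore instead of the
# built-in one kept under the user's .sqoop directory.
exec_job() {
    $SQOOP job --meta-connect "$META_URL" --exec "$1"
}

# Run both saved jobs in parallel, then wait for both to finish.
exec_job import-events &
exec_job import-impressions &
wait
```

The same URL can also be set once through the 
sqoop.metastore.client.autoconnect.url property in sqoop-site.xml, so that 
--meta-connect does not have to be repeated on every command.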

> Running multiple incremental sqoop jobs in parallel resets the first sqoop 
> job's --last-value
> ---------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1086
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1086
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.0-incubating
>         Environment: Ubuntu 12.04.2
>            Reporter: Byron
>            Assignee: Boglarka Egyed
>            Priority: Critical
>              Labels: import, incremental, job, parallel
>
> I've created 2 jobs (with different names) that pull from the same 
> database (MSSQL), but from 2 different tables.
> They both use incremental append.
> If I run the jobs in sequence, I get no issues and the metastore remembers 
> the --last-value for each job.
> If I run the jobs in parallel, the metastore is updated with the correct 
> --last-value when the 1st job finishes, but once the 2nd job finishes, the 
> 1st job's --last-value in the metastore is reset.
> First Job
> # create the import job into the incremental table
> $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 1" --create 
> "import-events" -- import --connect "$ENV_TRACKING_CONNECTION" --table 
> "$TABLE1" --split-by "dtmDBDateTime" --target-dir "$OUTPUT1" --incremental 
> append --check-column "dtmDBDateTime" --last-value "2012-01-01 00:00:00.000" 
> --fields-terminated-by \\t --null-string '' --null-non-string '';
> Second Job
> # create the import job into the table
> $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 2" --create 
> "import-impressions" -- import --connect "$ENV_TRACKING_CONNECTION" --table 
> "$TABLE2" --split-by "dtmDBDateTime" --target-dir "$OUTPUT2" --incremental 
> append --check-column "dtmDBDateTime" --last-value "2012-01-01 00:00:00.000" 
> --fields-terminated-by \\t --null-string '' --null-non-string '';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
