[ https://issues.apache.org/jira/browse/SQOOP-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393756#comment-15393756 ]
Boglarka Egyed commented on SQOOP-1086:
---------------------------------------
We have reviewed the situation with Attila Szabo and Szabolcs Vasas and have
agreed that there is no code fix for this. The problem occurs because, when
the built-in metastore is used, Sqoop writes the INSERT/UPDATE information
into the HSQL dump-like file /var/lib/hadoop-hdfs/.sqoop/metastore.db.script.
Concurrent jobs can therefore overwrite each other's saved state (such as
--last-value), so parallel job execution cannot be handled properly. We
suggest opening a documentation JIRA ticket instead, with a recommendation to
use a shared metastore.
> Running multiple incremental sqoop jobs in parallel resets the first sqoop
> job's --last-value
> ---------------------------------------------------------------------------------------------
>
> Key: SQOOP-1086
> URL: https://issues.apache.org/jira/browse/SQOOP-1086
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.0-incubating
> Environment: Ubuntu 12.04.2
> Reporter: Byron
> Assignee: Boglarka Egyed
> Priority: Critical
> Labels: import, incremental, job, parallel
>
> I've created 2 jobs (with different names) that pull from the same
> database (MSSQL), but from 2 different tables.
> They both use incremental append.
> If I run the jobs in sequence, I have no issue, and the metastore remembers
> the --last-value for each job.
> If I run the jobs in parallel, the metastore is updated correctly with the
> 1st job's --last-value when the 1st job finishes, but once the 2nd job
> finishes, the 1st job's --last-value in the metastore is reset.
> First Job
> # create the import job into the incremental table
> $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 1" \
>   --create "import-events" -- import \
>   --connect "$ENV_TRACKING_CONNECTION" --table "$TABLE1" \
>   --split-by "dtmDBDateTime" --target-dir "$OUTPUT1" \
>   --incremental append --check-column "dtmDBDateTime" \
>   --last-value "2012-01-01 00:00:00.000" \
>   --fields-terminated-by \\t --null-string '' --null-non-string '';
> Second Job
> # create the import job into the table
> $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 2" \
>   --create "import-impressions" -- import \
>   --connect "$ENV_TRACKING_CONNECTION" --table "$TABLE2" \
>   --split-by "dtmDBDateTime" --target-dir "$OUTPUT2" \
>   --incremental append --check-column "dtmDBDateTime" \
>   --last-value "2012-01-01 00:00:00.000" \
>   --fields-terminated-by \\t --null-string '' --null-non-string '';
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)