Jarek Jarcec Cecho created SQOOP-2055:
-----------------------------------------
Summary: Run only one map task attempt during export
Key: SQOOP-2055
URL: https://issues.apache.org/jira/browse/SQOOP-2055
Project: Sqoop
Issue Type: Bug
Affects Versions: 1.4.5
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Fix For: 1.4.6
While investigating several user issues, I've noticed that our [documentation
states that a failed export mapper fails the entire
job|http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_failed_exports]:
{quote}
If an export map task fails due to these or other reasons, it will cause the
export job to fail. The results of a failed export are undefined. Each export
map task operates in a separate transaction. Furthermore, individual map tasks
commit their current transaction periodically. If a task fails, the current
transaction will be rolled back. Any previously-committed transactions will
remain durable in the database, leading to a partially-complete export.
{quote}
However, this is not the observed behavior: MapReduce re-runs a failed mapper
(up to 3 times) before failing the job. This makes failures confusing to
investigate, because one usually has to find the first failed attempt and
ignore the rest, as the later attempts typically fail on unrelated issues
(key constraint violations on rows that the earlier attempt already committed).
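To illustrate why the later attempts are misleading, here is a minimal, self-contained Java model. It is not Sqoop or Hadoop code; the class and method names are hypothetical. A set of primary keys stands in for the target table, each attempt commits every few rows as the quoted documentation describes, and the driver loop mimics MapReduce re-running a failed map task.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Sqoop or Hadoop code: a Set of primary keys plays the target table.
public class ExportRetrySimulation {

    // One map attempt: exports rows in order, committing every batchSize rows.
    // Throws on the first bad record or duplicate key; the uncommitted batch is discarded (rollback).
    static void runAttempt(List<Integer> rows, int badRow, int batchSize, Set<Integer> table) {
        Set<Integer> pending = new HashSet<>(); // current, not yet committed transaction
        for (int key : rows) {
            if (table.contains(key) || pending.contains(key)) {
                throw new IllegalStateException("duplicate primary key " + key);
            }
            if (key == badRow) {
                throw new IllegalStateException("bad record " + key);
            }
            pending.add(key);
            if (pending.size() == batchSize) {
                table.addAll(pending); // periodic commit: stays durable if a later row fails
                pending.clear();
            }
        }
        table.addAll(pending); // final commit
    }

    // Mimics MapReduce re-running a failed map task up to maxAttempts times.
    // Returns the failure message of every failed attempt (empty if the first attempt succeeds).
    static List<String> runTask(List<Integer> rows, int badRow, int batchSize,
                                int maxAttempts, Set<Integer> table) {
        List<String> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                runAttempt(rows, badRow, batchSize, table);
                return failures; // attempt succeeded
            } catch (IllegalStateException e) {
                failures.add(e.getMessage());
            }
        }
        return failures; // all attempts failed: the job fails
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int k = 1; k <= 10; k++) {
            rows.add(k);
        }
        // 4 attempts: the original attempt plus 3 re-runs.
        System.out.println(runTask(rows, 7, 3, 4, new HashSet<>()));
        // A single attempt, as proposed for exports.
        System.out.println(runTask(rows, 7, 3, 1, new HashSet<>()));
    }
}
```

With 4 attempts the task reports [bad record 7, duplicate primary key 1, duplicate primary key 1, duplicate primary key 1], so the last error points at a key constraint rather than the real cause. With a single attempt only "bad record 7" is reported. In both cases rows 1 to 6 stay committed, i.e. a partially-complete export.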
Some connectors already handle this, either by suggesting that the user
configure MapReduce accordingly or by doing so automatically
([PGBulkload|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java#L139],
[OraOop|https://github.com/apache/sqoop/blob/trunk/src/docs/user/connectors.txt#L831]).
I would like to propose applying this behavior to every export job, as it
seems a more reasonable default for exports.
This might have a side effect on more advanced connectors whose mapper
attempts are idempotent (e.g. they use a temporary table per map attempt or a
similar facility): we would stop re-running their failed attempts
automatically, and those connectors would have to re-enable retries on their
own.
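A rough sketch of the proposal, assuming the generic export job applies the single-attempt default before any connector-specific configuration runs. A plain Map stands in for Hadoop's Configuration, and mapreduce.map.maxattempts is the Hadoop 2 property that limits map task attempts. ExportConnector, StagingTableConnector and prepareExportJob are hypothetical names, not existing Sqoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for Hadoop's Configuration.
public class ExportAttemptDefaults {

    static final String MAP_MAX_ATTEMPTS = "mapreduce.map.maxattempts";

    // Hypothetical connector hook, invoked after the generic export defaults.
    interface ExportConnector {
        default void configureJob(Map<String, String> conf) { }
    }

    // Hypothetical connector whose attempts are idempotent (e.g. one temporary table per attempt).
    static class StagingTableConnector implements ExportConnector {
        @Override
        public void configureJob(Map<String, String> conf) {
            conf.put(MAP_MAX_ATTEMPTS, "4"); // re-enable retries: 1 attempt + 3 re-runs
        }
    }

    static Map<String, String> prepareExportJob(ExportConnector connector) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_MAX_ATTEMPTS, "1"); // proposed default: a failed map task fails the job
        connector.configureJob(conf);    // runs last, so a connector can override the default
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(prepareExportJob(new ExportConnector() { }).get(MAP_MAX_ATTEMPTS));
        System.out.println(prepareExportJob(new StagingTableConnector()).get(MAP_MAX_ATTEMPTS));
    }
}
```

Running main prints 1 for a connector that keeps the default and 4 for the idempotent one. Letting the connector hook run last is what allows such connectors to re-enable retries on their own.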