[ https://issues.apache.org/jira/browse/SQOOP-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296490#comment-14296490 ]

Hudson commented on SQOOP-2055:
-------------------------------

SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop200 #969 (See 
[https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/969/])
SQOOP-2055:  Run only one map task attempt during export (venkat: 
https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=420fc3d53f1db62710710b93b9801cff5e4d1b53)
* src/java/org/apache/sqoop/mapreduce/ExportJobBase.java


> Run only one map task attempt during export
> -------------------------------------------
>
>                 Key: SQOOP-2055
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2055
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.5
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.6
>
>         Attachments: SQOOP-2055.patch, SQOOP-2055.patch
>
>
> While investigating several user issues, I've noticed that our [documentation 
> is stating that on export mapper failure we fail the entire 
> job|http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_failed_exports]:
> {quote}
> If an export map task fails due to these or other reasons, it will cause the 
> export job to fail. The results of a failed export are undefined. Each export 
> map task operates in a separate transaction. Furthermore, individual map 
> tasks commit their current transaction periodically. If a task fails, the 
> current transaction will be rolled back. Any previously-committed 
> transactions will remain durable in the database, leading to a 
> partially-complete export.
> {quote}
> This is, however, not the observed behavior: MapReduce will re-run a failed 
> mapper (up to 3 times by default) before failing the job. This is confusing 
> when investigating failures, because one most often has to go to the first 
> failed attempt and ignore the rest, as the later attempts usually fail on 
> unrelated issues (key constraints).
> It seems that some of the connectors are smart enough to either suggest that 
> the user configure MR accordingly or do it automatically 
> ([PGDump|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java#L139],
>  
> [OraOop|https://github.com/apache/sqoop/blob/trunk/src/docs/user/connectors.txt#L831]).
>  I would like to propose applying this behavior to every export job, as that 
> seems like a more reasonable default for exports.
> Doing this might have a side effect on more advanced connectors that make 
> each mapper attempt idempotent (e.g. by using temporary tables per map 
> attempt or a similar facility): we would stop re-running their failed 
> attempts automatically, and those connectors would have to re-enable 
> retries on their own.
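For reference, a minimal sketch of the kind of change being proposed, assuming the fix amounts to setting the standard Hadoop map-task retry-limit keys to 1 on the job configuration. The class name and the map-based stand-in for Hadoop's Configuration are illustrative only, not the actual ExportJobBase patch; the two property keys are the real Hadoop 2 key and its deprecated Hadoop 1 equivalent.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not the actual Sqoop patch: forcing MapReduce to
// give each export map task exactly one attempt, so a failure surfaces
// immediately instead of being retried up to the default limit.
public class SingleAttemptExportSketch {

    // Real Hadoop configuration keys controlling the map-task attempt limit.
    static final String MAPREDUCE2_KEY = "mapreduce.map.maxattempts"; // Hadoop 2
    static final String MAPREDUCE1_KEY = "mapred.map.max.attempts";   // Hadoop 1 (deprecated)

    // Stand-in for org.apache.hadoop.conf.Configuration so this sketch
    // compiles and runs without Hadoop on the classpath.
    static void configureSingleAttempt(Map<String, String> conf) {
        conf.put(MAPREDUCE2_KEY, "1");
        conf.put(MAPREDUCE1_KEY, "1");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        configureSingleAttempt(conf);
        System.out.println(conf.get(MAPREDUCE2_KEY)); // prints "1"
    }
}
```

A connector that makes map attempts idempotent (e.g. per-attempt staging tables) would need to set these keys back to a higher value itself, which is the side effect the description warns about.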



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
