[ https://issues.apache.org/jira/browse/MAPREDUCE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798810#action_12798810 ]
Philip Zeyliger commented on MAPREDUCE-1368:
--------------------------------------------

Sorry, I wasn't clear. I think that even if you had transactions, you could still have data inserted twice. A map task looks like:

(1) start map task
(2) begin transaction
(3) insert many rows
(4) commit transaction
(5) end map task

If the task crashes between (4) and (5), MapReduce will schedule another attempt on a worker, and that attempt will insert the same rows again in a new transaction.

> Vertica adapter doesn't use explicity transactions or report progress
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1368
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1368
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.21.0
>            Reporter: Omer Trajman
>            Assignee: Omer Trajman
>             Fix For: 0.21.0
>
>
> The Vertica adapter doesn't use explicit transactions, so speculative tasks
> can result in duplicate loads. The JDBC driver supports them, so the fix is
> pretty minor. Also, the JDBC driver commits synchronously, and the adapter
> needs to report progress even if the commit takes longer than the timeout.
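The five-step sequence in the comment can be simulated to show how rows that were committed inside a transaction can still be loaded twice. This is a minimal sketch in plain Java, assuming a toy in-memory table and a simple retry loop standing in for the MapReduce scheduler; `Table`, `Transaction`, `runMapTask`, and `runWithRetries` are hypothetical names for illustration, not part of the Vertica adapter, JDBC, or the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory simulation of the five-step map task from the comment.
// None of these classes belong to the Vertica adapter, JDBC, or Hadoop.
public class DuplicateLoadDemo {

    // Toy "table": rows become visible only when a transaction commits.
    static class Table {
        final List<String> committed = new ArrayList<>();

        Transaction begin() {
            return new Transaction(this);
        }
    }

    // Toy transaction that buffers inserts until commit.
    static class Transaction {
        private final Table table;
        private final List<String> pending = new ArrayList<>();

        Transaction(Table table) {
            this.table = table;
        }

        void insert(String row) { pending.add(row); }

        void commit() {
            table.committed.addAll(pending);
            pending.clear();
        }

        void rollback() { pending.clear(); }
    }

    // Simulates the task attempt dying before it reports success.
    static class TaskCrash extends RuntimeException {
        TaskCrash(String msg) { super(msg); }
    }

    // (1) start map task: one attempt of the task runs this method.
    static void runMapTask(Table table, List<String> split, boolean crashAfterCommit) {
        Transaction tx = table.begin();      // (2) begin transaction
        try {
            for (String row : split) {
                tx.insert(row);              // (3) insert many rows
            }
            tx.commit();                     // (4) commit transaction
        } catch (RuntimeException e) {
            tx.rollback();                   // failure before commit: nothing is kept
            throw e;
        }
        if (crashAfterCommit) {
            // Crash between (4) and (5): the rows are already committed.
            throw new TaskCrash("attempt died before reporting success");
        }
        // (5) end map task: the framework would now mark the task as done.
    }

    // Stand-in for the scheduler: re-run the task until one attempt succeeds.
    // Only the first attempt is made to crash after its commit.
    static int runWithRetries(Table table, List<String> split) {
        int attempts = 0;
        boolean done = false;
        while (!done) {
            attempts++;
            try {
                runMapTask(table, split, attempts == 1);
                done = true;
            } catch (TaskCrash e) {
                // Schedule another attempt of the same task.
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        Table table = new Table();
        int attempts = runWithRetries(table, List.of("a", "b", "c"));
        // prints "attempts=2 rows=[a, b, c, a, b, c]"
        System.out.println("attempts=" + attempts + " rows=" + table.committed);
    }
}
```

In this sketch the duplicate comes only from the window between (4) and (5). If the first attempt had failed before its commit, the rollback would discard its rows and the retry would load them exactly once, which is the case a transaction alone does cover.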