[
https://issues.apache.org/jira/browse/SQOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184255#comment-14184255
]
Masatake Iwasaki commented on SQOOP-1622:
-----------------------------------------
If there are 4 map tasks and 1 reduce task, PGBulkloadExportReducer#reduce is
called 4 times in the current implementation, so we end up with 4 transactions
for copying from staging to destination. If failover occurs while the reduce
task is running on an HA-enabled cluster, the job is automatically resubmitted
and the data exported to the db may be duplicated.
Making each reduce task do its work in a single transaction and setting the
number of reduce tasks to 1 avoids this duplication. You can still set the
number of reduces to more than 1 if you like. The patch does not hardcode the
number.
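
To illustrate the difference, here is a minimal sketch, not the actual patch.
It does not depend on Hadoop or JDBC: FakeConnection, SketchReducer and the
"staging_" table names are hypothetical stand-ins, and the only assumption
taken from Hadoop is the reducer lifecycle of reduce() being called once per
key, followed by a single cleanup() call. Committing in reduce() yields one
transaction per map task, while deferring the commit to cleanup() yields one
transaction per reduce task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not PGBulkloadExportReducer.
class SingleTransactionSketch {

    /** Fake connection that only counts how many transactions were committed. */
    static class FakeConnection {
        int commits = 0;
        final List<String> pending = new ArrayList<>();

        // Stand-in for copying rows from a staging table to the destination.
        void copyFromStaging(String stagingTable) {
            pending.add(stagingTable);
        }

        // A commit with pending work counts as one transaction.
        void commit() {
            if (!pending.isEmpty()) {
                commits++;
                pending.clear();
            }
        }
    }

    /** Mimics the reducer lifecycle: reduce() per key, then cleanup() once. */
    static class SketchReducer {
        private final FakeConnection conn;
        private final boolean commitPerKey;

        SketchReducer(FakeConnection conn, boolean commitPerKey) {
            this.conn = conn;
            this.commitPerKey = commitPerKey;
        }

        // The key is the unique ID of a map task, so this runs once per map task.
        void reduce(String mapTaskId) {
            conn.copyFromStaging("staging_" + mapTaskId);
            if (commitPerKey) {
                conn.commit(); // one transaction per map task
            }
        }

        // Runs once at the end of the reduce task.
        void cleanup() {
            if (!commitPerKey) {
                conn.commit(); // one transaction for the whole reduce task
            }
        }
    }

    /** Drives one reduce task over the given number of map-task keys. */
    static int countTransactions(int mapTasks, boolean commitPerKey) {
        FakeConnection conn = new FakeConnection();
        SketchReducer reducer = new SketchReducer(conn, commitPerKey);
        for (int i = 0; i < mapTasks; i++) {
            reducer.reduce("map_" + i);
        }
        reducer.cleanup();
        return conn.commits;
    }

    public static void main(String[] args) {
        System.out.println(countTransactions(4, true));  // prints 4
        System.out.println(countTransactions(4, false)); // prints 1
    }
}
```

With more than one reduce task, the same approach would give one transaction
per reduce task rather than one per map task.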
> Copying from staging table should be in single transaction for pg_bulkload
> connector
> ------------------------------------------------------------------------------------
>
> Key: SQOOP-1622
> URL: https://issues.apache.org/jira/browse/SQOOP-1622
> Project: Sqoop
> Issue Type: Improvement
> Components: connectors/postgresql
> Affects Versions: 1.4.5
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
>
> PGBulkloadExportReducer#reduce may be called once per map task because the
> map output key is the unique ID of the map task. Each reduce task should do
> the copying from the staging table in a single transaction.