[ https://issues.apache.org/jira/browse/HADOOP-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637120#action_12637120 ]

Enis Soztutar commented on HADOOP-4331:
---------------------------------------

I am not convinced that further splitting the batch within each reduce is the 
right way. It is better to add all of a reduce's values at once to keep the 
write atomic. If an error occurs during the transaction, none of the records in 
the reduce should be inserted; otherwise, when the reduce is restarted, some of 
the records might be duplicated. 
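To make the duplication scenario concrete, here is a toy Java model. It is not Hadoop or JDBC code. A hypothetical in-memory `Table` only makes rows visible on `commit()`, and a reduce attempt crashes partway through and is then retried from the start. The names `AtomicityDemo`, `Table`, `runAttempt` and `rowsAfterRetry` are illustrative assumptions. With one commit per reduce, the retry leaves exactly one copy of each row. With a commit every two rows, the rows committed before the crash are inserted a second time.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Hadoop or JDBC code) of a reduce task that
 * writes rows to a table, fails, and is retried from the beginning.
 */
public class AtomicityDemo {

    /** Hypothetical in-memory table: rows become visible only on commit(). */
    static final class Table {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        void insert(String row) { pending.add(row); }
        void commit() { committed.addAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    /**
     * One reduce attempt. failAfter = rows written before a simulated crash
     * (-1 means no crash). batchSize <= 0 means a single commit at the end.
     */
    public static boolean runAttempt(Table table, List<String> rows, int batchSize, int failAfter) {
        int written = 0;
        for (String row : rows) {
            if (written == failAfter) {
                table.rollback(); // only uncommitted rows are discarded
                return false;
            }
            table.insert(row);
            written++;
            if (batchSize > 0 && written % batchSize == 0) {
                table.commit(); // per-batch commit: these rows survive a later crash
            }
        }
        table.commit(); // final commit for whatever is still pending
        return true;
    }

    /** First attempt crashes after failAfter rows; the retry reprocesses every row. */
    public static int rowsAfterRetry(List<String> rows, int batchSize, int failAfter) {
        Table table = new Table();
        runAttempt(table, rows, batchSize, failAfter);
        runAttempt(table, rows, batchSize, -1);
        return table.committed.size();
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        // Single commit per reduce: the retry leaves exactly 5 rows.
        System.out.println(rowsAfterRetry(rows, 0, 3));
        // Commit every 2 rows: a and b were committed before the crash and are
        // inserted again by the retry, so the table ends up with 7 rows.
        System.out.println(rowsAfterRetry(rows, 2, 3));
    }
}
```

Running `main` prints 5 and then 7 for the five input rows.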

Is there a specific performance/driver-related reason to add batch sizes? 
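If the motivation for a batch size is fewer database round trips, JDBC batching does not by itself require a commit per batch. The sketch below assumes a database and driver with transaction support. It uses a hypothetical `writeReduceOutput` helper and a caller-supplied insert statement, not the code in patch.txt. Rows are sent with `addBatch()`/`executeBatch()` every `batchSize` rows. The whole reduce is still committed once, and everything is rolled back on an `SQLException`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedSingleCommit {

    /**
     * Hypothetical helper: inserts every row of one reduce task in a single
     * transaction, sending rows to the server in JDBC batches of batchSize.
     * Returns the number of executeBatch() round trips.
     */
    public static int writeReduceOutput(Connection conn, String insertSql,
                                        List<String[]> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        conn.setAutoCommit(false); // nothing becomes visible until commit()
        int roundTrips = 0;
        try (PreparedStatement stmt = conn.prepareStatement(insertSql)) {
            int pending = 0;
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    stmt.setString(i + 1, row[i]); // JDBC parameters are 1-based
                }
                stmt.addBatch();
                if (++pending == batchSize) {
                    stmt.executeBatch(); // one round trip, still uncommitted
                    roundTrips++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                stmt.executeBatch(); // flush the final partial batch
                roundTrips++;
            }
            conn.commit(); // single commit for the whole reduce task
        } catch (SQLException e) {
            conn.rollback(); // discard every batch sent so far
            throw e;
        }
        return roundTrips;
    }
}
```

Here `batchSize` only controls how many statements go out per round trip. The transaction boundary stays at the reduce level, which keeps the all-or-nothing behavior on restart.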

> DBOutputFormat: add batch size support for JDBC and receive DBWritable 
> object in value not in key
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4331
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4331
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Alexander Schwid
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: patch.txt
>
>
> package mapred.lib.db
> added batch size support for JDBC in DBOutputFormat 
> receive DBWritable object in value, not in key, in DBOutputFormat

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
