[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173166#comment-15173166
 ] 

Shyam Gavulla commented on MAPREDUCE-4522:
------------------------------------------

[~ozawa] I made the configuration change. I added a property in 
mapred-default.xml:
<property>
  <name>mapreduce.output.dboutputformat.batch-size</name>
  <value>1000</value>
  <description>The number of SQL statements batched together before 
progress is reported. The default is 1000.</description>
</property>

I also added a constant in MRJobConfig.java: public static final String 
MR_DBOUTPUTFORMAT_BATCH_SIZE = "mapreduce.output.dboutputformat.batch-size";

Let me know if this is good and I will create a patch. 
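For illustration, the way the new batch-size property could be used is to flush the buffered statements in chunks and report progress after each chunk, so the task is not killed for inactivity. The following is only a minimal, self-contained sketch of that idea, not the actual DBRecordWriter code; BatchingWriter and the Progressable stand-in are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingWriter {
    // Stand-in for org.apache.hadoop.util.Progressable.
    interface Progressable { void progress(); }

    private final List<String> buffered = new ArrayList<>();
    private final int batchSize;          // from mapreduce.output.dboutputformat.batch-size
    private final Progressable reporter;
    int executed = 0;                     // statements flushed so far
    int progressCalls = 0;                // how many times progress was reported

    BatchingWriter(int batchSize, Progressable reporter) {
        this.batchSize = batchSize;
        this.reporter = reporter;
    }

    void write(String sql) { buffered.add(sql); }

    // Flush in chunks of batchSize, reporting progress after each chunk.
    void close() {
        for (int i = 0; i < buffered.size(); i += batchSize) {
            int end = Math.min(i + batchSize, buffered.size());
            executed += end - i;          // real code would call statement.executeBatch()
            reporter.progress();          // keep the reduce task alive
            progressCalls++;
        }
    }

    public static void main(String[] args) {
        BatchingWriter w = new BatchingWriter(1000, () -> {});
        for (int i = 0; i < 2500; i++) w.write("INSERT ...");
        w.close();
        System.out.println(w.executed + " statements in " + w.progressCalls + " batches");
    }
}
```

With 2500 buffered statements and a batch size of 1000, close() executes three chunks (1000, 1000, 500) and reports progress three times.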

> DBOutputFormat Times out on large batch inserts
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4522
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4522
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller
>    Affects Versions: 0.20.205.0
>            Reporter: Nathan Jarus
>              Labels: newbie
>
> In DBRecordWriter#close(), progress is never updated. In large batch inserts, 
> this can cause the reduce task to time out due to the amount of time it takes 
> the SQL engine to process that insert. 
> Potential solutions I can see:
> Don't batch inserts; do the insert when DBRecordWriter#write() is called 
> (awful)
> Spin up a thread in DBRecordWriter#close() and update progress in that. 
> (gross)
> I can provide code for either if you're interested. 
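The second workaround the report mentions, updating progress from a separate thread while close() blocks on the large insert, could look roughly like the sketch below. This is a hedged illustration only; ProgressThreadSketch and the Progressable stand-in are hypothetical names, and the sleep stands in for the long-running executeBatch() call.

```java
public class ProgressThreadSketch {
    // Stand-in for org.apache.hadoop.util.Progressable.
    interface Progressable { void progress(); }

    // Daemon thread that pings progress() every intervalMs until interrupted.
    static Thread startProgressThread(Progressable reporter, long intervalMs) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                reporter.progress();
                try { Thread.sleep(intervalMs); }
                catch (InterruptedException e) { return; }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        final int[] pings = {0};
        Thread t = startProgressThread(() -> pings[0]++, 10);
        Thread.sleep(100);   // stands in for the long executeBatch() in close()
        t.interrupt();       // insert finished; stop reporting
        t.join();
        System.out.println("progress reported at least once: " + (pings[0] > 0));
    }
}
```

The thread must be stopped (and joined) once the insert returns, otherwise it keeps reporting progress for a task that has already finished, which is part of why the reporter calls this option "gross".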



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
