[ https://issues.apache.org/jira/browse/MAPREDUCE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170675#comment-15170675 ]

Shyam Gavulla commented on MAPREDUCE-4522:
------------------------------------------

Re-posting to all users

I am a newbie and this is my first post here. I looked at the issue and the code,
and I would like to propose another solution.
We could accumulate the keys in DBRecordWriter#write() into a collection, and in
DBRecordWriter#close() insert them in batches of 500 or 1000 keys, updating
progress after each batch. This is not ideal, but it might help with large
inserts and keep the task's progress reported.
I would like to provide a patch if this solution is acceptable.
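To make the idea concrete, here is a minimal sketch of the buffering logic I have in mind. The class and callback names (BatchingWriter, onFlush) are illustrative only, not the Hadoop or JDBC API; in the real patch, the flush step would be a JDBC addBatch()/executeBatch() call followed by a progress update on the task context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: accumulate keys on write(), flush in fixed-size
// batches on close(). Names are illustrative, not Hadoop's DBRecordWriter.
class BatchingWriter<K> {
    private final int batchSize;
    private final List<K> buffer = new ArrayList<>();
    // Stands in for "executeBatch() + report progress" in the real writer.
    private final Consumer<List<K>> onFlush;

    BatchingWriter(int batchSize, Consumer<List<K>> onFlush) {
        this.batchSize = batchSize;
        this.onFlush = onFlush;
    }

    // Analogue of DBRecordWriter#write(): only accumulate, no SQL yet.
    void write(K key) {
        buffer.add(key);
    }

    // Analogue of DBRecordWriter#close(): insert in batches of batchSize,
    // so progress can be updated between batches instead of once at the end.
    void close() {
        for (int start = 0; start < buffer.size(); start += batchSize) {
            int end = Math.min(start + batchSize, buffer.size());
            onFlush.accept(buffer.subList(start, end));
        }
        buffer.clear();
    }
}
```

For example, with a batch size of 500, writing 1200 keys and then closing would produce three flushes of 500, 500, and 200 keys, with a chance to report progress after each one.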

> DBOutputFormat Times out on large batch inserts
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4522
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4522
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller
>    Affects Versions: 0.20.205.0
>            Reporter: Nathan Jarus
>              Labels: newbie
>
> In DBRecordWriter#close(), progress is never updated. In large batch inserts, 
> this can cause the reduce task to time out due to the amount of time it takes 
> the SQL engine to process that insert. 
> Potential solutions I can see:
> Don't batch inserts; do the insert when DBRecordWriter#write() is called 
> (awful)
> Spin up a thread in DBRecordWriter#close() and update progress in that. 
> (gross)
> I can provide code for either if you're interested. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
