Thanks for the alternatives, but I'd ideally like to do all of this inside the MR 
job itself, since I want to be able to run it programmatically on a regular 
basis, and any additional steps just add complexity.

Looking through sample code I can find via Google, I never see anybody actually 
using the Progressable that gets passed in to the output format, and pretty much 
every time someone has a problem with job timeouts, the advice is simply to 
increase the timeout. That seems to me like treating the symptom rather than the 
underlying problem. Does the Progressable actually do anything?

From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
Sent: Thursday, August 02, 2012 10:35 PM
To: Jarus, Nathan
Subject: Re: DBOutputWriter timing out writing to database

Hi Nathan,

I saw your question on the mailing list. If your target database is MySQL, HIHO 
at https://github.com/sonalgoyal/hiho is an open source tool that provides a 
highly optimized write path to the database. Please feel free to try it out and 
let me know if you run into any issues.

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

On Fri, Aug 3, 2012 at 12:34 AM, Jarus, Nathan <jar...@amazon.com> wrote:
Hey,

I'm running Hadoop 0.20.205 and am using DBOutputFormat to write to a 
database. For small datasets my jobs work perfectly, but for larger jobs, 
writing to the database takes longer than 600 seconds and Hadoop times out my 
reduce tasks. Looking at the source for DBOutputFormat, it seems the 
Progressable never gets updated while the insert query is being run. How can I 
modify or subclass DBOutputFormat so that progress gets reported and my jobs 
can finish?
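
What I have in mind, very roughly, is to subclass DBOutputFormat so that the
Progressable passed to getRecordWriter() actually gets used, for example by
handing the parent's writer to a progress-reporting wrapper (like the
HeartbeatRecordWriter sketched earlier in the thread). An untested sketch
against the old mapred API follows; the class, table, and column names are all
made up:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import org.apache.hadoop.util.Progressable;

/**
 * Sketch only: reuse everything from DBOutputFormat, but wrap the writer
 * it returns so that the Progressable handed to getRecordWriter() gets
 * pinged while the batch insert runs.
 */
public class ProgressReportingDBOutputFormat<K extends DBWritable, V>
    extends DBOutputFormat<K, V> {

  @Override
  public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
      String name, Progressable progress) throws IOException {
    // Let the parent build its usual JDBC writer, then wrap it.
    RecordWriter<K, V> inner = super.getRecordWriter(ignored, job, name, progress);
    return new HeartbeatRecordWriter<K, V>(inner, progress);
  }
}

// In the driver, something like (untested):
//   DBOutputFormat.setOutput(conf, "my_table", "col1", "col2");
//   conf.setOutputFormat(ProgressReportingDBOutputFormat.class);  // override the default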

Thanks
Nathan
