Thanks for the alternatives, but I'd ideally like to do all of this inside the MR job itself, since I want to be able to run it programmatically on a regular schedule, and any additional steps just add complexity.
Looking through sample code on Google, I never see anyone using the Progressable passed in to the output format, and almost every time someone has a problem with job timeouts they're simply told to increase the timeout. That seems to me like treating the symptom rather than the actual problem. Does the Progressable actually do anything?

From: Sonal Goyal [mailto:sonalgoy...@gmail.com]
Sent: Thursday, August 02, 2012 10:35 PM
To: Jarus, Nathan
Subject: Re: DBOutputWriter timing out writing to database

Hi Nathan,

I saw your question on the mailing list. If your target database is MySQL, HIHO at https://github.com/sonalgoyal/hiho is an open source tool which provides a highly optimized write operation to the db. The tool is open source; please feel free to try it out and let me know if you see any issues.

Best Regards,
Sonal
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
Nube Technologies <http://www.nubetech.co>

On Fri, Aug 3, 2012 at 12:34 AM, Jarus, Nathan <jar...@amazon.com> wrote:

Hey,

I'm running Hadoop 0.20.205 and am using DBOutputFormat to write to a database. For small datasets my jobs work perfectly, but for larger jobs, writing to the database takes longer than 600 seconds and Hadoop times out my reduce tasks. Looking at the source for DBOutputFormat, it seems the Progressable never gets updated while the insert query is being run. How do I modify/subclass DBOutputFormat to update it so my jobs can finish?

Thanks,
Nathan
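For what it's worth, the pattern the Progressable is meant to support can be sketched without any Hadoop dependencies: a background "heartbeat" thread calls progress() periodically while a long database write runs, so the framework sees the task as alive instead of killing it at the timeout. This is a minimal, self-contained sketch, not the actual DBOutputFormat fix: the `Progressable` interface below is a local stand-in mirroring Hadoop's `org.apache.hadoop.util.Progressable` (same single no-arg method), and `ProgressHeartbeat` is a hypothetical helper name. In a real subclass you would start the heartbeat around the batch execute in the record writer's close, using the Progressable that Hadoop passes to `getRecordWriter()`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressHeartbeat {
    // Local stand-in for org.apache.hadoop.util.Progressable:
    // a single progress() method the task calls to say "still working".
    interface Progressable {
        void progress();
    }

    private final ScheduledExecutorService heartbeat =
            Executors.newSingleThreadScheduledExecutor();

    // Call progress() every intervalMs until stop() is invoked,
    // keeping the task alive while a long insert/commit runs.
    void start(Progressable progress, long intervalMs) {
        heartbeat.scheduleAtFixedRate(progress::progress,
                intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        heartbeat.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger pings = new AtomicInteger();
        ProgressHeartbeat hb = new ProgressHeartbeat();
        hb.start(pings::incrementAndGet, 50); // heartbeat every 50 ms
        Thread.sleep(300);                    // simulate a slow executeBatch()
        hb.stop();
        System.out.println("pinged=" + (pings.get() > 0));
    }
}
```

The design point is that the reporting happens off the thread doing the JDBC work, since `executeBatch()` blocks and cannot report progress itself; whether Hadoop honors these pings (versus only counter/record updates) is exactly the question above.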