Nick, I'm afraid that right now the only available OutputFormat for JDBC is that one. You'll note that DBOutputFormat doesn't really include much support for special-casing to MySQL or other targets.
Your best bet is probably to copy the code from DBOutputFormat and DBConfiguration into another class (e.g., MySQLDBOutputFormat) and modify the RecordWriter to generate PreparedStatements containing batched insert statements. If you arrive at a solution that is reasonably general-purpose and robust, please consider contributing it back to the Hadoop project :) If you do, send me an email off-list; I'm happy to help with advice on developing better DB integration code, reviewing your work, etc.

Also, on the input side, you should really be using DataDrivenDBInputFormat instead of the older DBInputFormat :) Sqoop (in src/contrib/sqoop on Apache 0.21 / CDH 0.20) has pretty good support for parallel imports and uses this InputFormat instead.

- Aaron

On Thu, Jan 28, 2010 at 11:39 AM, Nick Jones <[email protected]> wrote:
> Hi all,
> I have a use case for collecting several rows from MySQL of
> compressed/unstructured data (n rows), expanding the data set, and storing
> the expanded results back into a MySQL DB (100,000n rows). DBInputFormat
> seems to perform reasonably well but DBOutputFormat is inserting rows
> one-by-one. How can I take advantage of MySQL's support of generating fewer
> insert statements with more values within each one?
>
> Thanks.
> --
> Nick Jones
>
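To make the batched-insert idea above concrete, here is a minimal sketch of the SQL-string construction a modified RecordWriter would need: building one multi-row `INSERT ... VALUES (?,?),(?,?),...` statement instead of one statement per row. The class and method names (`MySQLBatchSql`, `batchedInsert`) are hypothetical, not part of Hadoop's DBOutputFormat API; a real RecordWriter would prepare this statement once per batch and bind each row's values to the parameter slots.

```java
import java.util.Collections;

// Hypothetical helper illustrating MySQL's multi-row VALUES syntax.
// Not part of Hadoop; a MySQLDBOutputFormat's RecordWriter could use
// something like this in place of DBOutputFormat's one-row statement.
public class MySQLBatchSql {

    /**
     * Builds e.g. "INSERT INTO t (a, b) VALUES (?,?),(?,?)" for
     * rowCount rows, one parenthesized placeholder group per row.
     */
    public static String batchedInsert(String table, String[] columns, int rowCount) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" (");
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) sql.append(", ");
            sql.append(columns[i]);
        }
        sql.append(") VALUES ");
        // One "(?,?,...)" group, repeated rowCount times.
        String oneRow = "(" + String.join(",", Collections.nCopies(columns.length, "?")) + ")";
        for (int r = 0; r < rowCount; r++) {
            if (r > 0) sql.append(",");
            sql.append(oneRow);
        }
        return sql.toString();
    }
}
```

The resulting string is passed to Connection.prepareStatement() once, and each row's values are bound at offsets row * columns.length + column + 1. Note that batches larger than MySQL's max_allowed_packet will fail, so the batch size should be capped.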
