Sonal, I'll try and put together something soon. I haven't located documentation on DataDrivenInputFormat yet to implement it. :)
Nick Jones -----Original Message----- From: Sonal Goyal [mailto:[email protected]] Sent: Monday, February 01, 2010 10:03 AM To: [email protected] Subject: Re: DBOutputFormat Speed Issues Hi Nick, If you dont mind, can you please share your performance benchmarks of using DataDrivenInputFormat/DBInputFormat and MySQL? Thanks and Regards, Sonal On Mon, Feb 1, 2010 at 3:33 AM, Aaron Kimball <[email protected]> wrote: > Nick, > > I'm afraid that right now the only available OutputFormat for JDBC is that > one. You'll note that DBOutputFormat doesn't really include much support > for > special-casing to MySQL or other targets. > > Your best bet is to probably copy the code from DBOutputFormat and > DBConfiguration into some other class (e.g. MySQLDBOutputFormat) and modify > the code in the RecordWriter to generate PreparedStatements containing > batched insert statements. > > If you arrive at a solution which is pretty general-purpose/robust, please > consider contributing it back to the Hadoop project :) If you do so, send > me > an email off-list; I'm happy to help with advice on developing better DB > integration code, reviewing your work, etc. > > Also on the input side, you should really be using DataDrivenDBInputFormat > instead of the older DBIF :) Sqoop (in src/contrib/sqoop on Apache 0.21 / > CDH 0.20) has pretty good support for parallel imports, and uses this > InputFormat instead. > > - Aaron > > On Thu, Jan 28, 2010 at 11:39 AM, Nick Jones <[email protected]> wrote: > > > Hi all, > > I have a use case for collecting several rows from MySQL of > > compressed/unstructured data (n rows), expanding the data set, and > storing > > the expanded results back into a MySQL DB (100,000n rows). DBInputFormat > > seems to perform reasonably well but DBOutputFormat is inserting rows > > one-by-one. How can I take advantage of MySQL's support of generating > fewer > > insert statements with more values within each one? > > > > Thanks. > > -- > > Nick Jones > > > > >
smime.p7s
Description: S/MIME cryptographic signature
