I should take a look at Vaidya for performance analysis, but yes, inserting with LOAD DATA LOCAL INFILE has got to be the fastest. My numbers show it is at least 9-10 times faster than the next mechanism. I am loading into an empty table without many indexes; otherwise, I might need to disable indexing before the load and re-enable it afterwards.
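For what it's worth, the disable/load/re-enable sequence I have in mind would look roughly like the sketch below. It is Python that only builds the SQL strings in execution order; the function name, table name and file path are made up for illustration, and the tab-separated format is an assumption. Note that DISABLE KEYS only affects non-unique indexes on MyISAM tables, so this may not help with other storage engines.

```python
def bulk_load_statements(table, local_path, disable_keys=True):
    """Return the SQL statements for a bulk load, in execution order.

    Hypothetical helper: `table` and `local_path` are pasted into the SQL
    without any escaping, so only pass trusted values.
    """
    # Assumes a tab-separated file with one row per line.
    load = (
        f"LOAD DATA LOCAL INFILE '{local_path}' "
        f"INTO TABLE {table} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'"
    )
    if not disable_keys:
        return [load]
    return [
        # Skip non-unique index maintenance during the load (MyISAM only).
        f"ALTER TABLE {table} DISABLE KEYS",
        load,
        # Rebuild the disabled indexes in one pass after the load.
        f"ALTER TABLE {table} ENABLE KEYS",
    ]


if __name__ == "__main__":
    for stmt in bulk_load_statements("results", "/tmp/part-00000.tsv"):
        print(stmt + ";")
```

Each statement would then be run in order over a single connection; for an empty table with few indexes, as in my case, the two ALTER TABLE steps can probably be skipped.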
I have yet to check out Hivo, and I have a couple of questions. To use MySQL's LOAD DATA functionality, I had to copy the results locally and then load them into MySQL sequentially. Does Hivo perform a parallel LOAD DATA LOCAL? Do the reducers perform this task upon close? That would mean multiple connections to the DB and could be faster.

Thanks!
---
Gautam

On Fri, Apr 23, 2010 at 12:29 AM, Eric Sammer <[email protected]> wrote:
> In general, you'll want to avoid tunneling permanent production code
> over ssh tunnels. They're flaky and do not recover from network
> interruption in any reasonable way. If you need to do this, a vpn is
> the correct approach. Linux easily will do ipsec p2p tunnels that are
> reasonably secure. If you really only have port 22 then I suppose
> that's your only option but I really would reevaluate the security
> policy.
>
> Either way, it's going to be slow due to the encryption overhead but
> if it's a small amount of data, that may be fine.
>
> On Fri, Apr 23, 2010 at 12:18 AM, Gautam Singaraju
> <[email protected]> wrote:
>> All,
>>
>> I have a use-case where I need to crunch a large amount of data and
>> push to the results (comparatively a smaller set) to a mysql db at a
>> remote location. As per security concerns, only SSH ports are open. I
>> tried using Java Secure Channel [1] in combination with some custom
>> JDBC code from the reducers.
>>
>> Can anyone comment on the performance of DBOutputFormat? Have there
>> been any efforts to tunnel this through SSH? This is going to be an
>> expensive operation; any suggestions would be welcome.
>>
>> [1] http://www.jcraft.com/jsch/
>> ---
>> Gautam Singaraju
>>
>
>
>
> --
> Eric Sammer
> phone: +1-917-287-2675
> twitter: esammer
> data: www.cloudera.com
>
