Andrey Hristov wrote:
 Hi Brian,
there could be problems as the client just pumps data to the server and reads a response after sending everything. Similar to STMT_SEND_LONG_DATA which is pretty faulty in design (doesn't send back ACK after receiving a chunk of data).


Funny, I just took this problem on in Nimbus. What I really wanted to do was compare load times for the dbt2 benchmark databases between Nimbus, Falcon, and InnoDB.

The MySQL dbt2 scripts use LOAD DATA INFILE. I wasn't going to put something that warty into Nimbus, even for an apples to apples comparison, so I needed an alternative. I settled on the JDBC addBatch/executeBatch mechanism. For prepared statements it allows any number of rows to be "added" on the client. The work happens on the executeBatch, which returns a vector of result codes indicating what actually happened plus a code to say whether the server stopped on an error or finished the batch. It's almost nothing to implement, layers on existing protocol nicely, and is better in all respects than LOAD DATA.
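For readers who have not used the JDBC batch calls, here is a minimal sketch of the pattern described above. It is not the Nimbus load app: the table `item`, its two columns, and the helper names `loadRows` and `summarize` are assumptions made up for illustration. Only the `java.sql` calls are the standard API. Rows are queued with `addBatch`, sent with one `executeBatch`, and the returned codes are tallied. On failure a driver may either report every row (marking failures with `Statement.EXECUTE_FAILED`) or stop at the first error and return a shorter array.

```java
import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class BatchLoadSketch {
    // Hypothetical table and columns, for illustration only.
    static final String INSERT_SQL = "INSERT INTO item (id, name) VALUES (?, ?)";

    /** Queues every row on the client, then sends them in one executeBatch call. */
    static int[] loadRows(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setInt(1, Integer.parseInt(row[0]));
                ps.setString(2, row[1]);
                ps.addBatch();              // nothing is executed yet
            }
            try {
                return ps.executeBatch();   // the work happens here
            } catch (BatchUpdateException e) {
                // Codes for whatever the driver processed before giving up,
                // or for all rows if the driver continued past the error.
                return e.getUpdateCounts();
            }
        }
    }

    /** Tallies result codes; rows missing from the array were never reported. */
    static String summarize(int[] codes, int rowsSent) {
        int ok = 0;
        int failed = 0;
        for (int code : codes) {
            if (code == Statement.EXECUTE_FAILED) {
                failed++;
            } else {
                ok++;                       // a row count or SUCCESS_NO_INFO
            }
        }
        int notReported = rowsSent - codes.length;
        return "ok=" + ok + " failed=" + failed + " notReported=" + notReported;
    }

    public static void main(String[] args) {
        // No database needed to see how the result codes are read.
        int[] continued = {1, Statement.SUCCESS_NO_INFO, Statement.EXECUTE_FAILED};
        int[] stoppedEarly = {1, 1};
        System.out.println(summarize(continued, 3));
        System.out.println(summarize(stoppedEarly, 5));
    }
}
```

A non-zero `notReported` count corresponds to the "server stopped on an error" case; zero means the batch ran to the end.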

And, by the way, LOAD DATA doesn't check for nulls on fields declared as non null. The dbt2 data is crap, of course, so my load app had to work around this. Should drizzle be bug for bug compatible with MySQL on this?

On the engine side, LOAD DATA sets the handle "batch load" flag that lets transactional storage engines cheat on transactions. Both InnoDB and Falcon do an implicit commit every 10,000 records, for example. (Nimbus, of course, is much, much too pure to stoop to hacks like this.)

The comparison wasn't apples to apples since LOAD DATA INFILE doesn't go through the same execution path as SQL statements. That said, Falcon, InnoDB, and Nimbus were within 5% of each other (Nimbus loading on a single SQL node). The actual order of finish:

  1. Falcon with file pre-extended (i.e. second run)
  2. Nimbus
  3. InnoDB (no difference between first and second run)

All systems were loading around 50,000 records per second, which, in my humble opinion, is pretty damn fast.

'Twere I doing drizzle, I'd drop the LOAD DATA INFILE as a lipstick-less pig, put in an analog to addBatch/executeBatch, and put in a) a client library to parse separator-delimited files (and maybe others, like the native black hole storage format), and b) a load utility using the library and extended API.
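As a rough idea of how small the parsing half of such a client library could be, here is a sketch under stated assumptions. The class and method names are hypothetical, `\N` is treated as the NULL marker (the convention LOAD DATA uses), and quoting and escape sequences are deliberately ignored. It also does the check that LOAD DATA skips: rejecting a NULL in a column declared non-null.

```java
import java.util.regex.Pattern;

class DelimitedLineSketch {
    // Assumption: NULL is written as backslash-N, as in LOAD DATA files.
    static final String NULL_MARKER = "\\N";

    /** Splits one line on a literal separator; NULL markers become Java null. */
    static String[] splitLine(String line, String separator) {
        // Pattern.quote makes the separator literal; -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        for (int i = 0; i < fields.length; i++) {
            if (NULL_MARKER.equals(fields[i])) {
                fields[i] = null;
            }
        }
        return fields;
    }

    /** Throws if a column flagged non-null holds a NULL or is missing. */
    static void requireNotNull(String[] fields, boolean[] notNull) {
        for (int i = 0; i < notNull.length; i++) {
            if (notNull[i] && (i >= fields.length || fields[i] == null)) {
                throw new IllegalArgumentException("NULL in non-null column " + i);
            }
        }
    }

    public static void main(String[] args) {
        String[] row = splitLine("42\twidget\t\\N", "\t");
        requireNotNull(row, new boolean[] {true, true, false});
        System.out.println(row.length + " fields, third is null: " + (row[2] == null));
    }
}
```

A load utility would feed each parsed row into a prepared statement's batch and decide, per failed check, whether to skip the row or abort the load.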

But then, I'm not doing drizzle, so...

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
