Andrey Hristov wrote:
> Hi Brian,
> there could be problems as the client just pumps data to the server
> and reads a response after sending everything. Similar to
> STMT_SEND_LONG_DATA which is pretty faulty in design (doesn't send
> back ACK after receiving a chunk of data).

Funny, I just took this problem on in Nimbus. What I really wanted to
do was compare load times for the dbt2 benchmark databases between
Nimbus, Falcon, and InnoDB.
The MySQL dbt2 scripts use LOAD DATA INFILE. I wasn't going to put
something that warty into Nimbus, even for an apples to apples
comparison, so I needed an alternative. I settled on the JDBC
addBatch/executeBatch mechanism. For prepared statements it allows any
number of rows to be "added" on the client. The work happens on the
executeBatch, which returns a vector of result codes indicating what
actually happened plus a code to say whether the server stopped on an
error or finished the batch. It's almost nothing to implement, layers
on existing protocol nicely, and is better in all respects than LOAD DATA.
And, by the way, LOAD DATA doesn't check for nulls on fields declared as
non null. The dbt2 data is crap, of course, so my load app had to work
around this. Should drizzle be bug for bug compatible with MySQL on this?
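
A rough sketch of the batch semantics described above, for illustration only. It is written in Python against an in-memory stand-in rather than JDBC or any real driver, so every name here (FakeServer, BatchStatement, the numeric result codes) is hypothetical. The stand-in server rejects nulls in non-null columns, and executeBatch hands back one code per row plus a flag saying whether the batch ran to the end.

```python
# Hypothetical in-memory stand-in; this is not JDBC and not a real driver API.
SUCCESS = 1        # row was applied
FAILED = -1        # row was rejected by the server
NOT_EXECUTED = 0   # server stopped before reaching this row


class FakeServer:
    """Pretends to be a table whose columns are all declared NOT NULL."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        # Reject any row carrying a null in a non-null column.
        if any(value is None for value in row):
            return False
        self.rows.append(row)
        return True


class BatchStatement:
    def __init__(self, server, stop_on_error=True):
        self.server = server
        self.stop_on_error = stop_on_error
        self.pending = []

    def add_batch(self, row):
        # Client side only: the row is queued, nothing is sent yet.
        self.pending.append(tuple(row))

    def execute_batch(self):
        """Apply all queued rows; return (per-row codes, finished flag)."""
        codes = [NOT_EXECUTED] * len(self.pending)
        finished = True
        for i, row in enumerate(self.pending):
            if self.server.insert(row):
                codes[i] = SUCCESS
            else:
                codes[i] = FAILED
                if self.stop_on_error:
                    finished = False
                    break
        self.pending = []
        return codes, finished


server = FakeServer()
stmt = BatchStatement(server)
for row in [(1, "a"), (2, None), (3, "c")]:
    stmt.add_batch(row)
codes, finished = stmt.execute_batch()
print(codes, finished)  # [1, -1, 0] False
```

The second row fails the null check, so the server stops there: the first row is in, the third was never attempted, and the caller can tell all of that from the returned codes.
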
On the engine side, LOAD DATA sets the handle "batch load" flag that
lets transactional storage engines cheat on transactions. Both InnoDB
and Falcon do an implicit commit every 10,000 records, for example.
(Nimbus, of course, is much, much too pure to stoop to hacks like this.)
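
To make the implicit-commit trick concrete, here is a minimal sketch of the idea at application level. It is not how InnoDB or Falcon implement it internally; the function name, the callbacks, and the final commit for a partial chunk are all assumptions made for illustration.

```python
def load_with_periodic_commit(records, insert, commit, interval=10_000):
    """Insert records, committing after every `interval` rows.

    `insert` and `commit` are caller-supplied callables (hypothetical here).
    A trailing partial chunk gets one last commit. Returns the commit count.
    """
    pending = 0
    commits = 0
    for record in records:
        insert(record)
        pending += 1
        if pending == interval:
            commit()
            commits += 1
            pending = 0
    if pending:
        commit()
        commits += 1
    return commits


stored = []
commit_points = []
commits = load_with_periodic_commit(
    range(25_000),
    stored.append,
    lambda: commit_points.append(len(stored)),
)
print(commits, commit_points)  # 3 [10000, 20000, 25000]
```

The cheat is visible in commit_points: if the load dies at row 15,000, the first 10,000 rows are already committed, so the load is no longer a single atomic transaction.
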
The comparison wasn't apples to apples since LOAD DATA INFILE doesn't go
through the same execution path as SQL statements. That said, Falcon,
InnoDB, and Nimbus were within 5% (Nimbus loading on a single SQL
node). The actual order of finish:
1. Falcon with file pre-extended (i.e. second run)
2. Nimbus
3. InnoDB (no difference between first and second run)
All systems were loading around 50,000 records per second, which, in my
humble opinion, is pretty damn fast.
'Twere I doing drizzle, I'd drop the LOAD DATA INFILE as a lipstick-less
pig, put in an analog to addBatch/executeBatch, put in a client library
to parse separator-delimited files (and maybe others, like the native
black hole storage format), and a load utility using the library and
extended API.
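
The client-side half of that proposal might look something like the sketch below: parse a separator-delimited stream and hand rows over in fixed-size batches. This is only an illustration in Python using the standard csv module. The function name, the batch size, and the use of \N as a null marker (borrowed from MySQL's text format) are assumptions, not anything Drizzle defines.

```python
import csv
import io


def read_batches(text_stream, delimiter="\t", batch_size=1000, null_token="\\N"):
    """Yield lists of row tuples parsed from a separator-delimited stream.

    Fields equal to `null_token` become None so the server, not the
    loader, decides whether a null is acceptable.
    """
    reader = csv.reader(text_stream, delimiter=delimiter)
    batch = []
    for fields in reader:
        batch.append(tuple(None if f == null_token else f for f in fields))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly short, batch.
        yield batch


sample = io.StringIO("1\talpha\n2\t\\N\n3\tgamma\n")
batches = list(read_batches(sample, batch_size=2))
print(batches)
```

Each yielded batch would then be passed to whatever batch call the extended API ends up exposing, which keeps file parsing entirely out of the server.
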
But then, I'm not doing drizzle, so...