Sedillo, Derek (Mission Systems) wrote:
Dyre,
The goal is to find the most efficient/optimized way to insert large
amounts of data into Derby. For example, working as an Oracle DBA I
have discovered that I can bulk load data from Pro*C using an array of C
structures in a single INSERT statement, like this:
INSERT INTO SSJ
VALUES (:tmp_ssj_data); // where tmp_ssj_data is an array (100s or 1000s) of structured records
This approach greatly enhances performance for the large data inserts
which we perform regularly. My original question is how I can do something
similar with Derby.
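The closest analogue in plain JDBC is a batched PreparedStatement
(addBatch()/executeBatch()). A minimal sketch, assuming a hypothetical
three-column SSJ table and an in-memory array of records (the record
class, column names and types are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SsjBatchInsert {
    /** Hypothetical record shape matching a three-column SSJ table. */
    public static class SsjRecord {
        final int id;
        final String name;
        final double value;
        public SsjRecord(int id, String name, double value) {
            this.id = id;
            this.name = name;
            this.value = value;
        }
    }

    /** Load all records in one JDBC batch and a single transaction. */
    public static void bulkInsert(Connection conn, SsjRecord[] ssjData) throws SQLException {
        conn.setAutoCommit(false);
        PreparedStatement ps =
            conn.prepareStatement("INSERT INTO SSJ VALUES (?, ?, ?)");
        try {
            for (SsjRecord rec : ssjData) {
                ps.setInt(1, rec.id);
                ps.setString(2, rec.name);
                ps.setDouble(3, rec.value);
                ps.addBatch();        // queue the row instead of executing it now
            }
            ps.executeBatch();        // execute all queued rows
            conn.commit();            // one commit for the whole load
        } finally {
            ps.close();
        }
    }
}

As the discussion and numbers below show, embedded Derby currently gains
little from addBatch() over a plain executeUpdate() loop; keeping the work
inside a single large transaction is what matters.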
While I realize there are 'basic' constructs for performing a task, they
are not normally optimal. For example, performing 1000 separate
transactions is less efficient than one; see Daniel's numbers below.
I assume that Daniel's non-batch numbers below are generated using
code similar to that at the end of this message, i.e. set autocommit to
false, loop doing inserts, and commit after some number of inserts. Derby
is different from other databases when used in embedded mode; some
optimizations that are necessary in client/server applications are not
necessary in Derby.
As you can see in the results, Derby currently does not optimize batch
processing in embedded mode, so if you are embedded I would use code
similar to what is included below. The biggest performance win is going to be
to insert as many rows as possible between each commit. Each commit
does a synchronous I/O and waits for the data to hit disk, so on modern
processors you will quickly become I/O-wait bound unless you make
commits big enough. "Big enough" depends on the data; for instance,
below it looks like the optimal number may be bigger than 100,000,
but the difference between 10k and 100k is not much (6,250 rows/sec for
10k vs. 6,924 rows/sec for 100k).
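In code, "commit after some number of inserts" looks roughly like this;
a minimal sketch against the three-column table T from the example at the
end of this message, with made-up column values and the rows-per-commit
interval left as a tuning knob:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ChunkedCommitInsert {
    // Tuning knob: rows per commit. The numbers in this thread suggest
    // something in the 10,000-100,000 range for this workload.
    private static final int COMMIT_INTERVAL = 10000;

    public static void insertRows(Connection conn, int totalRows) throws SQLException {
        conn.setAutoCommit(false);
        PreparedStatement ps =
            conn.prepareStatement("insert into T values (?, ?, ?)");
        try {
            for (int i = 0; i < totalRows; i++) {
                ps.setInt(1, i);                 // made-up column values
                ps.setString(2, "row-" + i);
                ps.setInt(3, i * 2);
                ps.executeUpdate();
                if ((i + 1) % COMMIT_INTERVAL == 0) {
                    conn.commit();               // each commit is a synchronous log write
                }
            }
            conn.commit();                       // pick up any remaining rows
        } finally {
            ps.close();
        }
    }
}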
Along these lines I did some testing for our own application.
Cached PreparedStatement but one transaction per insert:
100 tags added in 399ms
100 tags removed in 160ms
1000 tags added in 1163ms
1000 tags removed in 873ms
10000 tags added in 6094ms
10000 tags removed in 6840ms
100000 tags added in 58563ms
100000 tags removed in 67342ms
All in one transaction using executeUpdate():
100 tags added in 274ms
100 tags removed in 70ms
1000 tags added in 299ms
1000 tags removed in 250ms
10000 tags added in 1605ms
10000 tags removed in 1500ms
100000 tags added in 14441ms
100000 tags removed in 19721ms
All in one transaction using addBatch()/executeBatch():
100 tags added in 290ms
100 tags removed in 76ms
1000 tags added in 316ms
1000 tags removed in 258ms
10000 tags added in 1621ms
10000 tags removed in 1927ms
100000 tags added in 14971ms
100000 tags removed in 19320ms
- Derek
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, March 16, 2007 9:00 AM
To: Derby Discussion
Subject: Re: Large multi-record insert performance
I haven't followed this thread closely, but is there a reason why you
cannot do:
PreparedStatement ps =
    conn.prepareStatement("insert into T values (?, ?, ?)");
conn.setAutoCommit(false);
for (int i = 0; i < 1000; ++i) {
    ps.setX(1, <whatever>);
    ps.setX(2, <whatever>);
    ps.setX(3, <whatever>);
    ps.executeUpdate();
}
conn.commit();
this is the code I would start with for iterative inserts in embedded
systems, with the loop as big as makes sense for the application.
?
--
dt