Hello,

I have an embedded database application that typically involves inserting somewhere between 50k and 1000k rows of data into a database and then analyzing and querying that data afterwards. The data goes into about ten tables, but the bulk of it ends up in just a few of them. I run my database with "test" durability, and I add all of the primary key and foreign key constraints only after the data has been inserted. Currently all of the data is inserted using prepared statements executed in batches, which gives me between 10 and 20 row inserts per millisecond on average.
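
Here is roughly what my insert loop looks like (simplified to a single table; the table name, columns, and batch size below are placeholders rather than my real schema):

    // Simplified sketch of my current approach: batched PreparedStatement
    // inserts over an embedded Derby connection. MYTABLE and its columns
    // stand in for my real schema; the batch size is just illustrative.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class BatchInsertSketch {
        private static final int BATCH_SIZE = 1000;

        public static void insert(Connection conn, Iterable<Object[]> rows)
                throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO MYTABLE (ID, CREATED_AT, PAYLOAD) VALUES (?, ?, ?)")) {
                int count = 0;
                for (Object[] row : rows) {
                    ps.setLong(1, (Long) row[0]);
                    // java.sql.Timestamp carries the nanosecond precision
                    // that my data needs.
                    ps.setTimestamp(2, (Timestamp) row[1]);
                    ps.setString(3, (String) row[2]);
                    ps.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        ps.executeBatch();   // flush a full batch
                    }
                }
                ps.executeBatch();           // flush the final partial batch
                conn.commit();
            }
        }
    }
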
I have spent quite a bit of time optimizing the insert step, and while I was doing research I came across this discussion from last year:

http://www.mail-archive.com/[email protected]/msg10194.html

The discussion suggests using bulk import as a way to speed up this initial insert step. Unfortunately, I cannot use the built-in import procedures, as my data includes timestamps with nanosecond granularity, and as far as I can tell there is no way to convince Derby to parse a timestamp specified down to the nanosecond. In one of the emails, someone suggested that you can get bulk import performance by using a table function and then running a query like "INSERT INTO MYTABLE (...) SELECT S.* FROM TABLE (MYFUNC ()) S". In my tests, however, this doesn't perform the insert any faster than simply inserting the rows one at a time with a prepared statement (I have put a simplified sketch of what I tried at the end of this message). I think this may be because I don't have a way to set the 'insertMode=bulkImport' property, which the bulk import system procedure is allowed to set.

Does anyone know of a way to work around this, or of a better way to get my data into the database as quickly as possible? Thanks in advance for your time.

Cheers,
Nathan Boy
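
P.S. For reference, this is roughly the shape of the table function experiment described above (again simplified: MYFUNC, MYTABLE, the column list, and com.example.RowReader are placeholders for my real schema and code; RowReader.read() is my own public static method that returns a java.sql.ResultSet over the data I want to load):

    // Roughly what I tried, following the suggestion from the thread above:
    // declare a Derby-style table function and then INSERT ... SELECT from it.
    // In my tests this was no faster than the batched inserts shown earlier.
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TableFunctionSketch {
        public static void load(Connection conn) throws SQLException {
            try (Statement s = conn.createStatement()) {
                s.execute(
                    "CREATE FUNCTION MYFUNC () "
                    + "RETURNS TABLE (ID BIGINT, CREATED_AT TIMESTAMP, PAYLOAD VARCHAR(255)) "
                    + "LANGUAGE JAVA PARAMETER STYLE DERBY_JDBC_RESULT_SET "
                    + "READS SQL DATA "
                    + "EXTERNAL NAME 'com.example.RowReader.read'");

                s.execute(
                    "INSERT INTO MYTABLE (ID, CREATED_AT, PAYLOAD) "
                    + "SELECT S.* FROM TABLE (MYFUNC ()) S");
            }
        }
    }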
