Knut Anders Hatlen wrote:
Rick Hillegas <[email protected]> writes:
Knut Anders Hatlen wrote:
Rick Hillegas <[email protected]> writes:
Hi Brian,
In a previous response Peter Ondruška noted that you could use bulk
import to speed up your inserts if your source data lived in a
properly formatted file.
Even if your source data does not live in a file, you can still get
bulk import speed (and the benefits of your generated key) by using
table functions--provided that you upgrade to 10.4. If you wrap your
source data in a table function, then you can bulk insert your data as
follows:
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
select * from table( MY_TABLE_FUNCTION() ) s
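For context, a 10.4-style declaration of such a table function might look roughly like this (MY_TABLE_FUNCTION, the column list, and the external class name are placeholders matching the example above, not anything shipped with Derby):

```sql
-- Sketch of a Derby 10.4 table function declaration. The Java method
-- named in EXTERNAL NAME returns a java.sql.ResultSet whose columns
-- match the RETURNS TABLE shape.
create function MY_TABLE_FUNCTION()
returns table
  (
    NON_GEN_COL1 varchar( 32 ),
    NON_GEN_COL_N int
  )
language java
parameter style DERBY_JDBC_RESULT_SET
reads sql data
external name 'com.example.MySource.read'
```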
The internal SQL syntax allows you to add an optimizer override here
(--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
even more, though I'm not sure exactly what it buys you. This override
is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
system procedure, but you're not allowed to use it in your own SQL
queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
exactly what Rick's example does, except that it also adds the optimizer
override.
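For the record, the statement the import code generates looks roughly like this (reconstructed from a reading of the import code, so treat it as a sketch; the property comment runs to the end of its line, so the SELECT has to start on the next one):

```sql
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N ) --DERBY-PROPERTIES insertMode=bulkInsert
select * from table( MY_TABLE_FUNCTION() ) s
```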
The procedure only works on old-style table functions, though. Does
anyone know what performance gains one can expect by using
SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
worth the effort to create a similar mechanism for the new-style table
functions?
Hi Knut,
It appears to me that this setting (insertMode=bulkInsert) is
inspected when binding an INSERT statement. If the bind() logic finds
this setting, then it escalates the lock mode for the target table to
a table-level lock. Other than that, I cannot find any other use of
this setting. Comments around this code, however, indicate that there
might once have been other implications, and that this setting, which
is now unconditionally accepted, used to be silently rejected in
certain situations: insertion into a synchronized client database,
insertion into a table which has triggers, and deferred-mode
insertion (see the comments on InsertNode.verifyBulkInsert).
It appears to me that the benefit now conferred by this setting can be
achieved by setting the lock mode on the table to be table level. So
you would preface the INSERT statement above with the following
statement:
lock table MY_TABLE in exclusive mode
That, at least, is what I have gleaned from a code inspection.
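Putting the two together in an ij session would look something like this (note that Derby releases the table lock at the end of the transaction, so autocommit has to be off for the lock to still be held when the insert runs):

```sql
-- ij session sketch: hold the exclusive table lock across the insert
-- by running both statements in the same transaction.
autocommit off;
lock table MY_TABLE in exclusive mode;
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
  select * from table( MY_TABLE_FUNCTION() ) s;
commit;
```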
Thanks Rick,
Did you also look at the code in InsertResultSet.open()? It looks like
it has some special handling of triggers, foreign keys, and constraint
checking when running in bulk insert mode.
Hi Knut,
Thanks for finding that. It looks like InsertResultSet optimizes the
performance of bulk inserts as follows:
1) Claims to turn off row logging. That is, performs the insert via
TransactionController.createAndLoadConglomerate() and
recreateAndLoadConglomerate(). Perhaps you could take a look at those
methods and verify what they really do.
2) Bulk-enforces foreign keys defined on the table. This is an optimized
zippering together of the indexes which support a foreign/primary key
pair. As I look at this code, however, it seems to me that you may
actually get worse performance if you are bulk inserting (not replacing)
a small number of rows into a table which already has a lot of rows in it.
There is some additional trickiness in the interaction between
BEFORE triggers and CHECK constraints, but (famous last words) it appears
to me that the processing is still row-by-row in the language layer, so I
don't see any optimization involved here.
I guess the bottom line is that with bulk insert, you get better
performance because logging is disabled. The bulk enforcement of foreign
keys may improve or degrade your performance.
So those are a few more clues.
Cheers,
-Rick