Knut Anders Hatlen wrote:
Rick Hillegas <[email protected]> writes:
Knut Anders Hatlen wrote:
Rick Hillegas <[email protected]> writes:
Hi Brian,
In a previous response Peter Ondruška noted that you could use bulk
import to speed up your inserts if your source data lived in a
properly formatted file.
Even if your source data does not live in a file, you can still get
bulk import speed (and the benefits of your generated key) by using
table functions--provided that you upgrade to 10.4. If you wrap your
source data in a table function, then you can bulk insert your data as
follows:
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
select * from table( MY_TABLE_FUNCTION() ) s
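For context, a 10.4-style declaration of such a table function might look roughly like this (MY_TABLE_FUNCTION, the column list, and the external class name are placeholders matching the example above, not anything shipped with Derby):

```sql
-- Sketch of a Derby 10.4 table function declaration. The Java method
-- named in EXTERNAL NAME returns a java.sql.ResultSet whose columns
-- match the RETURNS TABLE shape.
create function MY_TABLE_FUNCTION()
returns table
  (
    NON_GEN_COL1 varchar( 32 ),
    NON_GEN_COL_N int
  )
language java
parameter style DERBY_JDBC_RESULT_SET
reads sql data
external name 'com.example.MySource.read'
```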
The internal SQL syntax allows you to add an optimizer override here
(--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
even more, though I'm not sure exactly what it buys you. This override
is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
system procedure, but you're not allowed to use it in your own SQL
queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
exactly what Rick's example does, except that it also adds the optimizer
override.
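For the record, the statement the import code generates looks roughly like this (reconstructed from a reading of the import code, so treat it as a sketch; the property comment runs to the end of its line, so the SELECT has to start on the next one):

```sql
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N ) --DERBY-PROPERTIES insertMode=bulkInsert
select * from table( MY_TABLE_FUNCTION() ) s
```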
The procedure only works on old-style table functions, though. Does
anyone know what performance gains one can expect by using
SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
worth the effort to create a similar mechanism for the new-style table
functions?
Hi Knut,
It appears to me that this setting (insertMode=bulkInsert) is
inspected when binding an INSERT statement. If the bind() logic finds
this setting, then it escalates the lock mode for the target table to
a table-level lock. Other than that, I cannot find any other use of
this setting. Comments around this code, however, indicate that there
might once have been other implications, and that this setting, which
is now unconditionally accepted, used to be silently rejected in
certain situations: insertion into a synchronized client database,
insertion into a table which has triggers, and deferred-mode
insertion (see the comments on InsertNode.verifyBulkInsert).
It appears to me that the benefit now conferred by this setting can be
achieved by setting the lock mode on the table to be table level. So
you would preface the INSERT statement above with the following
statement:
lock table MY_TABLE in exclusive mode
That, at least, is what I have gleaned from a code inspection.
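Putting the two together in an ij session would look something like this (note that Derby releases the table lock at the end of the transaction, so autocommit has to be off for the lock to still be held when the insert runs):

```sql
-- ij session sketch: hold the exclusive table lock across the insert
-- by running both statements in the same transaction.
autocommit off;
lock table MY_TABLE in exclusive mode;
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
  select * from table( MY_TABLE_FUNCTION() ) s;
commit;
```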
Thanks Rick,
Did you also look at the code in InsertResultSet.open()? It looks like
it has some special handling of triggers, foreign keys, and constraint
checking when running in bulk insert mode.
Hi Knut,
Thanks for finding that. It looks like InsertResultSet optimizes the
performance of bulk inserts as follows:
1) Claims to turn off row logging. That is, performs the insert via
TransactionController.createAndLoadConglomerate() and
recreateAndLoadConglomerate(). Perhaps you could take a look at those
methods and verify what they really do.
2) Bulk-enforces foreign keys defined on the table. This is an optimized
zippering together of the indexes which support a foreign/primary key
pair. As I look at this code, however, it seems to me that you may
actually get worse performance if you are bulk inserting (not replacing)
a small number of rows into a table which already has a lot of rows in it.
There is some additional trickiness in the interaction between
BEFORE triggers and CHECK constraints, but (famous last words) it appears
to me that the processing is still row-by-row in the language layer, so I
don't see any optimization involved here.
I guess the bottom line is that with bulk insert, you get better
performance because logging is disabled. The bulk enforcement of foreign
keys may improve or degrade your performance.
So those are a few more clues.
Cheers,
-Rick