Hi all,

Let me announce a new feature that I have implemented and would like to commit.
But first a question for Keith: when is the next release of Castor planned?
It would be better if I committed after the new release has been tagged in CVS.
Okay, now about the new feature.
I provide an implementation of the Connection interface that wraps the real
Connection instance. Most of the methods just delegate to the same method of
the real Connection. The prepareStatement() method checks the SQL string: if it
starts with UPDATE or DELETE, or it starts with INSERT and the
"useBatchInserts" attribute is set to true, then it returns a special
implementation of PreparedStatement, which remembers all method calls and their
parameters but doesn't touch the database. The real PreparedStatement for each
SQL string is created on commit(); all remembered methods are replayed on it,
but instead of executeUpdate() we call addBatch(), then executeBatch(). Thus
all PreparedStatements with the same SQL string are automatically gathered into
one batch. This reduces the number of database server hits, which means fewer
network round trips and better performance, at least if the database server
runs on another machine.
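To make the gathering idea concrete, here is a minimal sketch (the class and method names are illustrative, not the actual Castor classes): deferred statements with the same SQL string end up as parameter rows of a single batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the batching wrapper: instead of hitting the
// database, executeUpdate() only records the current parameter row;
// on commit() each distinct SQL string becomes one batch.
class BatchRecorder {
    // SQL string -> list of recorded parameter rows
    private final Map<String, List<Object[]>> batches = new LinkedHashMap<>();

    // stands in for PreparedStatement.setXxx() calls + executeUpdate()
    void recordUpdate(String sql, Object... params) {
        batches.computeIfAbsent(sql, k -> new ArrayList<>()).add(params);
    }

    // on commit(): one real PreparedStatement per SQL string,
    // addBatch() per recorded parameter row, then one executeBatch()
    Map<String, List<Object[]>> flush() {
        Map<String, List<Object[]>> result = new LinkedHashMap<>(batches);
        batches.clear();
        return result;
    }
}
```

Two statements prepared with the same SQL string collapse into one entry with two parameter rows, i.e. one round trip instead of two.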
However, there are some tricks:

1) INSERTs of independent objects should be executed immediately, since later
in the same transaction the user may execute an OQLQuery that should find the
created objects in the database. So I set useBatchInserts to "false" for such
objects, and to "true" for dependent objects.

2) Some key generators (MAX, IDENTITY, SEQUENCE for Oracle and SAP DB) cannot
work with batches. I added the method isBatchInsertAllowed() to the
KeyGenerator interface. If you use one of these key generators, only UPDATEs
and DELETEs will be gathered into batches. Here is the list of key generators
that allow batch INSERTs: HIGH_LOW, UUID, SEQUENCE for PostgreSQL, DB2,
InterBase.
The MAX key generator algorithm can't be used with batch INSERTs for the
following reason: usually it works like this
  MaxKeyGenerator.generateKey() SELECT MAX(pk) -> 1
  INSERT ... 1
  MaxKeyGenerator.generateKey() SELECT MAX(pk) -> 2
  INSERT ... 2
With batching it would generate the same value every time:
  MaxKeyGenerator.generateKey() SELECT MAX(pk) -> 1
  MaxKeyGenerator.generateKey() SELECT MAX(pk) -> 1
  INSERT ... batch {1, 1} -> DuplicateIdentityException
IDENTITY and SEQUENCE for Oracle and SAP DB work in either AFTER_INSERT or
DURING_INSERT style, thus they require that the INSERT is executed immediately.
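In code the capability check could look roughly like this. The method name isBatchInsertAllowed() is the one from the patch; the concrete generator classes here are illustrative stand-ins:

```java
// The KeyGenerator interface gains a capability query; the SQL engine
// consults it before deciding to defer an INSERT into a batch.
interface KeyGenerator {
    boolean isBatchInsertAllowed();
}

// HIGH_LOW hands out keys from a pre-allocated range before the
// INSERT runs, so deferring the INSERT is safe.
class HighLowKeyGenerator implements KeyGenerator {
    public boolean isBatchInsertAllowed() { return true; }
}

// MAX re-reads SELECT MAX(pk) for every key; with deferred INSERTs
// it would hand out the same key twice, as in the example above.
class MaxKeyGenerator implements KeyGenerator {
    public boolean isBatchInsertAllowed() { return false; }
}
```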

3) ObjectModifiedException is no longer thrown during
TransactionContext.prepare() as it was before, since UPDATEs are actually
executed during commit(). So I added some code to TransactionContext.commit():
it rethrows the ObjectModifiedException without wrapping it in
TransactionAbortedException, and removes all modified objects from the cache,
because at least one of them is "dirty" and I don't know which one. However,
there are no changes at the level of the Database interface:
ObjectModifiedException is still thrown during commit().
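A sketch of that commit() logic (illustrative classes, not the actual TransactionContext code): the deferred batches run inside commit(), so ObjectModifiedException can surface there; it is rethrown as-is, and the cache of modified objects is cleared because we can't tell which one is stale.

```java
// minimal stand-ins for the real Castor exception classes
class ObjectModifiedException extends Exception {}

class TransactionAbortedException extends Exception {
    TransactionAbortedException(Throwable cause) { super(cause); }
}

// stands in for the deferred executeBatch() work done at commit time
interface BatchWork {
    void run() throws Exception;
}

class TxContext {
    // stands in for the cache of modified objects
    final java.util.List<Object> modifiedCache = new java.util.ArrayList<>();

    void commit(BatchWork work)
            throws ObjectModifiedException, TransactionAbortedException {
        try {
            work.run();
        } catch (ObjectModifiedException e) {
            // at least one cached object is stale and we don't know
            // which, so drop them all and rethrow without wrapping
            modifiedCache.clear();
            throw e;
        } catch (Exception e) {
            // everything else still aborts the transaction as before
            throw new TransactionAbortedException(e);
        }
    }
}
```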

4) The order of execution and constraints. Assume that we have three tables: 
Master, Detail1 and Detail2. The usual order of INSERTs is:
INSERT INTO Master (id) VALUES (?) / params=[1]
INSERT INTO Detail1 (id, master_id) VALUES (?,?) /params=[1,1]
INSERT INTO Master (id) VALUES (?) / params=[2]
INSERT INTO Detail2 (id, master_id) VALUES (?,?) /params=[2,2]
For INSERTs and UPDATEs we take the order of first appearance of SQL string:
INSERT INTO Master (id) VALUES (?) / batch params={[1], [2]}
INSERT INTO Detail1 (id, master_id) VALUES (?,?) / batch params={[1,1]}
INSERT INTO Detail2 (id, master_id) VALUES (?,?) / batch params={[2,2]}
The usual order of DELETEs is
DELETE FROM Detail1 WHERE id=? / params=[1]
DELETE FROM Master WHERE id=? / params=[1]
DELETE FROM Detail2 WHERE id=? / params=[2]
DELETE FROM Master WHERE id=? / params=[2]
For DELETEs we take the order of last appearance of SQL string:
DELETE FROM Detail1 WHERE id=? / batch params={[1]}
DELETE FROM Detail2 WHERE id=? / batch params={[2]}
DELETE FROM Master WHERE id=? / batch params={[1], [2]}
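Both orderings fall out of one trick: keep the batches in a LinkedHashMap keyed by SQL string; for INSERTs/UPDATEs just append rows, and for DELETEs remove and re-add the key on every occurrence, so that iteration order becomes the order of last appearance. A sketch with illustrative names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One instance per statement kind (one for INSERTs/UPDATEs, one for DELETEs)
class OrderedBatches {
    private final Map<String, List<Object[]>> batches = new LinkedHashMap<>();

    // INSERT/UPDATE: order of first appearance of the SQL string
    void addInsertOrUpdate(String sql, Object... params) {
        batches.computeIfAbsent(sql, k -> new ArrayList<>()).add(params);
    }

    // DELETE: order of last appearance -- remove and re-append the key,
    // so this SQL's batch ends up executed last
    void addDelete(String sql, Object... params) {
        List<Object[]> rows = batches.remove(sql);
        if (rows == null) rows = new ArrayList<>();
        rows.add(params);
        batches.put(sql, rows);
    }

    // the order in which the batches would be executed
    List<String> executionOrder() {
        return new ArrayList<>(batches.keySet());
    }
}
```

With the four DELETEs from the example above, the Master batch migrates to the end, after both detail batches, which is exactly what the foreign-key constraints require.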

Of course, this stuff works only if the JDBC driver supports batch operations.
Also, I have added the org.exolab.castor.jdo.batch parameter to
castor.properties (true by default), which allows switching batch updates off.
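So to disable the feature globally, one would put this line into castor.properties:

```properties
# Batch updates are on by default; set to false to force every
# UPDATE/DELETE/INSERT to be executed immediately.
org.exolab.castor.jdo.batch=false
```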

Regards,
  Oleg
