Caveat lector: This is a lengthy email but I have put some thought into it so it could be rewarding.

I've been lately thinking about constraint enforcement.
Currently there is a logical problem concerning the CHECK constraint.

At the moment constraints belong to either a column or a table.
With UNIQUE or FOREIGN KEY this is not a problem because they naturally operate on a single table (well foreign key does have it's repercussions on the referred table, too).

But since CHECK constraint supports arbitrary expressions (which is a good thing) it doesn't just belong to a single column or table but more generically to the database in general (like table or view).

Thus the guarding of the constraint is only partial and leads inevitably to integrity violations.

The correct way to enforce the check constraint would be to associate it to all tables to which it refers. And everytime a table that belongs to a constraint is modified the constraint would be checked.

This got me also thinking that currently checking check constraint both at column and table level is duplication

For example the following two check constraints are exactly the same.

1) Column level:
CREATE TABLE T (COL INT CHECK COL > (SELECT SUM(COL2) FROM T2) )
2) Table level
CREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2) <ANY (SELECT COL FROM T))

And both of them fail to enforce the constraint when table T2 is modified.
From the second formulation we can see that the constraint depends on.
T.COL and T2.COL2 so whenever either one is modified the constraint should be checked.

I suppose these dependencies could be retrieved something like

HashSet<DbObject> deps = HashSet<DbObject>();
expression.isEverything(ExpressionVisitor.getDependenciesVisitor(deps));

But here we confront another big issue.

Constraints are currently checked before the modification is actually done.

For example with INSERT the column constraints are checked at
line 120: table.validateConvertUpdateSequence(session, newRow);
This call takes also care of of computed columns in the new row.
Then table constraints are checked right after it at
line 121: boolean done = table.fireBeforeRow(session, null, newRow);

In this sequence the constraint would not check the state of affairs after the modification but before. And therefore it would not catch a constraint violation with the new row but without it (which is pointless of course).

So I would suggest that the CHECK constraint checks should be moved after the modification.

Lines 124-126 in Insert are
table.addRow(session, newRow);
session.log(table, UndoLogRecord.INSERT, newRow);
table.fireAfterRow(session, null, newRow, false);

This can be done simply by changing isBefore() method in CheckConstraint to return false.

Which raises the question:
Why are other constraints before but referential constraint after?
Is it because of the same reason that I have been talking about?

And finally we come to the question of performance.

It is clear that when the constraint is defined at the column level (example #1) then we only need to run select
SELECT SUM(COL2) FROM T2
once for the inserted row.
But if the constraint is formulated like in example 2
CREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2) <ANY (SELECT COL FROM T)) then in order to run the constraint check as efficiently we would need to optimize the CHECK statement to (SELECT SUM(COL2) FROM T2) <ANY (SELECT COL FROM T WHERE COL=<THE_NEW_VALUE>)
But if the insert is done to table T2 then all rows in T must be checked.
There is no way around it.
The optimization in my particular example could be done by the user by formulating the constraint like this CREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2) < (SELECT MAX(COL) FROM T)) But there is no generic way to optimize all constraints in all cases and sometimes enforcing a constraint just happens to be costly.

Anyway the first optimization could be done in this case but if the constraint were CREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2) < (SELECT SUM(COL) FROM T))
there would not be any room for that kind of optimization.

One way to keep this optimization when T is updated would be to keep the constraint associated with the column but create another generic version of the constraint that is associated with other tables that are mentioned in the constraint. But this would complicate the code in my opinion and make it hard to understand.

A more generic way would be to try to optimize the query having the changed row at hand. If the row belongs to a subquery where there are NO AGGREGATIONS then the optimization could be inserted into the query. Like I just showed: (SELECT SUM(COL2) FROM T2) <ANY (SELECT COL FROM T WHERE COL=<THE_NEW_VALUE>)
The added portion being WHERE COL=<THE_NEW_VALUE>

But here we would need to pick a column from table T that is used.
If the table has a primary key (or any key) then the first key column(s) could be chosen. But if it does not have any key columns then I think this optimization becomes impossible. In that case the constraint query can still be executed. It would just run a bit slower.

Many details are still unclear to me but I wanted to share my ideas so that those interested can give me advice, help and opinions.

If this idea works well then my next development idea would be to include check constraints when turning off referential constraints in the database. The reason being that it is very difficult to import data to a database that has these kinds of complex constraints.

And finally I would try to improve the functionality of turning the constraints back on. There it would be a wonderful feature that when turning them on the DBMS would go over all the data and check that those constraints are respected. If not then it would return a result set that details all the constraint violations.

The user could fix these violations and then try turning on the constraints back on again.
This would make users more secure about the consistency of their data.

- Rami

--
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.

Reply via email to