Check Constraint

Rami Ojares Fri, 26 Oct 2012 03:53:20 -0700

Caveat lector: This is a lengthy email but I have put some thought intoit so it could be rewarding.


I've been lately thinking about constraint enforcement.
Currently there is a logical problem concerning the CHECK constraint.


At the moment constraints belong to either a column or a table.

With UNIQUE or FOREIGN KEY this is not a problem because they naturallyoperate on a single table (well foreign key does have it's repercussionson the referred table, too).

But since CHECK constraint supports arbitrary expressions (which is agood thing)it doesn't just belong to a single column or table but more genericallyto the database in general (like table or view).

Thus the guarding of the constraint is only partial and leads inevitablyto integrity violations.

The correct way to enforce the check constraint would be to associate itto all tables to which it refers.And everytime a table that belongs to a constraint is modified theconstraint would be checked.

This got me also thinking that currently checking check constraint bothat column and table level is duplication


For example the following two check constraints are exactly the same.

1) Column level:
CREATE TABLE T (COL INT CHECK COL > (SELECT SUM(COL2) FROM T2) )
2) Table level

CREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2)<ANY (SELECT COL FROM T))


And both of them fail to enforce the constraint when table T2 is modified.
From the second formulation we can see that the constraint depends on.

T.COL and T2.COL2 so whenever either one is modified the constraintshould be checked.


I suppose these dependencies could be retrieved something like

HashSet<DbObject> deps = HashSet<DbObject>();
expression.isEverything(ExpressionVisitor.getDependenciesVisitor(deps));

But here we confront another big issue.

Constraints are currently checked before the modification is actually done.

For example with INSERT the column constraints are checked at
line 120: table.validateConvertUpdateSequence(session, newRow);
This call takes also care of of computed columns in the new row.
Then table constraints are checked right after it at
line 121: boolean done = table.fireBeforeRow(session, null, newRow);

In this sequence the constraint would not check the state of affairsafter the modification but before.And therefore it would not catch a constraint violation with the new rowbut without it (which is pointless of course).

So I would suggest that the CHECK constraint checks should be movedafter the modification.


Lines 124-126 in Insert are
table.addRow(session, newRow);
session.log(table, UndoLogRecord.INSERT, newRow);
table.fireAfterRow(session, null, newRow, false);

This can be done simply by changing isBefore() method in CheckConstraintto return false.


Which raises the question:
Why are other constraints before but referential constraint after?
Is it because of the same reason that I have been talking about?

And finally we come to the question of performance.

It is clear that when the constraint is defined at the column level(example #1) then we only need to run select

SELECT SUM(COL2) FROM T2
once for the inserted row.
But if the constraint is formulated like in example 2

CREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2)<ANY (SELECT COL FROM T))then in order to run the constraint check as efficiently we would needto optimize the CHECK statement to(SELECT SUM(COL2) FROM T2) <ANY (SELECT COL FROM T WHERECOL=<THE_NEW_VALUE>)

But if the insert is done to table T2 then all rows in T must be checked.
There is no way around it.

The optimization in my particular example could be done by the user byformulating the constraint like thisCREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2) <(SELECT MAX(COL) FROM T))But there is no generic way to optimize all constraints in all cases andsometimes enforcing a constraint just happens to be costly.

Anyway the first optimization could be done in this case but if theconstraint wereCREATE TABLE T (COL INT, CONSTRAINT C CHECK (SELECT SUM(COL2) FROM T2) <(SELECT SUM(COL) FROM T))

there would not be any room for that kind of optimization.

One way to keep this optimization when T is updated would be to keep theconstraint associated with the column but create another generic versionof the constraint that is associated with other tables that arementioned in the constraint. But this would complicate the code in myopinion and make it hard to understand.

A more generic way would be to try to optimize the query having thechanged row at hand. If the row belongs to a subquery where there are NOAGGREGATIONS then the optimization could be inserted into the query.Like I just showed:(SELECT SUM(COL2) FROM T2) <ANY (SELECT COL FROM T WHERECOL=<THE_NEW_VALUE>)

The added portion being WHERE COL=<THE_NEW_VALUE>

But here we would need to pick a column from table T that is used.

If the table has a primary key (or any key) then the first key column(s)could be chosen.But if it does not have any key columns then I think this optimizationbecomes impossible.In that case the constraint query can still be executed. It would justrun a bit slower.

Many details are still unclear to me but I wanted to share my ideas sothat those interested can give me advice, help and opinions.

If this idea works well then my next development idea would be toinclude check constraints when turning off referential constraints inthe database. The reason being that it is very difficult to import datato a database that has these kinds of complex constraints.

And finally I would try to improve the functionality of turning theconstraints back on.There it would be a wonderful feature that when turning them on the DBMSwould go over all the data and check that those constraints arerespected. If not then it would return a result set that details all theconstraint violations.

The user could fix these violations and then try turning on theconstraints back on again.

This would make users more secure about the consistency of their data.

- Rami

--
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.

Check Constraint

Reply via email to