[sqlite] updating SQLite to implement The Third Manifesto

Darren Duncan Fri, 10 Mar 2006 16:37:47 -0800

All, and particularly Dr. Hipp,

Lately, my own database project in Perl, named Rosetta, has evolved to officially be an implementation of Chris Date's and Hugh Darwen's proposal for relational databases called "The Third Manifesto", which is talked about at http://www.thethirdmanifesto.com/ and in various books and papers of theirs.

Rosetta has its own API which defines a "D" relational database language, as opposed to a SQL language, and Rosetta's back-ends to implement this are interchangeable. I am making a pure Perl reference implementation (called Rosetta::Engine::Example) which is coded simply for ease of understanding but that is slow.

Separately, I or third parties would be making alternate back-ends that either are self-implemented and better performing, or that constitute wrappers over existing database products, usually SQL based, since those are fairly mature and plentiful. SQLite is one of the first such back-ends to be used.

Now, I would like to propose some significant feature changes to SQLite (as a fork if necessary) such that it directly implements, and gains the efficiencies of, The Third Manifesto. If necessary, I will contribute significant targeted funding (when I have the money) to pay Dr. Hipp and/or other developers for this work. The changes include both the addition and the removal of current features, and certain behaviours would change. Hopefully all for the better.

As a result of these changes, not only would SQLite better serve as a back-end of Rosetta, but non-Rosetta users of SQLite would get the most critical of the same benefits from it directly.

I anticipate that the changes would mainly affect the upper layers, which convert user commands into virtual machine code, but that the virtual machine and b-tree and OS layers would remain more or less unchanged (this depends, of course, on a few details). Possibly, we would add a new command language.

I am hoping that, to keep longer term maintenance easier, these changes can be implemented in the trunk and activated using either run time pragmas or compile time options or both. But if they would require a fork, then the forked product would have to be named something else that doesn't have 'SQL' in its name, since SQL does not satisfy The Third Manifesto. Maybe 'TTMLite' or something that sounds better.

Here are some of the changes that I propose the pragma or compile time option or fork would have; they all refer to what the user sees, not to implementation details that should be hidden:

1. Add a distinct logical BOOLEAN data type. It is the data type of output from logical expressions like comparisons, and the input to 'and', 'or', etc.


2.  Have strong and strict data typing for both variables and values.

2.1 Table columns are always declared to be of a specific type (eg: BOOLEAN, INTEGER, REAL, TEXT, BLOB) and nothing but values of the same type can be stored in them; attempting to do otherwise would fail with an exception.


2.2  The plain equality test is supported for all data types.

2.3 All operators/functions have strongly typed parameters and return values, and invoking them with arguments that aren't of the right type will fail with an exception. The equality test likewise can only compare operands of the same type.

2.4 There is no implicit type conversion; data types must be explicitly converted from one type to another.

2.5 INTEGER and REAL data types have separate sets of operators, which do the expected thing with their types. For example, each has a separate division operator whose input and output are all of that same type. No worrying about when to round or not.

2.6 SQLite may already be this way, but: All characters in a string are significant, including whitespace, so 'a' and 'a ' are always unequal.
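To make items 1 and 2 concrete, here is a small probe of how SQLite behaves on ordinary tables without these changes. It is a sketch using Python's standard `sqlite3` module, so the results reflect whatever SQLite library that module is linked against; the table name `t` is just for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Item 1: a comparison yields a plain integer, not a distinct BOOLEAN type.
bool_type = con.execute("SELECT typeof(1 = 1)").fetchone()[0]

# Item 2.1: a column declared INTEGER still accepts a text value. With SQLite's
# type affinity, 'hello' cannot be coerced to a number, so it is stored as TEXT.
con.execute("CREATE TABLE t (n INTEGER)")
con.execute("INSERT INTO t VALUES ('hello')")
stored_type = con.execute("SELECT typeof(n) FROM t").fetchone()[0]

# Item 2.5: a single '/' operator whose result type depends on its operands.
int_div, real_div = con.execute("SELECT 7 / 2, 7 / 2.0").fetchone()

# Item 2.6: trailing whitespace is significant under the default BINARY collation.
padded_equal = con.execute("SELECT 'a' = 'a '").fetchone()[0]

print(bool_type, stored_type, int_div, real_div, padded_equal)
# integer text 3 3.5 0
```

So item 2.6 already holds, while items 1, 2.1 and 2.5 describe real departures from current behaviour.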


3.  There is no such thing as a NULL.

3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).

3.2 Every variable of a particular type always contains a value that is valid for that type, so logic for dealing with it is simpler. The same holds for every literal value.


3.3  The code to implement operators is a lot simpler.

3.4 Missing data can be represented either with the data type's empty value, or by splitting a table column whose value may be unknown into a separate related table that only has records when the value is known.

3.5 All variables default to a reasonable valid value for their type if not explicitly set, such as the number zero or the empty string.
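The following sketch, again using Python's `sqlite3` module, contrasts SQL's 3VL handling of a nullable column with the split-table approach of item 3.4. The table and column names (`person_nullable`, `person`, `person_phone`, `phone`) are hypothetical examples, not part of any existing schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Current SQL behavior: NULL makes a comparison "unknown", returned as None.
null_eq = con.execute("SELECT NULL = NULL").fetchone()[0]

# A nullable column: person 2 has an unknown phone number.
con.execute("CREATE TABLE person_nullable (id INTEGER, phone TEXT)")
con.executemany("INSERT INTO person_nullable VALUES (?, ?)",
                [(1, "555-0100"), (2, None)])
# Under 3VL, "phone = X OR phone <> X" is not true for the NULL row,
# so a query that looks like a tautology silently misses a row.
tautology_count = con.execute(
    "SELECT count(*) FROM person_nullable "
    "WHERE phone = '555-0100' OR phone <> '555-0100'").fetchone()[0]

# Item 3.4: move the possibly-unknown column into its own table, which only
# has a row when the value is known, so no NULLs are stored anywhere.
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE person_phone ("
            "id INTEGER PRIMARY KEY REFERENCES person(id), phone TEXT NOT NULL)")
con.executemany("INSERT INTO person VALUES (?)", [(1,), (2,)])
con.execute("INSERT INTO person_phone VALUES (1, '555-0100')")

# Known phones come from a join; "unknown" is an explicit set difference.
known = con.execute(
    "SELECT id, phone FROM person NATURAL JOIN person_phone").fetchall()
unknown = con.execute(
    "SELECT id FROM person EXCEPT SELECT id FROM person_phone").fetchall()

print(null_eq, tautology_count, known, unknown)
# None 1 [(1, '555-0100')] [(2,)]
```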

4. There is no significant hidden data. A row id can only be an explicitly declared table column. The implementation of a table can use hidden row ids, but the user wouldn't see them.
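As an illustration of the hidden data that item 4 would rule out, this sketch (Python `sqlite3`; the `fruit` table is a made-up example) shows that SQLite exposes a `rowid` even though no such column was declared and `PRAGMA table_info` does not list one.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# No row id column is declared...
con.execute("CREATE TABLE fruit (name TEXT)")
con.executemany("INSERT INTO fruit VALUES (?)", [("apple",), ("pear",)])

# ...yet the user can still select one.
rows = con.execute("SELECT rowid, name FROM fruit ORDER BY rowid").fetchall()

# The declared column list (field 1 of each table_info row) omits it.
declared = [info[1] for info in con.execute("PRAGMA table_info(fruit)")]

print(rows, declared)
# [(1, 'apple'), (2, 'pear')] ['name']
```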


5.  No duplicate rows in tables or queries are allowed.

5.1 In SQL terms, every table has an implicit unique key constraint over all of its columns. This is ignored if there are any actual explicit keys, whether primary or otherwise. In TTM terms, it is impossible by definition to have duplicate rows.

5.2 The results of all stages of queries do not contain duplicate rows. In SQL terms, every query or subquery has an implicit 'distinct' or 'group by all' clause on it. No joins produce duplicates. No unions etc. do either.

5.3 By doing this and #3, all queries that look like they should return the same results actually do, whereas in SQL they may return different results in the presence of duplicates or nulls. Queries can also be simpler.
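This Python `sqlite3` sketch shows duplicates being accepted by a plain table, then approximates item 5.1 by hand with a UNIQUE constraint over all columns. The `NOT NULL` clauses matter: SQLite, like most SQL products, treats NULLs as distinct for UNIQUE purposes, which is one of the duplicate/null interactions item 5.3 refers to. Table names `bag` and `rel` are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A plain SQL table happily stores two identical rows.
con.execute("CREATE TABLE bag (a INTEGER, b TEXT)")
con.executemany("INSERT INTO bag VALUES (?, ?)", [(1, "x"), (1, "x")])
bag_count = con.execute("SELECT count(*) FROM bag").fetchone()[0]

# Item 5.1 approximated manually: a key over all columns. NOT NULL is needed
# because NULLs are considered distinct from each other in UNIQUE constraints.
con.execute("CREATE TABLE rel (a INTEGER NOT NULL, b TEXT NOT NULL, UNIQUE (a, b))")
con.execute("INSERT INTO rel VALUES (1, 'x')")
try:
    con.execute("INSERT INTO rel VALUES (1, 'x')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Item 5.2: UNION removes duplicates, UNION ALL keeps them.
union_rows = con.execute(
    "SELECT a, b FROM bag UNION SELECT a, b FROM rel").fetchall()
union_all_rows = con.execute(
    "SELECT a, b FROM bag UNION ALL SELECT a, b FROM rel").fetchall()

print(bag_count, duplicate_rejected, len(union_rows), len(union_all_rows))
# 2 True 1 3
```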

6. Columns in tables and views and query results have no ordinal value; they all have names and are referred to using only those names. Moreover, every column must have a different name from every other column.

7. Rows in tables and views and query results have no ordinal value; they are referenced by relational expressions that match on the values of columns, like in a SQL where-clause.

7.1 An order-by or limit clause only makes sense in an outermost query, right when results are being returned from the database to the application, where it then specifies the order in which to return otherwise order-less rows.
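Items 5 through 7 together describe a relation value. A minimal pure-Python sketch, assuming rows are modelled as frozensets of (column name, value) pairs and the body as a frozenset of rows, gives "no column order", "no row order" and "no duplicates" for free; the `Relation` class and its `ordered` method are hypothetical names, not part of Rosetta or SQLite.

```python
from typing import Any, Dict, FrozenSet, Iterable, List, Tuple

# A row is a frozenset of (column_name, value) pairs: it has no column order.
Row = FrozenSet[Tuple[str, Any]]


class Relation:
    """Hypothetical relation value: named, unordered columns; unordered, unique rows."""

    def __init__(self, heading: Iterable[str], rows: Iterable[Dict[str, Any]]):
        names = list(heading)
        if len(names) != len(set(names)):
            raise ValueError("column names must be distinct")  # item 6
        self.heading: FrozenSet[str] = frozenset(names)
        body = set()
        for r in rows:
            if set(r) != self.heading:
                raise ValueError(f"row {r!r} does not match heading")
            body.add(frozenset(r.items()))  # a set silently drops duplicate rows (item 5)
        self.body: FrozenSet[Row] = frozenset(body)

    def __eq__(self, other: object) -> bool:
        # Equal when headings and bodies are equal as sets; order never matters.
        return (isinstance(other, Relation)
                and self.heading == other.heading
                and self.body == other.body)

    def ordered(self, *keys: str) -> List[Dict[str, Any]]:
        # Item 7.1: an order is imposed only when rows are handed to the application.
        return sorted((dict(r) for r in self.body),
                      key=lambda d: tuple(d[k] for k in keys))


a = Relation(["id", "name"], [{"id": 1, "name": "x"},
                              {"name": "x", "id": 1},  # same row, written differently
                              {"id": 2, "name": "y"}])
b = Relation(["name", "id"], [{"id": 2, "name": "y"}, {"id": 1, "name": "x"}])

try:
    Relation(["id", "id"], [])
    dup_names_rejected = False
except ValueError:
    dup_names_rejected = True

print(a == b, len(a.body), a.ordered("id"), dup_names_rejected)
```

Here `a` and `b` compare equal even though their columns and rows were listed in different orders and `a` was given a duplicate row.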

In doing all of the above, SQLite should actually be simpler to implement, and it will be easier to use, with more predictable results and fewer bugs.


This next one can be implemented separately from all the other suggestions:

8. Add some standard relational logic operators that can be combined and nested to get all the power of selects and more, with less effort, such as any of the following you don't already have: restrict, project, join, product, union, intersection, difference, divide, rename.

8.1 The simplest join syntax, such as an unqualified comma-delimited list, would perform a natural join by default. Or we could more or less just have natural joins (and cartesian products, 'product') as the only kind of join.

8.2 Using these instead of 'select' should allow for easier implementation and optimization; for one thing, the expressions are more associative or commutative.

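A compact pure-Python sketch of most of the item 8 operators follows, assuming a relation is a heading (set of names) plus a body (set of rows as frozensets of pairs). All names (`Rel`, `rel`, `restrict`, `project`, `rename`, `join`, `union`, `intersection`, `difference`) are hypothetical; `divide` is omitted for brevity, and `rename` does not check for name collisions. Note how `join` with no common columns is exactly `product`, as item 8.1 suggests.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple

Row = FrozenSet[Tuple[str, Any]]


@dataclass(frozen=True)
class Rel:
    heading: FrozenSet[str]
    body: FrozenSet[Row]


def rel(heading: Iterable[str], *rows: Dict[str, Any]) -> Rel:
    return Rel(frozenset(heading), frozenset(frozenset(r.items()) for r in rows))


def restrict(r: Rel, pred: Callable[[Dict[str, Any]], bool]) -> Rel:
    # Keep rows satisfying the predicate (like a SQL where-clause).
    return Rel(r.heading, frozenset(t for t in r.body if pred(dict(t))))


def project(r: Rel, names: Iterable[str]) -> Rel:
    # Keep only the named columns; the set body absorbs any duplicates produced.
    keep = frozenset(names)
    return Rel(keep, frozenset(frozenset((k, v) for k, v in t if k in keep)
                               for t in r.body))


def rename(r: Rel, mapping: Dict[str, str]) -> Rel:
    def ren(k: str) -> str:
        return mapping.get(k, k)
    return Rel(frozenset(map(ren, r.heading)),
               frozenset(frozenset((ren(k), v) for k, v in t) for t in r.body))


def join(a: Rel, b: Rel) -> Rel:
    # Natural join on all common column names; with none in common,
    # every pair matches and this is the cartesian product.
    common = a.heading & b.heading
    out = set()
    for s in a.body:
        ds = dict(s)
        for t in b.body:
            dt = dict(t)
            if all(ds[c] == dt[c] for c in common):
                out.add(frozenset({**ds, **dt}.items()))
    return Rel(a.heading | b.heading, frozenset(out))


def _same_heading(a: Rel, b: Rel) -> None:
    if a.heading != b.heading:
        raise ValueError("operands must have the same heading")


def union(a: Rel, b: Rel) -> Rel:
    _same_heading(a, b)
    return Rel(a.heading, a.body | b.body)


def intersection(a: Rel, b: Rel) -> Rel:
    _same_heading(a, b)
    return Rel(a.heading, a.body & b.body)


def difference(a: Rel, b: Rel) -> Rel:
    _same_heading(a, b)
    return Rel(a.heading, a.body - b.body)


emp = rel(["emp", "dept"], {"emp": "ann", "dept": "d1"}, {"emp": "bob", "dept": "d2"})
dept = rel(["dept", "city"], {"dept": "d1", "city": "Oslo"}, {"dept": "d2", "city": "Rome"})

# "Names of employees in Oslo", built by nesting operators instead of one SELECT.
in_oslo = project(restrict(join(emp, dept), lambda t: t["city"] == "Oslo"), ["emp"])
print(sorted(dict(t)["emp"] for t in in_oslo.body))  # ['ann']
```

Because bodies are sets, `join(emp, dept) == join(dept, emp)` holds, which is the kind of commutativity item 8.2 hopes an optimizer could exploit.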

This next one can be implemented separately from all the other suggestions:

9. Support nested/child transactions, such as a 'begin transaction' inside another one, which can make things a lot easier for applications; they have to worry less about whether a transaction already exists before starting another one. These are functionally sort of like save-points in SQL, in that even if an inner transaction commits, it is still thrown away if the outer transaction rolls back. To implement this best, you would probably need multiple (cascading?) journal files, one per transaction level.
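SQLite releases later than this message (3.6.8 onward) added `SAVEPOINT`, `RELEASE` and `ROLLBACK TO`, which give roughly the nesting semantics described in item 9. The sketch below uses Python's `sqlite3` with `isolation_level=None` so the module issues no implicit BEGIN/COMMIT; the savepoint names `outer_tx` and `inner_tx` are arbitrary (plain `outer` would clash with the SQL keyword).

```python
import sqlite3

# isolation_level=None: the statements below control transactions directly.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (n INTEGER)")

con.execute("SAVEPOINT outer_tx")        # outer transaction begins
con.execute("INSERT INTO t VALUES (1)")
con.execute("SAVEPOINT inner_tx")        # nested transaction begins
con.execute("INSERT INTO t VALUES (2)")
con.execute("RELEASE inner_tx")          # the inner one "commits"...
inside = con.execute("SELECT count(*) FROM t").fetchone()[0]

con.execute("ROLLBACK TO outer_tx")      # ...but the outer one rolls back,
con.execute("RELEASE outer_tx")          # so neither insert survives
after = con.execute("SELECT count(*) FROM t").fetchone()[0]

print(inside, after)  # 2 0
```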

The following are also features of The Third Manifesto, but could possibly be left out of SQLite in keeping with its Lite nature:

1. All views are updateable like they were tables. From the user's point of view, tables and views are the same sort of thing in how they can be used.

2. Tables can be assigned to directly like they were variables, and insert/update/delete is actually a short-hand for this. Eg, an insert is equivalent to an assignment to a table of the table's old value unioned with the rows being inserted. Supporting this allows users to define arbitrarily flexible updating operations, such as "replace or add" and such.
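The idea that insert and delete are short-hands for assignment can be sketched in pure Python, assuming a table variable simply holds a frozenset of rows. The `RelVar` class, the `row` helper and the `replace_or_add` operation are hypothetical illustrations, not an existing API; the point is that each update, including the user-defined one, is a single assignment of a newly computed value.

```python
from typing import Any, Dict, FrozenSet, Iterable, Tuple

Row = FrozenSet[Tuple[str, Any]]


def row(**columns: Any) -> Row:
    """Hypothetical helper: build an unordered row from named column values."""
    return frozenset(columns.items())


class RelVar:
    """Hypothetical relation variable: every update assigns a whole new value."""

    def __init__(self, initial: Iterable[Row] = ()) -> None:
        self.value: FrozenSet[Row] = frozenset(initial)

    def assign(self, new_value: Iterable[Row]) -> None:
        self.value = frozenset(new_value)

    def insert(self, rows: Iterable[Row]) -> None:
        # INSERT as shorthand for  R := R UNION rows
        self.assign(self.value | frozenset(rows))

    def delete(self, rows: Iterable[Row]) -> None:
        # DELETE as shorthand for  R := R MINUS rows
        self.assign(self.value - frozenset(rows))

    def replace_or_add(self, key: Dict[str, Any], new_row: Row) -> None:
        # A user-defined update built from the same primitive:
        # R := (R MINUS rows matching key) UNION {new_row}
        kept = frozenset(r for r in self.value
                         if any(dict(r).get(k) != v for k, v in key.items()))
        self.assign(kept | {new_row})


stock = RelVar()
stock.insert([row(item="pen", qty=3), row(item="ink", qty=1)])
stock.insert([row(item="pen", qty=3)])              # already present: no duplicate
stock.replace_or_add({"item": "pen"}, row(item="pen", qty=5))
stock.delete([row(item="ink", qty=1)])
print(sorted(dict(r)["qty"] for r in stock.value))  # [5]
```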

3. The system catalog tables can be updated directly, like ordinary tables, which results in the schema being updated. Eg, you can use insert statements to create a table rather than a create statement.


4.  Support definition and use of custom data types.

5. It should not be necessary to explicitly declare indexes to help with speed.

6. Generally speaking, users should not have to know about implementation details, but rather just express what their data actually means.


Okay, that's about all for this initial proposal email.

Ultimately, I believe that the core of my proposal involves simplifying SQLite, making it leaner and meaner, and also reducing possible or actual bugs and difficulty in understanding.

At the very least, I hope that the trunk would have the pragma or compile option that essentially strips out current features like nulls and other ambiguity, so that we effectively have a restricted or simplified SQL.

I also bring this up because I would expect that SQLite should be able to perform faster when it doesn't handle nulls or duplicates or weak data types than if it does. The conceptual logic is simpler when we don't have those, and the implementation code should also be simpler, and perform faster, since there are fewer possibilities to check at logical decision points. And it should be easier to optimize queries.

So even if no incompatible changes are made, I would hope that it ispossible to optimize for the simplest case.


-- Darren Duncan
