All, and particularly Dr. Hipp,

Lately, my own database project in Perl, named Rosetta, has evolved to officially be an implementation of Chris Date's and Hugh Darwen's proposal for relational databases called "The Third Manifesto", which is talked about at http://www.thethirdmanifesto.com/ and in various books and papers of theirs.

Rosetta has its own API which defines a "D" relational database language, as apposed to a SQL language, and Rosetta's back-ends to implement this are interchangeable. I am making a pure Perl reference implementation (called Rosetta::Engine::Example) which is coded simply for ease of understanding but that is slow.

Separately, I or third parties would be making alternate back-ends that either are self-implemented and better performing, or that constitute wrappers over existing database products, usually SQL based, since those are fairly mature and plentiful. SQLite is one of the first such back-ends to be used.

Now, I would like to propose, and if necessary I will contribute significant targeted funding (when I have the money) to pay Dr. Hipp and/or other developers, some significant feature changes (as a fork if necessary) to SQLite such that it directly implements and gains the efficiencies of The Third Manifesto. This includes both the addition of and the removal of current features, and certain behaviours would change. Hopefully all for the better.

As a result of these changes, not only would SQLite better serve as a back-end of Rosetta, but non-Rosetta users of SQLite would get the most critical of the same benefits from it directly.

I anticipate that the changes would mainly affect the upper layers, which convert user commands into virtual machine code, but that the virtual machine and b-tree and OS layers would remain more or less unchanged (this depends, of course, on a few details). Possibly, we would add a new command language.

I am hoping that, to keep longer term maintenance easier, these changes can be implemented in the trunk and activated using either run time pragmas or compile time options or both. But if they would require a fork, then the forked product would have to be named something else that doesn't have 'SQL' in its name, since SQL does not satisfy The Third Manifesto. Maybe 'TTMLite' or something that sounds better.

Here are some of the changes that I propose the pragma or compile time option or fork would have; they all refer to what the user sees, not to implementation details that should be hidden:

1. Add a distinct logical BOOLEAN data type. It is the data type of output from logical expressions like comparisons, and the input to 'and', 'or', etc.

2.  Have strong and strict data typing for both variables and values.

2.1 Table columns are always declared to be of a specific type (eg: BOOLEAN, INTEGER, REAL, TEXT, BLOB) and nothing but values of the same type can be stored in them; attempting to do otherwise would fail with an exception.

2.2  The plain equality test is supported for all data types.

2.3 All operators/functions have strongly typed parameters and return values, and invoking them with arguments that aren't of the right type will fail with an exception. The equality test likewise can only compare operands of the same type.

2.4 There is no implicit type conversion; data types must be explicitly converted from one type to another.

2.5 INTEGER and REAL data types have separate sets of operators, which do the expected thing with their types. For example, each has a separate division operator whose input and output are all of that same type. No worrying about when to round or not.

2.6 SQLite may already be this way, but: All characters in a string are significant, including whitespace, so 'a' and 'a ' are always unequal.

3.  There is no such thing as a NULL.

3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).

3.2 Every variable of a particular type always contains a value that is valid for that type, so logic for dealing with it is simpler. Likewise, every with every literal value.

3.3  The code to implement operators is a lot simpler.

3.4 Missing data can be either represented with the data type's empty value, or a table column that may possibly be unknown can be split into a separate related table, that only has records when the value is known.

3.5 All variables default to a reasonable valid value for their type if not explicitly set, such as the number zero or the empty string.

4. There is no significant hidden data. A row id can only be an explicitly declared table column. The implementation of a table can use hidden row ids, but the user wouldn't see them.

5.  No duplicate rows in tables or queries are allowed.

5.1 In SQL terms, every table has an implicit unique key constraint over all of its columns. This is ignored if there are any actual explicit keys, whether primary or otherwise. In TTM terms, it is impossible by definition to have duplicate rows.

5.2 The results of all stages of queries do not contain duplicate rows. In SQL terms, every query or subquery has an implicit 'distinct' or 'group by all' clause on it. No joins produce duplicates. No unions etc do either.

5.3 By doing this and #3, all queries that look like they should return the same results actually do, whereas in SQL they may return different results in the presence of duplicates or nulls. Queries can also be simpler.

6. Columns in tables and views and query results have no ordinal value; they all have names and are referred to using only those names. Moreover, every column must have a different name from every other column.

7. Rows in tables and views and query results have no ordinal value; they are referenced by relational expressions that match on the values of columns, like in a SQL where-clause.

7.1 An order-by or limit clause only makes sense in an outer-most query, right when results are being returned from the database to the application, where it then specifies the order to return otherwise order-less rows.

In doing all of the above, SQLite should actually be simpler to implement, and it will be easier to use, with more predictable results and fewer bugs.

This next one can be implemented separately from all the other suggestions:

8. Add some standard relational logic operators that can be combined and nested to get all the power of selects and more, with less effort, such as any of the following you don't already have: restrict, project, join, product, union, intersection, difference, divide, rename.

8.1 The simplest join syntax, such as an unqualified comma-delimited list, would perform a natural join by default. Or we could more or less just have natural joins (and cartesian products, 'product') as the only kind of join.

8.2 Using these instead of 'select' should allow for easier implementation and optimization; for one thing, the expressions are more associative or commutative.

This next one can be implemented separately from all the other suggestions:

9. Support nested/child transactions, such as a 'begin transaction' inside another one, which can make things a lot easier for applications; they have to worry less about whether a transaction already exists before starting another one. These are functionally sort of like save-points in SQL, in that even if an inner transaction commits, it is still thrown away if the outer transaction rolls back. To implement this best, you would probably need multiple (cascading?) journal files, one per transaction level.

Following are also features of The Third Manifesto, but can possibly be left out of SQLite in accordance with its Lite nature:

1. All views are updateable like they were tables. From the user's point of view, tables and views are the same sort of thing in how they can be used.

2. Tables can be assigned to directly like they were variables, and insert/update/delete is actually a short-hand for this. Eg, an insert is equivalent to an assignment to a table of the table's old value unioned with the rows being inserted. Supporting this allows users to define arbitrarily flexible updating operations, such as "replace or add" and such.

3. The system catalog tables can be updated directly using data definition language, which results in the schema being updated. Eg, you can use insert statements to create a table rather than a create statement.

4.  Support definition and use of custom data types.

5. It should not be necessary to explicitly declare indexes to help with speed.

6. Generally speaking, users should not have to know about implementation details, but rather just express what their data actually means.

Okay, that's about all for this initial proposal email.

Ultimately, I believe that the core of my proposal involves simplifying SQLite, making it leaner and meaner, and also reduces possible or actual bugs or difficulty in understanding.

At the very least, I hope that the trunk would have the pragma or compile option that essentially strips out current features like nulls and other ambiguity, so essentially we have a restricted or simplified SQL.

I also bring this up because I would expect that SQLite should be able to perform faster when it doesn't handle nulls or duplicates or weak data types than if it does. The conceptual logic is simpler when we don't have those, and the implementation code should also be simpler, and perform faster, since there are fewer possibilities to check at logical decision points. And it should be easier to optimize queries.

So even if no incompatible changes are made, I would hope that it is possible to optimize for the simplest case.

-- Darren Duncan

Reply via email to