Note: This email is a combined reply to several other emails in the same thread.

At 12:32 AM -0600 3/11/06, Marian Olteanu wrote:
You're right Darren, but the problem is that we're not in a DB class. We
cannot tell people who have a solution for their problems that "your
solution is wrong; you need to reimplement your stuff to make it right".
Most SQLite users are practical people, and all they want is their
problem solved. They don't really care whether the SQL language
implements relational algebra correctly or not.

You'll notice that I never advocated replacing functionality in SQLite. People who already use it the way it is can continue to do so. What I proposed was optional behaviour that people can choose when it works better for them. So while I think one way is better, I'm not proposing that support for doing it the other way be removed.

You have to take all the parts of my proposal in that context, that they are options which I wanted SQLite to improve support for, not the only way to do it.

As a furtherance to that proposal ...

I propose that, to both get the main desired effect of reliability and remain backwards compatible with other SQLite database installations, much of the altered behaviour be associated with individual database files themselves (from creation), so that their data is treated correctly wherever they go.

I propose that this be similar to how SQLite deals with implementing endian-ness and text encodings now. That is, at the time when a SQLite database is first created, it is declared to have its bytes in a specific order (hi-lo or lo-hi) and that its text is a specific encoding (utf8, utf16le, utf16be), and those attributes stick with it for the life of that database file.

Similarly, I propose that it is at the time of a database file's creation that one can declare they want TTM prescribed behaviour enforced for that database file, such as nulls and duplicates being forbidden.

When a SQLite install encounters a database file set this way, it honors the TTM behaviour; when it encounters a database file without that declaration, it follows the more traditional SQL prescribed behaviour.
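SQLite already has one attribute that works exactly this way: the text encoding, which must be chosen before the first table is written and is then fixed for the life of the file. A sketch using Python's sqlite3 module; the TTM pragma shown in the comment is hypothetical, purely an illustration of the proposal:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The encoding must be set before the database is created (i.e. before the
# first table is written); after that it sticks for the life of the file.
con.execute("PRAGMA encoding = 'UTF-16le'")
con.execute("CREATE TABLE t(x)")
enc = con.execute("PRAGMA encoding").fetchone()[0]
# enc reports the UTF-16 little-endian encoding chosen at creation time.
# A creation-time flag under this proposal might look similar:
#   PRAGMA ttm_mode = ON;   -- NOT a real pragma; illustration only
```

Attempting to change the encoding after the first table exists is silently ignored, which is the same "declared at creation, honoured thereafter" behaviour being proposed here.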

Of course, database files created for TTM behaviour will probably work only with newer SQLite versions, not older ones. If my proposal is adopted and implemented in the trunk, it would probably come out as a major version change; or it doesn't have to, whatever is appropriate.

At 1:58 AM -0500 3/11/06, Andrew Piskorski wrote:
There is no such thing as null, really?  So, when you do an outer join
between two tables, which in SQL would produce null columns in the
result set, what do YOU propose producing instead of those nulls?

Perhaps I missed it, but in my brief reading of some of Date's work, I
never saw him answer that question.

There are several answers to this.

The main one is to consider why you are using an outer join, and how the data is going to be used, so that you can prescribe (e.g., in the query requesting the outer join) appropriate default values for the otherwise-null fields in the result. Your application can then handle every value in each column as a value appropriate for the type.

For example, given these 2 tables:

a: |foo INT|bar INT|
   -----------------
   |     1 |    17 |
   |     2 |     6 |

b: |foo INT|baz INT|
   -----------------
   |     1 |   101 |

You could outer-join b to a like this:

  SELECT *
  FROM a NATURAL LEFT OUTER JOIN b
    DEFAULT baz = 0;

And get:

  |foo INT|bar INT|baz INT|
  -------------------------
  |     1 |    17 |   101 |
  |     2 |     6 |     0 |

So there are no nulls here.
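The DEFAULT clause above is proposed syntax, but the same effect is achievable in today's SQL with COALESCE. A sketch against real SQLite, via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a(foo INT, bar INT);
    CREATE TABLE b(foo INT, baz INT);
    INSERT INTO a VALUES (1, 17), (2, 6);
    INSERT INTO b VALUES (1, 101);
""")
# COALESCE substitutes the prescribed default for what would otherwise be NULL
rows = con.execute("""
    SELECT a.foo, a.bar, COALESCE(b.baz, 0) AS baz
    FROM a NATURAL LEFT OUTER JOIN b
    ORDER BY a.foo
""").fetchall()
# rows == [(1, 17, 101), (2, 6, 0)]
```

The proposed DEFAULT syntax would simply make this substitution declarative rather than requiring it to be repeated per column expression.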

Alternately, situation depending, perhaps what you actually want to do isn't best served by an outer join, but with some other kind of query.

Alternately, see the next comment.

At 11:50 AM +0100 3/11/06, Xavier Noria wrote:
There is no such thing as null, really?  So, when you do an outer join
between two tables, which in SQL would produce null columns in the
result set, what do YOU propose producing instead of those nulls?

I never understood that restriction. I read in the books: "since we have defined things this way from a formal point of view, there's no room for NULL". And my question is: well, why don't you change the definitions to augment the datatype sets with a special constant NULL which is by definition not present in any datatype? Wouldn't that give an analogous theory more aligned with the real world?

If you want to have a data type which can represent only a single value and use it to mean unknown, and all instances of that value are equal, then that would be fine.

The main problem with NULL is more how it is used in SQL than the idea itself.

For one thing, SQL's NULL violates the logical principle that after you set foo to bar, foo equals bar. With every normal data type, if "foo := 1; bar := foo;" then a subsequent comparison "foo = bar" returns true. But with nulls, if you say "foo := NULL; bar := foo", a subsequent comparison "foo = bar" does not return true.

More simply, with nulls, saying "foo = foo" will not return true, which flies in the face of common sense.

All sorts of other problems in SQL result from that basic situation, that no NULL value ever equals itself.

But it's worse than that: SQL isn't even consistent with itself in how it treats nulls. In some kinds of operations or queries it treats every null as unique, while in others it treats them all as equal. No normal data type has this problem.
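That inconsistency is easy to demonstrate in SQLite itself; a sketch using Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Direct comparison: NULL = NULL yields NULL (None in Python), not true.
eq = con.execute("SELECT NULL = NULL").fetchone()[0]
# Yet SELECT DISTINCT treats all nulls as equal to each other,
# so two NULLs and a 1 collapse to just two distinct values...
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (1,)])
n = con.execute("SELECT COUNT(*) FROM (SELECT DISTINCT x FROM t)").fetchone()[0]
# ...while a UNIQUE constraint treats each null as distinct:
# inserting NULL twice raises no constraint violation.
con.execute("CREATE TABLE u(x UNIQUE)")
con.execute("INSERT INTO u VALUES (NULL)")
con.execute("INSERT INTO u VALUES (NULL)")
```

The same value class is "all equal" for DISTINCT and GROUP BY, but "all different" for comparison and UNIQUE constraints.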

So you have to write much more complicated SQL and application code to handle data which may be null to get the results that you want.
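Even the simplest lookup shows the extra code: a plain equality test can never match a null row, so queries must special-case it with IS. A sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
# A bound parameter compared with '=' never matches the NULL row:
n_eq = con.execute("SELECT COUNT(*) FROM t WHERE x = ?", (None,)).fetchone()[0]
# You must special-case nulls with IS (or IS NULL) to find them:
n_is = con.execute("SELECT COUNT(*) FROM t WHERE x IS ?", (None,)).fetchone()[0]
```

For any non-null value the two queries behave identically; only nulls force the application to know which comparison operator to use.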

At 10:28 PM -0800 3/10/06, Roger Binns wrote:
My main app happens to store phone numbers.  You won't believe
how irritating it is when I find things automatically assume they
are integers.

The problem you describe only happens when you *are* using manifest types, since code that you haven't written is looking at the content of your variable and guessing incorrectly how to treat it based on what its content looks like. By contrast, if you explicitly declare that your phone numbers are text (or a custom data type), then the database will never treat them as integers. In this respect at least, you made my point for me about strong types reducing errors.
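SQLite's existing column affinity already shows the effect: a column declared TEXT coerces whatever arrives into text, so phone numbers keep their textual form instead of being guessed at as integers. A sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phones(num TEXT)")  # TEXT affinity
# Even a bare number is coerced to text by the column's declared affinity:
con.execute("INSERT INTO phones VALUES (5550100)")
con.execute("INSERT INTO phones VALUES ('0423 555 0100')")
types = [row[0] for row in con.execute("SELECT typeof(num) FROM phones")]
# typeof() reports 'text' for both rows
```

With a stricter declared-type regime, the second row's leading zero and embedded spaces could never be lost to an integer guess.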

Just for the record:

it wouldn't require any significant amount more code.

Yes it would.  My code currently approximates to this:

 cursor.execute("insert into foo (x,y,z) values(?,?,?)", x,y,z)

It would have to change into this:

 # column x is defined as string
 if isinstance(x, str):
     storex = x
 elif isinstance(x, bool):   # check bool before int: bool is a subclass of int
     storex = "1" if x else "0"
 elif isinstance(x, int):
     storex = str(x)
 else:
     ...  # various other types and conditions for this context
 # repeat for y and z
 # add in values
 cursor.execute("insert into foo (x,y,z) values(?,?,?)", (storex, storey, storez))

It's clear from your example that you actually want to store multiple distinct types of data in the same table columns. In this case, under my proposal, you would declare that column to either be of the Scalar type or don't specify a type at all. Then your code remains as it was.
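Declaring no type at all is already honoured by SQLite today: a column with no declared type applies no affinity, so each value keeps its own manifest type. A sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mixed(v)")  # no declared type: no affinity applied
con.executemany("INSERT INTO mixed VALUES (?)", [(1,), ("abc",), (3.5,)])
types = [row[0] for row in con.execute("SELECT typeof(v) FROM mixed")]
# each row retains the type it was inserted with:
# ['integer', 'text', 'real']
```

Under the proposal this behaviour stays available; the typeless (or Scalar) column is simply an explicit choice rather than the only mode.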

My first point is that people who actually want a column that stores just text or just numbers etc. declare columns as those types explicitly, and therefore only data of those types will be stored.

Moreover, such people using a manifestly typed programming language would already be working under the assumption that, while their app variables are capable of storing multiple data types, they are only storing the one type they want. E.g., a count variable would not be assigned 'abc' in their program, or if it were, that would be an error. Since they assume that the correct type of data is in their variables, they can also just store it in the stricter database type without any conditionals, using one line as before.

Not having manifest types in the database throws away information when you store values and requires restituting them when reading.

I don't propose throwing away manifest types, but rather that people can choose between manifest or non-manifest types as suits them. SQLite 3 sort of does that already with its column affinity, but my proposal would make the distinction more formal and easier to optimize.

Or, looking at this another way, perhaps the Python bindings for SQLite should be taking care of this for you.

They can't, unless they do something like silently add an extra
column that stores the types of the values in the other columns
and attempt to transparently modify the SQL as it flies by to get
or update that column. (BTW I also happen to be an author
of wrappers for Python.)  (Your proposal sort of does this
by introducing a manifest type.)

SQLite and Python both already do this behind the scenes to implement their manifest typing. Computers only know numbers, with everything else being an abstraction; some extra numbers are stored that tell it how to interpret the other numbers.
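In fact, Python's sqlite3 bindings already expose hooks for exactly this kind of transparent translation: adapters map application types to stored values, and converters map them back based on the declared column type. A sketch (the BOOL declared type and its converter are my own choice for illustration):

```python
import sqlite3

# Adapter: how to store a Python bool; converter: how to read it back out.
sqlite3.register_adapter(bool, lambda b: 1 if b else 0)
sqlite3.register_converter("BOOL", lambda raw: raw == b"1")

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE t(flag BOOL)")
con.execute("INSERT INTO t VALUES (?)", (True,))
val = con.execute("SELECT flag FROM t").fetchone()[0]
# val round-trips as a genuine Python bool, not a bare integer
```

So the per-column type dispatch that Roger's hand-rolled isinstance chain performs can live in the binding layer once, rather than at every execute() call site.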

But perhaps we're thinking of slightly different things.

I would suggest finding an open source application that uses
SQLite and see if you would indeed make it simpler.  One good
example I would suggest is Trac which was originally written
to use SQLite.

I'll look into this and get back to you some time. Though I have other usage scenarios that I would be addressing first.

-- Darren Duncan
