Note: This email is a combined reply to several other emails in the same thread.

At 12:32 AM -0600 3/11/06, Marian Olteanu wrote:
You're right Darren, but the problem is that we're not in a DB class. We
cannot tell people who have a solution for their problems that "your
solution is wrong; you need to reimplement your stuff to make it right".
Most SQLite users are practical people, and all they want is their
problem solved. They don't really care whether the SQL language
implements relational algebra correctly or not.

You'll notice that I never advocated replacing functionality in SQLite. People who already use it the way it is can continue to do so. What I proposed was optional behaviour that people can choose when it works better for them. So while I think one way is better, I'm not proposing that support for doing it the other way be removed.

You have to take all the parts of my proposal in that context, that they are options which I wanted SQLite to improve support for, not the only way to do it.

As a furtherance to that proposal ...

I propose that, to both get the main desired effect of reliability and remain backwards compatible with other SQLite database installations, much of the altered behaviour be associated with individual database files themselves (from creation), so that their data is treated correctly wherever they go.

I propose that this be similar to how SQLite deals with implementing endian-ness and text encodings now. That is, at the time when a SQLite database is first created, it is declared to have its bytes in a specific order (hi-lo or lo-hi) and that its text is a specific encoding (utf8, utf16le, utf16be), and those attributes stick with it for the life of that database file.

Similarly, I propose that it is at the time of a database file's creation that one can declare they want TTM prescribed behaviour enforced for that database file, such as nulls and duplicates being forbidden.

When a SQLite install encounters a database file set this way, it honors the TTM behaviour; when it encounters a database file without that declaration, it follows the more traditional SQL prescribed behaviour.
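SQLite already has one attribute that works exactly this way: the text encoding, which must be chosen before the first table is written and is then fixed for the life of the file. A sketch using Python's sqlite3 module; the TTM pragma shown in the comment is hypothetical, purely an illustration of the proposal:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The encoding must be set before the database is created (i.e. before the
# first table is written); after that it sticks for the life of the file.
con.execute("PRAGMA encoding = 'UTF-16le'")
con.execute("CREATE TABLE t(x)")
enc = con.execute("PRAGMA encoding").fetchone()[0]
# enc reports the UTF-16 little-endian encoding chosen at creation time.
# A creation-time flag under this proposal might look similar:
#   PRAGMA ttm_mode = ON;   -- NOT a real pragma; illustration only
```

Attempting to change the encoding after the first table exists is silently ignored, which is the same "declared at creation, honoured thereafter" behaviour being proposed here.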

Of course, database files created for TTM behaviour will probably work only with newer SQLite versions, not older ones. If my proposal is adopted and implemented in the trunk, it would probably come out as a major version change; or it doesn't have to, whatever is appropriate.

At 1:58 AM -0500 3/11/06, Andrew Piskorski wrote:
There is no such thing as null, really?  So, when you do an outer join
between two tables, which in SQL would produce null columns in the
result set, what do YOU propose producing instead of those nulls?

Perhaps I missed it, but in my brief reading of some of Date's work, I
never saw him answer that question.

There are several answers to this.

The main one is to consider why you are using an outer join, and how the data is going to be used, so that you can prescribe (e.g., in the query requesting the outer join) appropriate default values for the otherwise-null fields in the result. Your application can then handle every value in each column as a value appropriate for the type.

For example, given these 2 tables:

a: |foo INT|bar INT|
   -----------------
   |     1 |    17 |
   |     2 |     6 |

b: |foo INT|baz INT|
   -----------------
   |     1 |   101 |

You could outer-join b to a like this:

  SELECT *
  FROM a NATURAL LEFT OUTER JOIN b
    DEFAULT baz = 0;

And get:

  |foo INT|bar INT|baz INT|
  -------------------------
  |     1 |    17 |   101 |
  |     2 |     6 |     0 |

So there are no nulls here.
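The DEFAULT clause above is proposed syntax, but the same effect is achievable in today's SQL with COALESCE. A sketch against real SQLite, via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a(foo INT, bar INT);
    CREATE TABLE b(foo INT, baz INT);
    INSERT INTO a VALUES (1, 17), (2, 6);
    INSERT INTO b VALUES (1, 101);
""")
# COALESCE substitutes the prescribed default for what would otherwise be NULL
rows = con.execute("""
    SELECT a.foo, a.bar, COALESCE(b.baz, 0) AS baz
    FROM a NATURAL LEFT OUTER JOIN b
    ORDER BY a.foo
""").fetchall()
# rows == [(1, 17, 101), (2, 6, 0)]
```

The proposed DEFAULT syntax would simply make this substitution declarative rather than requiring it to be repeated per column expression.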

Alternately, situation depending, perhaps what you actually want to do isn't best served by an outer join, but with some other kind of query.

Alternately, see the next comment.

At 11:50 AM +0100 3/11/06, Xavier Noria wrote:
There is no such thing as null, really?  So, when you do an outer join
between two tables, which in SQL would produce null columns in the
result set, what do YOU propose producing instead of those nulls?

I never understood that restriction. I read in the books: "since we have defined things this way from a formal point of view, there's no room for NULL". And my question is: well, why don't you change the definitions to augment the datatype sets with a special constant NULL which is by definition not present in any datatype? Wouldn't that give an analogous theory more aligned with the real world?

If you want to have a data type which can represent only a single value and use it to mean unknown, and all instances of that value are equal, then that would be fine.

The main problem with NULL is more how it is used in SQL than the idea itself.

For one thing, SQL's NULL violates the logical principle that after you set foo to bar, foo equals bar. With every normal data type, if "foo := 1; bar := foo;" then a subsequent comparison "foo = bar" returns true. But with nulls, if you say "foo := NULL; bar := foo", a subsequent comparison "foo = bar" does not return true.

More simply, with nulls, saying "foo = foo" will not return true, which flies in the face of common sense.

All sorts of other problems in SQL result from that basic situation, that no NULL value ever equals itself.

But it's worse than that: SQL isn't even consistent with itself in how it treats nulls. In some kinds of operations or queries it treats every null as unique, while in others it treats them all as equal. No normal data type has this problem.
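That inconsistency is easy to demonstrate in SQLite itself; a sketch using Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Direct comparison: NULL = NULL yields NULL (None in Python), not true.
eq = con.execute("SELECT NULL = NULL").fetchone()[0]
# Yet SELECT DISTINCT treats all nulls as equal to each other,
# so two NULLs and a 1 collapse to just two distinct values...
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (1,)])
n = con.execute("SELECT COUNT(*) FROM (SELECT DISTINCT x FROM t)").fetchone()[0]
# ...while a UNIQUE constraint treats each null as distinct:
# inserting NULL twice raises no constraint violation.
con.execute("CREATE TABLE u(x UNIQUE)")
con.execute("INSERT INTO u VALUES (NULL)")
con.execute("INSERT INTO u VALUES (NULL)")
```

The same value class is "all equal" for DISTINCT and GROUP BY, but "all different" for comparison and UNIQUE constraints.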

So you have to write much more complicated SQL and application code to handle data which may be null to get the results that you want.
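Even the simplest lookup shows the extra code: a plain equality test can never match a null row, so queries must special-case it with IS. A sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])
# A bound parameter compared with '=' never matches the NULL row:
n_eq = con.execute("SELECT COUNT(*) FROM t WHERE x = ?", (None,)).fetchone()[0]
# You must special-case nulls with IS (or IS NULL) to find them:
n_is = con.execute("SELECT COUNT(*) FROM t WHERE x IS ?", (None,)).fetchone()[0]
```

For any non-null value the two queries behave identically; only nulls force the application to know which comparison operator to use.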

At 10:28 PM -0800 3/10/06, Roger Binns wrote:
My main app happens to store phone numbers.  You won't believe
how irritating it is when I find things automatically assume they
are integers.

The problem you describe only happens when you *are* using manifest types, since code that you haven't written is looking at the content of your variable and guessing incorrectly how to treat it based on what its content looks like. By contrast, if you explicitly declare that your phone numbers are text (or a custom data type), then the database will never treat them as integers. In this respect at least, you made my point for me about strong types reducing errors.
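SQLite's existing column affinity already shows the effect: a column declared TEXT coerces whatever arrives into text, so phone numbers keep their textual form instead of being guessed at as integers. A sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phones(num TEXT)")  # TEXT affinity
# Even a bare number is coerced to text by the column's declared affinity:
con.execute("INSERT INTO phones VALUES (5550100)")
con.execute("INSERT INTO phones VALUES ('0423 555 0100')")
types = [row[0] for row in con.execute("SELECT typeof(num) FROM phones")]
# typeof() reports 'text' for both rows
```

With a stricter declared-type regime, the second row's leading zero and embedded spaces could never be lost to an integer guess.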

Just for the record:

it wouldn't require any significant amount more code.

Yes it would.  My code currently approximates to this:

 cursor.execute("insert into foo (x,y,z) values(?,?,?)", x,y,z)

It would have to change into this:

 # column x is defined as string
 if isinstance(x, str):
     storex = x
 elif isinstance(x, bool):   # check bool before int: bool is a subclass of int
     storex = "1" if x else "0"
 elif isinstance(x, int):
     storex = str(x)
 else:
     ...  # various other types and conditions for this context
 # repeat for y and z
 # add in values
 cursor.execute("insert into foo (x,y,z) values(?,?,?)", (storex, storey, storez))

It's clear from your example that you actually want to store multiple distinct types of data in the same table columns. In this case, under my proposal, you would declare that column to either be of the Scalar type or don't specify a type at all. Then your code remains as it was.
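Declaring no type at all is already honoured by SQLite today: a column with no declared type applies no affinity, so each value keeps its own manifest type. A sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mixed(v)")  # no declared type: no affinity applied
con.executemany("INSERT INTO mixed VALUES (?)", [(1,), ("abc",), (3.5,)])
types = [row[0] for row in con.execute("SELECT typeof(v) FROM mixed")]
# each row retains the type it was inserted with:
# ['integer', 'text', 'real']
```

Under the proposal this behaviour stays available; the typeless (or Scalar) column is simply an explicit choice rather than the only mode.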

My first point is that people who actually want a column that stores just text or just numbers etc. declare columns as those types explicitly, and therefore only data of those types will be stored.

Moreover, such people using a manifestly typed programming language would already be working under the assumption that, while their app variables are capable of storing multiple data types, they are only storing the one type they want. E.g., a count variable would not be assigned 'abc' in their program, or if it were, that would be an error. Since they assume that the correct type of data is in their variables, they can also just store it in the stricter database type without any conditionals, using one line as before.

Not having manifest types in the database throws away information when you store values and requires restituting them when reading.

I don't propose throwing away manifest types, but rather that people can choose between manifest or non-manifest types as suits them. SQLite 3 sort of does that already with its column affinity, but my proposal would make the distinction more formal and easier to optimize.

Or, looking at this another way, perhaps the Python bindings for SQLite should be taking care of this for you.

They can't, unless they do something like silently add an extra
column that stores the types of the values in the other columns
and attempt to transparently modify the SQL as it flies by to get
or update that column. (BTW I also happen to be an author
of wrappers for Python.)  (Your proposal sort of does this
by introducing a manifest type.)

SQLite and Python both already do this behind the scenes to implement their manifest typing. Computers only know numbers, with everything else being an abstraction; some extra numbers are stored that tell it how to interpret the other numbers.
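In fact, Python's sqlite3 bindings already expose hooks for exactly this kind of transparent translation: adapters map application types to stored values, and converters map them back based on the declared column type. A sketch (the BOOL declared type and its converter are my own choice for illustration):

```python
import sqlite3

# Adapter: how to store a Python bool; converter: how to read it back out.
sqlite3.register_adapter(bool, lambda b: 1 if b else 0)
sqlite3.register_converter("BOOL", lambda raw: raw == b"1")

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE t(flag BOOL)")
con.execute("INSERT INTO t VALUES (?)", (True,))
val = con.execute("SELECT flag FROM t").fetchone()[0]
# val round-trips as a genuine Python bool, not a bare integer
```

So the per-column type dispatch that Roger's hand-rolled isinstance chain performs can live in the binding layer once, rather than at every execute() call site.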

But perhaps we're thinking of slightly different things.

I would suggest finding an open source application that uses
SQLite and see if you would indeed make it simpler.  One good
example I would suggest is Trac which was originally written
to use SQLite.

I'll look into this and get back to you some time. Though I have other usage scenarios that I would be addressing first.

-- Darren Duncan
