Re: [libpqxx-general] Updating a row that does not exist in a table.

Jeroen T. Vermeulen Thu, 19 Jan 2006 09:12:21 -0800

On Thu, January 19, 2006 21:56, Bart Samwel wrote:
> Wagener, Johannes J wrote:


>> When I try to update a row that does not exist in a table (the table is
>> empty as a matter of fact), the query does not fail -   Instead it
>> executes successfully but obviously the row still does not exist and
>> therefore the query does nothing.

First off, Bart sums up the issue as far as SQL is concerned very well:

> This really is standard SQL behaviour. An UPDATE is simply a SELECT
> followed by a manipulation of the selected records. And just like a
> SELECT can return 0 rows without returning an error, an UPDATE can
> affect 0 rows without returning an error.


> There are a number of DBMSes that do raise an "error" if the number of
> rows returned by a SELECT is 0, and apparently DB2 even does this for
> UPDATEs. This is largely a matter of style, as it's not clear that a
> 0-row SELECT or UPDATE is really an error.

The only thing I can add to this is a story on how we view data, and how
the database model does.

One big misunderstanding in the database world is that in the relational
model, we've learned to think of rows as objects.  This is wrong.  Objects
have identities, rows don't.  A row is merely an indication that a
relationship exists between a particular combination of values for the
respective columns.

This is why what we call a "table" is really called a "relation" in
database theory.  Imagine a relation as a huge discrete space, a
multi-dimensional array of booleans.  One dimension for every column, with
the length along that dimension the total number of possible values for
that column.  A value of "true" anywhere in the array means "this
particular combination of values is related."  What that means is
application-specific and depends on the individual relation.  This way of
seeing things allowed some incredible breakthroughs in how relations are
queried and manipulated.

>From this point of view, it really doesn't matter whether there happen to
be zero, or one, or more rows that have the data you're asking for.  The
notion of "present rows" is merely a convenient way of compressing that
huge array of booleans.

Keys were also added to the model; a "key" is just a subset of a
relation's attributes whose combined values must be unique for any row. 
Any combination of values for one key's attributes is a way of identifying
a row, as if it were an object, and we've come to think of database tables
as containers for objects.  We often create tables with an extra "oid"
column just to emulate this model.

As I understand history, this is what seemed to make sense even before SQL
came around.  But SQL allowed keys, relationships between objects,
identity etc. to be hidden away in the schema, without popping up in
regular code.  Optimization aside, it did away with the distinction
between e.g. "dereferencing" a foreign key and "finding" objects that were
related to another by some ad-hoc criterion.  Ad-hoc relationships were
suddenly no longer different from ones built into the schema.  This leap
of the imagination made it possible to write set-oriented code that went
to a much higher level of abstraction and had a cleaner atomicity model. 
Apparently the query languages that people were using when SQL was
developed made you find objects that matched your conditions and then
iterate over them--and took pages of code to do the work SQL did in a few
lines.

Now, SQL never was a very faithful implementation of the relational model,
so it's not surprising that some database vendors may have thought it
convenient to issue an error when you try to update rows that don't exist.
 That seems reasonable when you're trying to manipulate a single row and
SQL seems to be working against you.  But how should the DBMS know that
that is the case?  The smarter the guess is, the more confusing it works
out for you.

Consider the cases where we want to increase (a) a certain employee's
salary if his performance rating is good; or (b) that same employee's
salary regardless of his performance rating; or (c) the salaries of all
employees with good performance ratings:

    UPDATE employee
    SET salary = salary + 100
    WHERE
(a)   id=327 AND rating > 0
(b)   id=327
(c)   rating > 0

In which of these cases would you like an error if zero rows were updated?

Not in (a), although you may still want to check whether a row was
updated; perhaps not in (b), because you don't want an application to fail
just because someone has left the company just before qualifying for a
raise; and probably for (c) because it may indicate there's something
seriously wrong in either the system or the company.  But what happens if
we only give the raise to superhero employees with unusually high
performance ratings?  What if the database is split up into smaller
departments and some of them may not have any employees that qualify?  It
really should be up to the application to decide whether this is an error.

The trick of saying "WHERE id=327 AND satisfactory" (instead of first
doing a SELECT and then an "if" based on the performance rating) might not
occur to someone not familiar with the relational way of thinking.  But
it's perfectly valid.  It's a better SQL expression than the combination
of "SELECT rating FROM employee WHERE id=327" and "UPDATE employee WHERE
id=327."  Also, it saves one entire roundtrip from the application to the
database.


Jeroen


_______________________________________________
Libpqxx-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/libpqxx-general

Re: [libpqxx-general] Updating a row that does not exist in a table.

Reply via email to