On Tue, Jul 14, 2009 at 07:32:25AM -0400, Pavel Ivanov scratched on the wall:
> Not to continue argument with Jay but just to express my opinion in 
> comparison:
> 
> >  The ORDER/LIMIT approach is much more resilient to changes, however,
> >  and should more or less behave the same no matter what you do to the
> >  rest of the query.
> 
> Seriously, I don't believe this. There's no way to change the query so
> that min/max will scan all table and order/limit will take just value
> from index. They both will take the same approach in all cases (except
> of course old SQLite which I fortunately didn't work with :) ).

  You clearly have experience with more robust databases, where I would
  expect this statement to be true.  SQLite has a long history of keeping
  the engine as simple and small as possible, however.  Part of that was
  a strong use of standard user-defined functions for the built-ins, like
  min/max/count.  For years stuff like min/max used the same API as
  user-defined functions, and as such couldn't be touched by the
  optimizer.  Even now, with the integration, it isn't that strong.
  Perhaps in other products, but not here.

  In specific, here's the comment in the code of the optimizer that
  attempts to short-cut a min/max:

/*
** Analyze the SELECT statement passed as an argument to see if it
** is a min() or max() query. Return WHERE_ORDERBY_MIN or WHERE_ORDERBY_MAX if
** it is, or 0 otherwise. At present, a query is considered to be
** a min()/max() query if:
**
**   1. There is a single object in the FROM clause.
**
**   2. There is a single expression in the result set, and it is
**      either min(x) or max(x), where x is a column reference.
*/

  So, a simple JOIN (perhaps used as a filtering semijoin) will kill
  the min/max approach, but still leave the use of the index available
  for something like the ORDER/LIMIT approach.  I'm not actually
  sure what SQLite will do if you add a JOIN, so I'm not saying I know
  the ORDER/LIMIT is faster, but this seems to say that min/max will go
  to a scan.

  Also, as pointed out, if x becomes an expression or computed column
  (even one that only accesses the indexed column), rather than a direct
  table reference, all bets are off with the SQLite optimizer.  Sure,
  it is possible to write an optimizer that can deal with these
  situations, but that's not what we're dealing with here.  We'd be
  having a different conversation if we were talking about a
  large-scale data-center product.

  I'll also admit I'm not sure these situations are extremely likely,
  but they're still valid examples, and I like to program defensively.

> >  The ORDER/LIMIT approach is, arguable, less graceful, but
> >  it is also (IMHO) a lot easier to break down into logical blocks that
> >  a newbie can follow and understand both the design and intent-- even
> >  if the query is a bit strung out.
> 
> This is where my opinion is exactly opposite: query with min/max is
> more readable by newbie just getting introduced to the code - it
> clearly states what requester is trying to do: "get minimum among
> records with this condition".

  OK, I'll buy that for someone that is just trying to figure out
  what it does without thinking about it very hard, the min/max does
  "read" better.  But for someone trying to figure out how it works,
  and maybe change it, I still disagree.  Most people think of the
  SELECT clause as a column-wise operator... that's where you define
  the vertical "shape" of your result.   If you want to pick out a
  specific row, they think WHERE.

  Now clearly GROUP BY mixes this up, and allows you to alter the row
  configuration using the SELECT heading, but my own experience
  teaching people SQL has shown me that the two most confusing concepts
  for an SQL newbee are JOINs and GROUP BYs (see OP's response), and
  this depends on a GROUP BY that isn't even there (although at least
  it is a simple one!).

  Further, from a code maintenance standpoint, I'd be seriously afraid
  someone look at this, think "Oh, its obvious this picks out the _row_
  with the max value," and do something like add a second column to the
  query (like the OP is needing to do).  Only problem is that's not how
  that query works.  Even worse, as we've already talked about, if you
  do add a column the query still runs, still works, still returns a
  value-- it just happens to be the wrong value.  Unless you've got a
  strong understanding of how GROUP BY works (see above note about
  learning SQL) someone is going to spend hours trying to debug that
  one.

  From a set-wise, higher-thinking approach I agree that min/max is
  cleaner and more concise, and has the advantage of also running fast
  IF you have a good optimizer.  But I also consider it a "clever"
  piece of code.  ORDER/LIMIT is bigger, and has more bits that you
  need to piece together in your mind to see the big picture, but the
  individual bits are, in my mind, more isolated and simpler (i.e. no
  GROUP BY) for the less experienced developer.



  And again, if we were dealing with a high-end multi-terabyte
  commercial database installation with a team of DBAs, I wouldn't
  even think about it.  To any database professional with a few
  years of experience, the difference between these queries is
  minimal, at worst.  But part of my very defense approach has to
  do with the type of programmers you often find using SQLite (and
  from that, the typical people on this list).  No doubt, we've got
  more than a few extremely smart people on this list, but the
  majority of SQLite users tend to be traditional procedural
  programmers that have a small "on the side" database need.  They're
  often extremely good at what they do, they just don't "do"
  databases very much.  They're not SQL wizards, and they tend not to
  think in declarative set operations or Relational Calculus, making
  SQL a slightly uncomfortable language.  (And then there is Igor,
  who wouldn't surprise me by posting a thousand line compound SQL
  statement that implements a web browser.)

  A good database programmer might not like the idea of "making SQL
  look like C", but from my own experience, that's not always a bad
  thing if you've got a team of C (or insert your traditional language
  here) programmers.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to