[sqlite] FW: range enclosing a number

Bogdan Nicula Tue, 14 Jul 2009 00:25:10 -0700

Sorry if you receive twice this email:


Thank you everybody for the incredible help.


Maybe is better if I explain more the problem:


I need to do piecewise linear interpolations between two very large sets of 
numbers. The interpolation is the reason I was using <= and>=. One set 
represents clock ticks (46 bits integers) and the other real times (floating 
point numbers). The table has two columns which are indexed and the elements 
are unique. The interpolation would have to be done either way: from ticks to 
times or from times to ticks. To perform the interpolation I need that select 
returns both columns.


Adding the second column as in:

select max(ticks),time from time_pair where ticks <= ?;


seems to work, although I don't understand the GROUP BY comments.


I have thought of edge conditions (there I need to do extrapolation) and 
planned either to retrieve the max and min beforehand and use them to compare 
with the current argument, or to do always something like:


select * from time_pair where ticks <= ? order by ticks desc limit 2;

i.e. try to retrieve two rows before (and similarly two after). Since the max 
and min cannot be reused because the database is closed between interpolation 
operations and might be altered, it's not clear which is the better approach, 
so I will have to test.


The idea to try on both 3.4 and 3.6 is very good since since this is a mixed 
environment and ultimately it may run against 3.4.


Thank you again for your time and efforts,

Bogdan



>> Date: Mon, 13 Jul 2009 23:35:22 -0500
>> From: j...@kreibi.ch
>> To: paiva...@gmail.com
>> CC: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] range enclosing a number
>> 
>> On Mon, Jul 13, 2009 at 10:33:00PM -0400, Pavel Ivanov scratched on the wall:
>>> Jay, you're pretty much mistaken:
>>> 
>>>>  I'm pretty sure you don't want to do it this way.  What this does is
>>>>  gather every row that meets the WHERE condition and then runs a max()
>>>>  or min() aggregation function across all of those rows.  That means
>>>>  that even if the column "number" has an index on it, between these
>>>>  two statements you're going to end up scanning the whole table.
>>> 
>>> All database engines optimize queries which ask for min/max on indexed
>>> column with condition including only  or = on this very column.
>>> And SQLite is among these too:
>> 
>> 
>>   Not "all."  This type of optimization is actually a fairly new
>>   addition to SQLite (considering the product lifetime) and first
>>   appeared in 3.5.5, which was released in early 2008.
>> 
>> 
>>   And I'd still go with ORDER/LIMIT.  Here's why:
>> 
>> 
>>   For my tests I just used the default build under the current version
>>   of Mac OS X, which is a bit old (3.4).  Under that build, the
>>   ORDER/LIMIT is clearly faster, as this is before the optimization
>>   existed:
>> 
>>   (using the same test set you did)
>>   
>>   Full scan, 3.4: 
>>   -------------
>>   real       0m5.99s
>>   user       0m4.73s
>>   sys        0m0.84s
>> 
>>   Using ORDER/LIMIT, 3.4:
>>   -------------
>>   real       0m0.00s
>>   user       0m0.01s
>>   sys        0m0.00s
>> 
>>   Using min/max, 3.4:
>>   -------------
>>   real       0m5.97s
>>   user       0m2.94s
>>   sys        0m0.38s
>> 
>>   In this case, it is clear that min/max are NOT integrated into the
>>   optimizer, and requires half a table scan, just as I stated.
>> 
>>   I also have a build of the current 3.6.16 around, and in that case,
>>   the numbers are better:
>> 
>>   Using ORDER/LIMIT, 3.6.16
>>   -------------
>>   real       0m0.12s
>>   user       0m0.01s
>>   sys        0m0.03s
>> 
>>   Using min/max, 3.6.16
>>   -------------
>>   real       0m0.04s
>>   user       0m0.01s
>>   sys        0m0.03s
>> 
>>   This clearly shows that the optimization does exist, and that for
>>   this very basic case my assumptions were incorrect.
>> 
>>   With the current 3.6.16 build, using min/max seems a tad faster-- but
>>   only in "real" time.  In terms of user/sys times, the results shown
>>   here (and you're own numbers, which were 0.043/0.001/0.005 and
>>   0.005/0.001/0.001) were pretty typical (i.e. very very close).
>>   That might just be an I/O fluke.  We're getting small enough that
>>   to really say anything definite requires better profiling.  So
>>   there does appear to be a difference, but it is pretty small and
>>   unclear where it is coming from.
>> 
>>   However, I'd point out that using ORDER/LIMIT under 3.4 is the
>>   fastest of all.  This isn't just a quirk of one run, either.
>>   I ran these several times and the 3.4 ORDER/LIMIT was always fastest.
>>   We're still playing with number to small to really trust, but it
>>   seems that if the 3.6.16 ORDER/LIMIT was as fast as the one in 3.4,
>>   it would likely be the best choice of all.
>> 
>> 
>>   So you've sold me that the current version of SQLite clearly does
>>   have the min/max optimization and doesn't require a table scan.  It
>>   also appears to be slightly faster, but not by a big enough gap to
>>   clearly consider it a better choice on that alone.
>> 
>> 
>> 
>>   Personally, I'd still go with ORDER/LIMIT.  With the current version
>>   of SQLite the runtimes of both approaches are extremely similar,
>>   but the min/max approach depends on the query optimizer being able to
>>   take the min/max notation and basically turn it into an internal
>>   ORDER BY (look at the code).
>> 
>>   There are a lot of limits on when the optimizer can do this.  If the
>>   real-world query is a bit more complex than this example, or if (down
>>   the road) the query conditions get changed, or if the query gets more
>>   complex in just about anyway, the optimization is going to break with
>>   the min/max approach and you'll be stuck with a table scan--
>>   something that I think we can all agree is MUCH slower.  If that comes
>>   up due to a change in the SQL six months down the road, you're going
>>   to be spending a lot of time wondering why things just got so slow.
>>   The ORDER/LIMIT approach is much more resilient to changes, however,
>>   and should more or less behave the same no matter what you do to the
>>   rest of the query.
>> 
>>   The ORDER/LIMIT approach is also the clear winner if there is any
>>   possible chance at all you'll be running on older code.  "Older," in
>>   this case, is code that was released less than two years ago.  In
>>   many production environments, that's not all that long.
>> 
>>   There is also this...
>> 
>>> But of course your other point is true - if you want some other data
>>> from table along with min/max value, you need to make additional
>>> select in case of using min/max.
>> 
>>   ... which also negates any possible advantage of using the min/max
>>   method.  The ORDER/LIMIT approach lets you pull out all the data you
>>   might need or want in a single query.
>> 
>>   For the simple case on newer code, both approaches give similar
>>   returns.  But the min/max approach will default to a very expensive
>>   behavior if it is changed in just about any way, while the
>>   ORDER/LIMIT approach is valid for most variations on this example.
>> 
>> 
>>   I could also go off on a more idealistic tangent, about how I think
>>   the ORDER/LIMIT approach is more in the spirit of how SELECT is meant
>>   to be used... the min/max approach clearly depends on the query
>>   optimizer to do its work, while the ORDER/LIMIT works "by design."
>>   I'm never a fan of trusting the query optimizer.  It changes from time
>>   to time and sometimes things break.  The min/max approach also
>>   depends on things like an implied GROUP BY.  That might be standard
>>   SQL, but for someone less versed in SQL it looks like a query that
>>   shouldn't even work (and, in fact, doesn't once you add another
>>   column).  The ORDER/LIMIT approach is, arguable, less graceful, but
>>   it is also (IMHO) a lot easier to break down into logical blocks that
>>   a newbie can follow and understand both the design and intent-- even
>>   if the query is a bit strung out.  I would say that the min/max
>>   approach is more in the spirit of the Relational Model, but that's
>>   not what SQL is about and, at least where I work, the mention of
>>   "Relational Model" will get you nothing but blank looks from the
>>   majority of my co-workers.  It might be a more elegant approach,
>>   but six months down the road when some random team member needs to
>>   update it, that elegance is going to be totally lost on them. 
>>   I'd prefer the practical, more bullet proof solution that isn't
>>   dependent on the optimizer retaining the exact same behavor.
>>   And speaking of changes...  if the current version of SQLite can be
>>   made to run the ORDER/LIMIT as fast as the older ones, then
>>   ORDER/LIMIT could win in time as well!
>>   
>>   I can go on, but I'll stop there, since at this point we're well into
>>   opinions and I need to be doing other things.
>> 
>>    -j
>> 
>> -- 
>> Jay A. Kreibich < J A Y  @  K R E I B I.C H>
>> 
>> "Our opponent is an alien starship packed with atomic bombs.  We have
>>  a protractor."   "I'll go home and see if I can scrounge up a ruler
>>  and a piece of string."  --from Anathem by Neal Stephenson
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> _________________________________________________________________
> More than messages–check out the rest of the Windows Live™.
> http://www.microsoft.com/windows/windowslive/

_________________________________________________________________
Drag n’ drop—Get easy photo sharing with Windows Live™ Photos.

http://www.microsoft.com/windows/windowslive/products/photos.aspx
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FW: range enclosing a number

Reply via email to