Re: [sqlite] Hints for the query planner

2013-10-02 Thread Nico Williams
On Wednesday, October 2, 2013, Richard Hipp wrote:
>
> In the current beta (at http://www.sqlite.org/download.html) the function
> is either unlikely(X) or likelihood(X,Y).  In the second form, Y must be a
> floating point constant between 0.0 and 1.0, inclusive.  The first form is
> equivalent to likelihood(X, 0.0625).


 Perfect.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-10-02 Thread Ryan Johnson

On 02/10/2013 2:19 AM, Baruch Burstein wrote:

On Fri, Sep 13, 2013 at 5:29 AM, Ryan Johnson
wrote:


-- Join cardinality: Bach was a *very* prolific composer whose output
likely dwarfs the (surviving) output of his contemporaries
select p.title, c.name, p.year from composers c join pieces p on p.c_id =
c.id where c.name like '%bach%' and p.year between 1700 and 1750


How would you mark this even using the suggested function syntax? The
likelihood of "c.name like '%bach%' " being true depends on the order the
query optimizer decides to evaluate the 2 predicates in, which in turn
depends on the likelihood of the predicate!
Predicate order doesn't matter; the optimizer will push both down to 
their respective source tables before the join, where they will execute 
independently of each other. The problem would come if the above join 
were input to some outer query and the optimizer needed to reason about 
the cardinality of the join based on the selectivity of the inputs: the 
inputs would be highly filtered but the join would impose unexpectedly 
little additional filtering due to the correlation between famous 
composes named Bach and the half century of interest.


Ryan






___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-10-02 Thread Richard Hipp
On Wed, Oct 2, 2013 at 1:40 AM, Nico Williams  wrote:

> On Tuesday, September 10, 2013, Richard Hipp wrote:
>
> > SURVEY QUESTION:
> >
> > The question for today is what to call this magic hint function:
> >
> > (1)  unlikely(EXPR)
> > (2)  selective(EXPR)
> > (3)  seldom(EXPR)
> > (4)  seldom_true(EXPR)
> > (5)  usually_not_true(EXPR)
>
>
>  (1), on account of: it's already used in various programming languages.
>  However, because o your point below i prefer "likely(,
> )", perhaps with the probability being required (can the
> probability be a non-constant expression?).
>

In the current beta (at http://www.sqlite.org/download.html) the function
is either unlikely(X) or likelihood(X,Y).  In the second form, Y must be a
floating point constant between 0.0 and 1.0, inclusive.  The first form is
equivalent to likelihood(X, 0.0625).

Additional links:

(1) http://www.sqlite.org/draft/releaselog/3_8_1.html
(2) http://www.sqlite.org/draft/lang_corefunc.html#unlikely
(3) http://www.sqlite.org/draft/lang_corefunc.html#likelihood

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-10-02 Thread Baruch Burstein
On Fri, Sep 13, 2013 at 5:29 AM, Ryan Johnson
wrote:

> -- Join cardinality: Bach was a *very* prolific composer whose output
> likely dwarfs the (surviving) output of his contemporaries
> select p.title, c.name, p.year from composers c join pieces p on p.c_id =
> c.id where c.name like '%bach%' and p.year between 1700 and 1750
>

How would you mark this even using the suggested function syntax? The
likelihood of "c.name like '%bach%' " being true depends on the order the
query optimizer decides to evaluate the 2 predicates in, which in turn
depends on the likelihood of the predicate!


-- 
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-10-01 Thread Nico Williams
On Tuesday, September 10, 2013, Richard Hipp wrote:

> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)


 (1), on account of: it's already used in various programming languages.
 However, because o your point below i prefer "likely(,
)", perhaps with the probability being required (can the
probability be a non-constant expression?).

Please feel free to suggest other names if you think of any.
>
> ADDITIONAL INFORMATION:
>
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.
>

 Right, but this is less annoying when the keyword name denotes likeliness:
the probability argument is quite clearly a modifier of likelihood.  That's
another possible name: "likelihood()".

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-30 Thread John G
Dr Hipp

I second Kyan's suggestion of a pseudo-comment hinting syntax:

... that the planner hint is not interleaved inside normal SQL
> syntax. Instead I propose a special comment-like syntax instead, as
> Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">":
>

Having had to switch between databases a couple of times, I would prefer
something that keeps the SQL more portable.
As for names I would vote for the 'seldom()' option as the 'likely()'
option is already covered by the default action.

John Gillespie


On 10 September 2013 20:26, Richard Hipp  wrote:

> There is a survey question at the bottom of this message.  But first some
> context...
>
> Over on the sqlite-dev mailing list, a debate has been going on about the
> best way to provide some useful hints to the query planner.  The query
> under discussion looks like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>
> Assuming that the schema has appropriate indices and ANALYZE has been run,
> SQLite does a good job of selecting an efficient query plan for the above.
> But the query planner lacks a key piece of information that could help it
> to do a better job.  In particular, the query planner does not know how
> often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
> out, the best query plan depends critically on this one fact.
>
> By default, the query planner (in SQLite 3.8.0) assumes that a
> subexpression that cannot use an index will always be true.  Probably this
> will be tweaked in 3.8.1 so that such subexpressions will be assumed to
> usually, but not always, be true.  Either way, it would be useful to be
> able to convey to the query planner the other extreme - that a
> subexpression is usually not true.
>
> (Pedantic detail:  "not true" is not the same as "false" in SQL because
> NULL is neither true nor false.)
>
> There is currently code in a branch that provides a hinting mechanism using
> a magic "unlikely()" function.  Subexpressions contained within
> "unlikely()" are assumed to usually not be true.  Other than this hint to
> the query planner, the unlikely() function is a complete no-op and
> optimized out of the VDBE code so that it does not consume any CPU cycles.
> The only purpose of the unlikely() function is to let the query planner
> know that the subexpression contained in its argument is not commonly
> true.  So, if an application developer knows that the string "bach" seldom
> occurs in composer names, then she might rewrite the query like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%')
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>
> The query planner might use this "likelihood" hint to choose a different
> query plan that works better when the subexpression is commonly false.  Or
> it might decide that the original query plan was good enough and ignore the
> hint.  The query planner gets to make that decision.  The application
> developer is not telling the query planner what to do. The application
> developer has merely provided a small amount of meta-information about the
> likelihood of the subexpression being true, meta-information which the
> query planner may or may not use.
>
> Note that the subexpression does not have to be a LIKE operator.
> PostgreSQL, to name one example, estimates how often a LIKE operator will
> be true based on the pattern on its right-hand side, and adjust query plans
> accordingly, and some have argued for this sort of thing in SQLite.  But I
> want a more general solution.  Suppose the subexpression involves one or
> more calls to application-defined functions about which the query planner
> cannot possible know anything.  A general mechanism for letting the query
> planner know that subexpressions are commonly not true is what is desired -
> not a technique for making LIKE operators more efficient.
>
> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.
>
> ADDITIONAL INFORMATION:
>
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing 

Re: [sqlite] Hints for the query planner

2013-09-12 Thread ibrahim

On 10.09.2013 21:26, Richard Hipp wrote:

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)




A function call like proposed is much better than adjusting the sql 
gramer for implementation details. Therefore this is a better solution.


(6) assume (EXPR, [%value%])
(7) approximate (EXPR, [%value%])

It would be nice to further have a pragma to adjust the probability 
expectations of the following query with text involved. I think people 
who have to deal with more than one database engine would prefer to use 
use such a pragma to adjust such implementation behaviour.


pragma set_text_match_probability %value%
pragma set_pattern_match_probability %value%


Ibrahim

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-12 Thread Ryan Johnson

On 12/09/2013 7:12 PM, Simon Slavin wrote:

On 12 Sep 2013, at 11:35pm, Roger Binns  wrote:


On 12/09/13 05:03, Gabor Grothendieck wrote:

Perhaps indexing the expression in question would be an alternative
that would keep the performance info separate from the select.

I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever
it deems necessary to gather to make the query perform well in the future.

I think I prefer Roger's idea to that of messing with a vital expression.  
Rather than modify a command you need to work, introduce a new command which, 
even if it entirely fails, doesn't interfere with the operation of the SELECT.  
If the ANALYZE fails the SELECT (or UPDATE, or whatever) will still work, 
albeit more slowly.  And this causes fewer problems for users who have to write 
code which works with many SQL engines.

But because ANALYZE already means something in SQLite, perhaps it might be 
better to introduce a new command
Would it really be so confusing to extend ANALYZE that way? It reads so 
naturally, I bet the most likely source of list traffic would be people 
who tried to use it in older versions of sqlite3 and were surprised it's 
not there...



STORE LIKELIHOOD test, probability

or maybe

REMEMBER LIKELIHOOD OF test AS probability

which will store the fact that such-and-such a test has a certain probability as a new 
row in a table somewhere.  Could be a new row in sqlite_stat3 (or sqlite_stat4), or could 
be in another sqlite_ table.  Omitting the second parameter tells SQLite to do the 
evaluation itself (like ANALYZE does) and store the result.  Curious users could dump the 
table just like people sometimes do "SELECT * FROM sqlite_stat3".
I think it's pretty important to examine predicates in the context of 
specific queries (and to allow the same predicate to appear any number 
of such queries). The predicate "c.name like '%bach%'" is going to 
behave quite differently in these three queries, for example:


-- Vanilla predicate: Bach isn't a very common name
select c.name from composers c where c.name like '%bach%';

-- Join cardinality: Bach was a *very* prolific composer whose output 
likely dwarfs the (surviving) output of his contemporaries
select p.title, c.name, p.year from composers c join pieces p on p.c_id 
= c.id where c.name like '%bach%' and p.year between 1700 and 1750;


-- Correlated columns: Very few Brandenburg anythings were written by 
composers other than J.S. Bach
select c.name, p.title from composers c join pieces p on p.c_id = c.id 
where c.name like '%bach%' and p.title like '%brandenburg%';


(110% agree that any new information that changes query plans needs to 
be in a stats table somewhere. It's a huge aid to performance debugging 
when you can repro a problematic query plan using only the schema, 
query, and a dump of the various stats tables.)


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-12 Thread Simon Slavin

On 12 Sep 2013, at 11:35pm, Roger Binns  wrote:

> On 12/09/13 05:03, Gabor Grothendieck wrote:
>> Perhaps indexing the expression in question would be an alternative 
>> that would keep the performance info separate from the select.
> 
> I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever
> it deems necessary to gather to make the query perform well in the future.

I think I prefer Roger's idea to that of messing with a vital expression.  
Rather than modify a command you need to work, introduce a new command which, 
even if it entirely fails, doesn't interfere with the operation of the SELECT.  
If the ANALYZE fails the SELECT (or UPDATE, or whatever) will still work, 
albeit more slowly.  And this causes fewer problems for users who have to write 
code which works with many SQL engines.

But because ANALYZE already means something in SQLite, perhaps it might be 
better to introduce a new command:

STORE LIKELIHOOD test, probability

or maybe

REMEMBER LIKELIHOOD OF test AS probability

which will store the fact that such-and-such a test has a certain probability 
as a new row in a table somewhere.  Could be a new row in sqlite_stat3 (or 
sqlite_stat4), or could be in another sqlite_ table.  Omitting the second 
parameter tells SQLite to do the evaluation itself (like ANALYZE does) and 
store the result.  Curious users could dump the table just like people 
sometimes do "SELECT * FROM sqlite_stat3".

Perhaps if such entries are already present when you do an ANALYZE it could 
update them.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-12 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/09/13 05:03, Gabor Grothendieck wrote:
> Perhaps indexing the expression in question would be an alternative 
> that would keep the performance info separate from the select.

I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever
it deems necessary to gather to make the query perform well in the future.

Roger

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlIyQagACgkQmOOfHg372QT8jgCgtSROjcL1dyrHo+yP2leh1ffV
xBEAoKEOTIVqz3vlrVrlVeJ130Wru/Mg
=+8TU
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-12 Thread Gabor Grothendieck
PostgreSQL supports

   create index  on  (  )

Note that it allows an expression and not just a column name.

See:

  http://www.postgresql.org/docs/9.1/static/sql-createindex.html

Perhaps indexing the expression in question would be an alternative
that would keep the performance info separate from the select.




On Tue, Sep 10, 2013 at 3:26 PM, Richard Hipp  wrote:
> There is a survey question at the bottom of this message.  But first some
> context...
>
> Over on the sqlite-dev mailing list, a debate has been going on about the
> best way to provide some useful hints to the query planner.  The query
> under discussion looks like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>
> Assuming that the schema has appropriate indices and ANALYZE has been run,
> SQLite does a good job of selecting an efficient query plan for the above.
> But the query planner lacks a key piece of information that could help it
> to do a better job.  In particular, the query planner does not know how
> often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
> out, the best query plan depends critically on this one fact.
>
> By default, the query planner (in SQLite 3.8.0) assumes that a
> subexpression that cannot use an index will always be true.  Probably this
> will be tweaked in 3.8.1 so that such subexpressions will be assumed to
> usually, but not always, be true.  Either way, it would be useful to be
> able to convey to the query planner the other extreme - that a
> subexpression is usually not true.
>
> (Pedantic detail:  "not true" is not the same as "false" in SQL because
> NULL is neither true nor false.)
>
> There is currently code in a branch that provides a hinting mechanism using
> a magic "unlikely()" function.  Subexpressions contained within
> "unlikely()" are assumed to usually not be true.  Other than this hint to
> the query planner, the unlikely() function is a complete no-op and
> optimized out of the VDBE code so that it does not consume any CPU cycles.
> The only purpose of the unlikely() function is to let the query planner
> know that the subexpression contained in its argument is not commonly
> true.  So, if an application developer knows that the string "bach" seldom
> occurs in composer names, then she might rewrite the query like this:
>
> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%')
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>
> The query planner might use this "likelihood" hint to choose a different
> query plan that works better when the subexpression is commonly false.  Or
> it might decide that the original query plan was good enough and ignore the
> hint.  The query planner gets to make that decision.  The application
> developer is not telling the query planner what to do. The application
> developer has merely provided a small amount of meta-information about the
> likelihood of the subexpression being true, meta-information which the
> query planner may or may not use.
>
> Note that the subexpression does not have to be a LIKE operator.
> PostgreSQL, to name one example, estimates how often a LIKE operator will
> be true based on the pattern on its right-hand side, and adjust query plans
> accordingly, and some have argued for this sort of thing in SQLite.  But I
> want a more general solution.  Suppose the subexpression involves one or
> more calls to application-defined functions about which the query planner
> cannot possible know anything.  A general mechanism for letting the query
> planner know that subexpressions are commonly not true is what is desired -
> not a technique for making LIKE operators more efficient.
>
> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.
>
> ADDITIONAL INFORMATION:
>
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Re: [sqlite] Hints for the query planner

2013-09-12 Thread Klaas V
Richard  wrote:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

 
I dislike #4 and#5, but what about one of the following three:
1. Probability

2. Improbability

3. Probe

Number 2 might be in honor of British writer Douglas Adams's masterpiece H2G2

Cordiali saluti/Vriendelijke groeten/Kind regards,
Klaas "Z4us" V MetaDBA
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Daniel Polski


probability(EXPR, value)

- Would force the user to set their best guess. I would think that the 
users guess would be more accurate than a general guess? What's 
considered unlikely for the users data, 0.05? 0.20?


I would prefer if the mechanism could be handled with pragmas (or 
something) for adding / removing hints, to keep special functions away 
from the SQL.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Doug Currie

On Sep 10, 2013, at 6:23 PM, Scott Robison  wrote:

> I think I prefer something along the lines of "unlikely" or "likely". The
> problem with a term like "selective" (at least in my brain) is that it
> doesn't imply (for the single argument version) in what way it is being
> selective.
> 
> If a negative form of the magic function is used ("unlikely", "seldom",
> etc) I would suggest considering inverting the optional second parameter.
> In other words, 0.05 would become 0.95. In my opinion, that reads better:
> "unlikely(COLUMN LIKE '%pattern%', 0.95)" reads "it is unlikely the
> expression will be true 95% of the time".
> 
> In like fashion, a positive form of the magic function would keep the
> current meaning of the optional second parameter.

This is the best suggestion. The pseudo-function names do not change the 
meaning of the query, and they are more clear with regard to the optional 
numeric argument.

e

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Ned Fleming
On Tue, 10 Sep 2013 15:26:51 -0400, Richard Hipp
 wrote:

>SURVEY QUESTION:
>
>The question for today is what to call this magic hint function:
>
>(1)  unlikely(EXPR)
>(2)  selective(EXPR)
>(3)  seldom(EXPR)
>(4)  seldom_true(EXPR)
>(5)  usually_not_true(EXPR)
>
>Please feel free to suggest other names if you think of any.
>
>ADDITIONAL INFORMATION:
>
>The current implementation allows a second argument which must be a
>floating point constant between 0.0 and 1.0, inclusive. The second argument
>is an estimate of the probability that the expression in the first argument
>will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
>well when this probability is small, but if the second argument is close to
>1.0, then those names seem backwards.  I don't know if this matters.  The
>optional second argument is not guaranteed to make it into an actually
>release.

Use a word that connotes no value on a scale, a word that has
no value implied by the word itself.

These seem OK to me:

confidence
probability
chance
selective
ned (I like this one especially)

These do not:

unlikely
seldom
usually

-- 

Ned


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Stephan Beal
On Wed, Sep 11, 2013 at 5:36 PM, Kevin Benson wrote:

> On Tue, Sep 10, 2013 at 3:26 PM, Richard Hipp  wrote:
> >  SURVEY QUESTION:
> > (1)  unlikely(EXPR)
> > (2)  selective(EXPR)
> > (3)  seldom(EXPR)
> > (4)  seldom_true(EXPR)
> > (5)  usually_not_true(EXPR)
>
>
> (6)  nominal(EXPR)
>

(VII): prioritize() or priority()

is neither negative nor positive in connotation, but i'm not 100% sure
whether it's really indicative of what the op does.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Kevin Benson
On Tue, Sep 10, 2013 at 3:26 PM, Richard Hipp  wrote:

>  SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)


(6)  nominal(EXPR)

--
   --
  --
 --Ô¿Ô--
K e V i N
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Ralf Junker

On 11.09.2013 16:07, Ryan Johnson wrote:


Perhaps you meant "demote" rather than "degrade" ? That would be a
better fit (an external action that does not necessarily make the
object worse or less useful), and less vague, but it still carries a
negative connotation.


"demote" sounds fine to me, especially since its antonym "promote" may
be used for a function name to raise an expression's rank for the query
planner rather than the 2nd argument.

The negative connotation of both "degrade" and "demote" does not feel
bad for me as a non native English speaker. Both, however, express an
action rather than a quality which is more telling to me than "unlikely"
or the other adjectives suggested so far.

Maybe the function name could be prefixed by "qp_" (for query planner)
or similar to clarify their functionality even more: "qp_demote" and
"qp_promote"?

Ralf
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Ralf Junker
I suggest a verb to express what the function is actually doing, namely 
to reduce its argument in rank or degree for the query planner:


DEGRADE

1. to reduce in worth, character, etc; disgrace;
2. to reduce in rank, status, or degree; remove from office;
3. to reduce in strength, quality, intensity, etc

Source: http://www.collinsdictionary.com/dictionary/english/degrade

On 10.09.2013 21:26, Richard Hipp wrote:


Please feel free to suggest other names if you think of any.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Konrad Hambrick
> -Original Message-
> From: sqlite-users-boun...@sqlite.org 
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf
> Of Richard Hipp
> Sent: Tuesday, September 10, 2013 2:27 PM
> To: General Discussion of SQLite Database
> Subject: [sqlite] Hints for the query planner
> 
> There is a survey question at the bottom of this message.  But first some
> context...
> 

> 
> SURVEY QUESTION:
> 
> The question for today is what to call this magic hint function:
> 
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
> 
> Please feel free to suggest other names if you think of any.
> 
> ADDITIONAL INFORMATION:
> 
> The current implementation allows a second argument which must be a
> floating point constant between 0.0 and 1.0, inclusive. The second argument
> is an estimate of the probability that the expression in the first argument
> will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
> well when this probability is small, but if the second argument is close to
> 1.0, then those names seem backwards.  I don't know if this matters.  The
> optional second argument is not guaranteed to make it into an actually
> release.

All --

Since the optional second arg is not guaranteed to make it into a release, 
I like (3) - SELDOM( EXPR ) ...

--- cut  here --
SELECT DISTINCT aname
  FROM album, composer, track
 WHERE SELDOM( cname LIKE '%bach%' )
   AND composer.cid=track.cid
   AND album.aid=track.aid
;
--- cut there --

-- kjh

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Baruch Burstein
I also think it should not be directly in the SQL. I like the
not-really-a-comment syntax. Another option might be a few PRAGMAs,
something like

PRAGMA hint("table1.col1 IN (1,2,5)", 0.05);
PRAGMA hint("table1.col2 LIKE '%bach%'". 0.4);

these would add the hints to an internal table. When preparing a query, the
planner would check the hints table to see if any hints match the
table/column/condition triplet, and if so optionally use the hint. Removing
a hint and removing all hints would also be a couple of PRAGMAs.


On Wed, Sep 11, 2013 at 3:53 AM, kyan  wrote:

> Hello Dr Hipp,
>
> First of all, I apologize for this rather off-topic suggestion knowing that
> you may have already implemented the syntax you describe, but there is an
> IMHO good reason for it, read ahead.
>
> On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp  wrote:
>
> > SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE unlikely(cname LIKE '%bach%')
> >AND composer.cid=track.cid
> >AND album.aid=track.aid;
> >
>
> I would prefer that the planner hint is not interleaved inside normal SQL
> syntax. Instead I propose a special comment-like syntax instead, as
> Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">":
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE cname LIKE '%bach%'
> > /*> unlikely */
> >  AND composer.cid=track.cid AND album.aid=track.aid;
> >
>
> or:
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE cname LIKE '%bach%'
> > --> unlikely
> >AND composer.cid=track.cid
> >AND album.aid=track.aid;
>
>
> If the hint is to be applied to an expression that combines many column
> predicates with AND (I am not sure if this actually makes sense):
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE unlikely(cname LIKE '%bach%'
> >AND composer.cid=track.cid)
> >AND album.aid=track.aid;
> >
>
> then a -normally redundant- pair of parentheses can be used to specify the
> scope of the hint:
>
> SELECT DISTINCT aname
> >   FROM album, composer, track
> >  WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /*> unlikely */
> >AND album.aid=track.aid;
> >
>
> The SQLite SQL parser will have to look for exactly "/*>" or "-->" without
> whitespace between the characters, so it can easily tell a planner hint
> from a plain comment with a single character read-ahead. Also, the fact
> that hints are "transparent" to the SQL syntax will allow the query parser
> to handle them in an "orthogonal" way (e.g. a small separate parser for
> hints) to normal SQL parsing, IMO making handling of any future hints
> easier to add.
>
> The main reason for this proposal is that the planner hint will be ignored
> by default by other SQL parsers without the need to modify them, which in
> some cases may not even be possible. For instance it will allow someone to
> write SQL that is valid in databases of alternative DB vendors and still
> provide planner hints when the DB vendor is SQLite (that is why I replaced
> "+" with ">", to avoid conflicts with a hypothetical alternate Oracle query
> optimizer) without having to modify the SQL in the application code to
> remove the hints. This is a property of the Oracle optimizer hint syntax I
> have always appreciated when writing SQL that is to be executed in
> databases of alternative DB vendors with the same schema, for applications
> where the user chooses the database vendor from a list of supported ones.
>
> For more on Oracle optimizer hints see
> http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm.
>
> As for the name of the hint itself I would propose:
>
> --> PROBABLY(True) -- the current default
> --> PROBABLY(False)
> --> PROBABLY(False, 0.7)
> --> PROBABLY(False, 0.6, 0.3)  --re "pedantic detail", the second value if
> for True, the remainder for NULL.
>
> Kind regards,
>
> Constantine Yannakopoulos
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-11 Thread Hick Gunter
(6) maybe(EXPR)

-Ursprüngliche Nachricht-
Von: Richard Hipp [mailto:d...@sqlite.org]
Gesendet: Dienstag, 10. September 2013 21:27
An: General Discussion of SQLite Database
Betreff: [sqlite] Hints for the query planner

There is a survey question at the bottom of this message.  But first some 
context...

Over on the sqlite-dev mailing list, a debate has been going on about the best 
way to provide some useful hints to the query planner.  The query under 
discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run, 
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it to do 
a better job.  In particular, the query planner does not know how often the 
subexpression "cname LIKE '%bach%'" will be true.  But, it turns out, the best 
query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a subexpression 
that cannot use an index will always be true.  Probably this will be tweaked in 
3.8.1 so that such subexpressions will be assumed to usually, but not always, 
be true.  Either way, it would be useful to be able to convey to the query 
planner the other extreme - that a subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because NULL is 
neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using a 
magic "unlikely()" function.  Subexpressions contained within "unlikely()" are 
assumed to usually not be true.  Other than this hint to the query planner, the 
unlikely() function is a complete no-op and optimized out of the VDBE code so 
that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner know 
that the subexpression contained in its argument is not commonly true.  So, if 
an application developer knows that the string "bach" seldom occurs in composer 
names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different query 
plan that works better when the subexpression is commonly false.  Or it might 
decide that the original query plan was good enough and ignore the hint.  The 
query planner gets to make that decision.  The application developer is not 
telling the query planner what to do. The application developer has merely 
provided a small amount of meta-information about the likelihood of the 
subexpression being true, meta-information which the query planner may or may 
not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will be 
true based on the pattern on its right-hand side, and adjust query plans 
accordingly, and some have argued for this sort of thing in SQLite.  But I want 
a more general solution.  Suppose the subexpression involves one or more calls 
to application-defined functions about which the query planner cannot possible 
know anything.  A general mechanism for letting the query planner know that 
subexpressions are commonly not true is what is desired - not a technique for 
making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a floating 
point constant between 0.0 and 1.0, inclusive. The second argument is an 
estimate of the probability that the expression in the first argument will be 
true.  The default is 0.05.  Names like "unlikely" or "seldom" work well when 
this probability is small, but if the second argument is close to 1.0, then 
those names seem backwards.  I don't know if this matters.  The optional second 
argument is not guaranteed to make it into an actually release.
--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


--
 Gunter Hick
Software Engineer
Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna, Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please 

Re: [sqlite] Hints for the query planner

2013-09-11 Thread Tony Papadimitriou

How about:

maybe(COLUMN LIKE '%pattern%',.95)

or (as percent using integer value in 0..100)

maybe(COLUMN LIKE '%pattern%',95)

with a default value of (possibly) 50% (or .5) for the optional second arg?

-Original Message- 
From: Richard Hipp 
Sent: Tuesday, September 10, 2013 10:26 PM 
To: General Discussion of SQLite Database 
Subject: [sqlite] Hints for the query planner 


The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread James Berry

On Sep 10, 2013, at 12:26 PM, Richard Hipp  wrote:

> SURVEY QUESTION:
> 
> The question for today is what to call this magic hint function:
> 
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
> 
> Please feel free to suggest other names if you think of any.


I like the optional second parameter. Apart from the obvious change to 
likelihood, which somebody else suggested, but which is less self documenting 
in the one-argument case, I'd suggest that you actually add two words: likely 
and unlikely, and whose second parameters are the inverse of each other, 
crossing at 0.5. So an expression could then, in the simplest case, be labeled 
LIKELY or UNLIKELY, with second parameter defaulting to 1.0 (or to whatever 
value you feel is appropriate), but allowing the user to specify a lesser 
likelihood by lowering that value. LIKELY(expr, 0) would mean the same as 
UNLIKELY(expr) and UNLIKELY(expr, 1), and UNLIKELY(expr, 0) would be the same 
as LIKELY(expr, 1), etc.

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Marc L. Allen
As I was reading this, I said to myself, "what they really need is a confidence 
value."  Then I read the end and, there it was!  A confidence value.  Ok.. not 
exactly confidence, but I think you get my meaning.

It seems to me that you're allowing the query writer to substitute personal 
knowledge of the DB for knowledge based on ANALYZE or other statistical 
indexes.  So, I'm all in favor of allowing that second argument.

If so, I would suggest "confidence(exp, confidence_value)".  Or, perhaps, 
"likelihood(..)"  Likely is fine, or you might even establish several names 
with built-in defaults... e.g. "likely(xxx)" might be "confidence(xxx, .75)" 
and "unlikely(xxx)" might be "confidence(xxx, .25)"  You've got "rarely," 
"mostly," and a whole suite of other synonyms.




This email and any attachments are only for use by the intended recipient(s) 
and may contain legally privileged, confidential, proprietary or otherwise 
private information. Any unauthorized use, reproduction, dissemination, 
distribution or other disclosure of the contents of this e-mail or its 
attachments is strictly prohibited. If you have received this email in error, 
please notify the sender immediately and delete the original.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread kyan
Hello Dr Hipp,

First of all, I apologize for this rather off-topic suggestion knowing that
you may have already implemented the syntax you describe, but there is an
IMHO good reason for it, read ahead.

On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp  wrote:

> SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%')
>AND composer.cid=track.cid
>AND album.aid=track.aid;
>

I would prefer that the planner hint is not interleaved inside normal SQL
syntax. Instead I propose a special comment-like syntax instead, as
Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">":

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
> /*> unlikely */
>  AND composer.cid=track.cid AND album.aid=track.aid;
>

or:

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE cname LIKE '%bach%'
> --> unlikely
>AND composer.cid=track.cid
>AND album.aid=track.aid;


If the hint is to be applied to an expression that combines many column
predicates with AND (I am not sure if this actually makes sense):

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE unlikely(cname LIKE '%bach%'
>AND composer.cid=track.cid)
>AND album.aid=track.aid;
>

then a -normally redundant- pair of parentheses can be used to specify the
scope of the hint:

SELECT DISTINCT aname
>   FROM album, composer, track
>  WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /*> unlikely */
>AND album.aid=track.aid;
>

The SQLite SQL parser will have to look for exactly "/*>" or "-->" without
whitespace between the characters, so it can easily tell a planner hint
from a plain comment with a single character read-ahead. Also, the fact
that hints are "transparent" to the SQL syntax will allow the query parser
to handle them in an "orthogonal" way (e.g. a small separate parser for
hints) to normal SQL parsing, IMO making handling of any future hints
easier to add.

The main reason for this proposal is that the planner hint will be ignored
by default by other SQL parsers without the need to modify them, which in
some cases may not even be possible. For instance it will allow someone to
write SQL that is valid in databases of alternative DB vendors and still
provide planner hints when the DB vendor is SQLite (that is why I replaced
"+" with ">", to avoid conflicts with a hypothetical alternate Oracle query
optimizer) without having to modify the SQL in the application code to
remove the hints. This is a property of the Oracle optimizer hint syntax I
have always appreciated when writing SQL that is to be executed in
databases of alternative DB vendors with the same schema, for applications
where the user chooses the database vendor from a list of supported ones.

For more on Oracle optimizer hints see
http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm.

As for the name of the hint itself I would propose:

--> PROBABLY(True) -- the current default
--> PROBABLY(False)
--> PROBABLY(False, 0.7)
--> PROBABLY(False, 0.6, 0.3)  --re "pedantic detail", the second value if
for True, the remainder for NULL.

Kind regards,

Constantine Yannakopoulos
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Keith Medcalf

How about three, where two are just overloaded, or rather syntactic sugar, for 
the main declaration:

unlikely(expr)
likely(expr)
likelihood(expr, rate)

Where unlikely(expr) -> likelihood(expr, 0.05)
  likely(expr)   -> likelihood(expr, 0.95)

I would presume that the rate is the expected selection rate of expr over the 
entire table, so the correctly computed value of the rate for a table A of a 
given expr, for example:

select * from A where col LIKE '%bach%'

would be

select * from A where likelihood(col like '%bach%', (select count(*) from a 
where col like '%bach%')/(select count(*) from a))
and is not related to the application of other conditionals not included in the 
expr itself.

So, if there are usable statistics available, should the likelihood given be 
ignored; or, should likelihood completely override the statistical input to the 
optimizer?  I vote that the hint should only be used if no other statistical 
data is available.

> Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05)
> 
> (sent from a mobile device - please excuse brevity, typos, and top-
> posting)
> - stephan beal
> http://wanderinghorse.net
> On Sep 10, 2013 11:49 PM, "Tim Streater"  wrote:
> 
> > On 10 Sep 2013 at 20:26, Richard Hipp  wrote:
> >
> > > SURVEY QUESTION:
> > >
> > > The question for today is what to call this magic hint function:
> > >
> > > (1)  unlikely(EXPR)
> > > (2)  selective(EXPR)
> > > (3)  seldom(EXPR)
> > > (4)  seldom_true(EXPR)
> > > (5)  usually_not_true(EXPR)
> > >
> > > Please feel free to suggest other names if you think of any.
> >
> > likelihood (EXPR, value)
> >
> >
> >
> >
> > --
> > Cheers  --  Tim
> >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Simon Slavin

On 10 Sep 2013, at 10:48pm, Tim Streater  wrote:

> likelihood (EXPR, value)

Best I've seen so far.  I know it makes no sense without the second parameter 
but I think if you're going to make use of a special non-standard optimisation 
system you can be expected to know exactly what it means.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Stephan Beal
Or...

unlikely(expr (, prob=0.05)) ==> likelihood (...)

(sent from a mobile device - please excuse brevity, typos, and top-posting)
- stephan beal
http://wanderinghorse.net
On Sep 11, 2013 12:51 AM, "Stephan Beal"  wrote:

> Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05)
>
> (sent from a mobile device - please excuse brevity, typos, and
> top-posting)
> - stephan beal
> http://wanderinghorse.net
> On Sep 10, 2013 11:49 PM, "Tim Streater"  wrote:
>
>> On 10 Sep 2013 at 20:26, Richard Hipp  wrote:
>>
>> > SURVEY QUESTION:
>> >
>> > The question for today is what to call this magic hint function:
>> >
>> > (1)  unlikely(EXPR)
>> > (2)  selective(EXPR)
>> > (3)  seldom(EXPR)
>> > (4)  seldom_true(EXPR)
>> > (5)  usually_not_true(EXPR)
>> >
>> > Please feel free to suggest other names if you think of any.
>>
>> likelihood (EXPR, value)
>>
>>
>>
>>
>> --
>> Cheers  --  Tim
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread David de Regt
Seconded.

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Simon Slavin
Sent: Tuesday, September 10, 2013 3:46 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Hints for the query planner


On 10 Sep 2013, at 10:48pm, Tim Streater <t...@clothears.org.uk> wrote:

> likelihood (EXPR, value)

Best I've seen so far.  I know it makes no sense without the second parameter 
but I think if you're going to make use of a special non-standard optimisation 
system you can be expected to know exactly what it means.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Stephan Beal
Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05)

(sent from a mobile device - please excuse brevity, typos, and top-posting)
- stephan beal
http://wanderinghorse.net
On Sep 10, 2013 11:49 PM, "Tim Streater"  wrote:

> On 10 Sep 2013 at 20:26, Richard Hipp  wrote:
>
> > SURVEY QUESTION:
> >
> > The question for today is what to call this magic hint function:
> >
> > (1)  unlikely(EXPR)
> > (2)  selective(EXPR)
> > (3)  seldom(EXPR)
> > (4)  seldom_true(EXPR)
> > (5)  usually_not_true(EXPR)
> >
> > Please feel free to suggest other names if you think of any.
>
> likelihood (EXPR, value)
>
>
>
>
> --
> Cheers  --  Tim
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Tim Streater
On 10 Sep 2013 at 20:26, Richard Hipp  wrote:

> SURVEY QUESTION:
>
> The question for today is what to call this magic hint function:
>
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
> Please feel free to suggest other names if you think of any.

likelihood (EXPR, value)




--
Cheers  --  Tim
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Scott Robison
I think I prefer something along the lines of "unlikely" or "likely". The
problem with a term like "selective" (at least in my brain) is that it
doesn't imply (for the single argument version) in what way it is being
selective.

If a negative form of the magic function is used ("unlikely", "seldom",
etc) I would suggest considering inverting the optional second parameter.
In other words, 0.05 would become 0.95. In my opinion, that reads better:
"unlikely(COLUMN LIKE '%pattern%', 0.95)" reads "it is unlikely the
expression will be true 95% of the time".

In like fashion, a positive form of the magic function would keep the
current meaning of the optional second parameter.

Just ideas, I'm not married to them. Thanks for the discussion.

SDR


On Tue, Sep 10, 2013 at 4:17 PM, Eric Minbiole  wrote:

> > (1)  unlikely(EXPR)
> > (2)  selective(EXPR)
> > (3)  seldom(EXPR)
> > (4)  seldom_true(EXPR)
> > (5)  usually_not_true(EXPR)
> >
> >
> I quite like (2) "selective".  I think it's reasonably descriptive on its
> own, and also works well with the optional second argument.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Eric Minbiole
> (1)  unlikely(EXPR)
> (2)  selective(EXPR)
> (3)  seldom(EXPR)
> (4)  seldom_true(EXPR)
> (5)  usually_not_true(EXPR)
>
>
I quite like (2) "selective".  I think it's reasonably descriptive on its
own, and also works well with the optional second argument.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Roman Fleysher
In Bayesian statistics there is a term "prior", prior probability. 



From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Richard Hipp [d...@sqlite.org]
Sent: Tuesday, September 10, 2013 3:26 PM
To: General Discussion of SQLite Database
Subject: [sqlite] Hints for the query planner

There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE cname LIKE '%bach%'
   AND composer.cid=track.cid
   AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic "unlikely()" function.  Subexpressions contained within
"unlikely()" are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string "bach" seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
  FROM album, composer, track
 WHERE unlikely(cname LIKE '%bach%')
   AND composer.cid=track.cid
   AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hints for the query planner

2013-09-10 Thread Philip Bennefall

Hi Richard,

What about "probability" or "likelyhood"? This works in both the case where 
the likelyhood is great as well as when it is low. From the list you 
provided, I would pick "unlikely".


Kind regards,

Philip Bennefall
- Original Message - 
From: "Richard Hipp" 

To: "General Discussion of SQLite Database" 
Sent: Tuesday, September 10, 2013 9:26 PM
Subject: [sqlite] Hints for the query planner


There is a survey question at the bottom of this message.  But first some
context...

Over on the sqlite-dev mailing list, a debate has been going on about the
best way to provide some useful hints to the query planner.  The query
under discussion looks like this:

SELECT DISTINCT aname
 FROM album, composer, track
WHERE cname LIKE '%bach%'
  AND composer.cid=track.cid
  AND album.aid=track.aid;

Assuming that the schema has appropriate indices and ANALYZE has been run,
SQLite does a good job of selecting an efficient query plan for the above.
But the query planner lacks a key piece of information that could help it
to do a better job.  In particular, the query planner does not know how
often the subexpression "cname LIKE '%bach%'" will be true.  But, it turns
out, the best query plan depends critically on this one fact.

By default, the query planner (in SQLite 3.8.0) assumes that a
subexpression that cannot use an index will always be true.  Probably this
will be tweaked in 3.8.1 so that such subexpressions will be assumed to
usually, but not always, be true.  Either way, it would be useful to be
able to convey to the query planner the other extreme - that a
subexpression is usually not true.

(Pedantic detail:  "not true" is not the same as "false" in SQL because
NULL is neither true nor false.)

There is currently code in a branch that provides a hinting mechanism using
a magic "unlikely()" function.  Subexpressions contained within
"unlikely()" are assumed to usually not be true.  Other than this hint to
the query planner, the unlikely() function is a complete no-op and
optimized out of the VDBE code so that it does not consume any CPU cycles.
The only purpose of the unlikely() function is to let the query planner
know that the subexpression contained in its argument is not commonly
true.  So, if an application developer knows that the string "bach" seldom
occurs in composer names, then she might rewrite the query like this:

SELECT DISTINCT aname
 FROM album, composer, track
WHERE unlikely(cname LIKE '%bach%')
  AND composer.cid=track.cid
  AND album.aid=track.aid;

The query planner might use this "likelihood" hint to choose a different
query plan that works better when the subexpression is commonly false.  Or
it might decide that the original query plan was good enough and ignore the
hint.  The query planner gets to make that decision.  The application
developer is not telling the query planner what to do. The application
developer has merely provided a small amount of meta-information about the
likelihood of the subexpression being true, meta-information which the
query planner may or may not use.

Note that the subexpression does not have to be a LIKE operator.
PostgreSQL, to name one example, estimates how often a LIKE operator will
be true based on the pattern on its right-hand side, and adjust query plans
accordingly, and some have argued for this sort of thing in SQLite.  But I
want a more general solution.  Suppose the subexpression involves one or
more calls to application-defined functions about which the query planner
cannot possible know anything.  A general mechanism for letting the query
planner know that subexpressions are commonly not true is what is desired -
not a technique for making LIKE operators more efficient.

SURVEY QUESTION:

The question for today is what to call this magic hint function:

(1)  unlikely(EXPR)
(2)  selective(EXPR)
(3)  seldom(EXPR)
(4)  seldom_true(EXPR)
(5)  usually_not_true(EXPR)

Please feel free to suggest other names if you think of any.

ADDITIONAL INFORMATION:

The current implementation allows a second argument which must be a
floating point constant between 0.0 and 1.0, inclusive. The second argument
is an estimate of the probability that the expression in the first argument
will be true.  The default is 0.05.  Names like "unlikely" or "seldom" work
well when this probability is small, but if the second argument is close to
1.0, then those names seem backwards.  I don't know if this matters.  The
optional second argument is not guaranteed to make it into an actually
release.
--
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users