Re: [sqlite] Hints for the query planner
On Wednesday, October 2, 2013, Richard Hipp wrote: > > In the current beta (at http://www.sqlite.org/download.html) the function > is either unlikely(X) or likelihood(X,Y). In the second form, Y must be a > floating point constant between 0.0 and 1.0, inclusive. The first form is > equivalent to likelihood(X, 0.0625). Perfect. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 02/10/2013 2:19 AM, Baruch Burstein wrote: On Fri, Sep 13, 2013 at 5:29 AM, Ryan Johnsonwrote: -- Join cardinality: Bach was a *very* prolific composer whose output likely dwarfs the (surviving) output of his contemporaries select p.title, c.name, p.year from composers c join pieces p on p.c_id = c.id where c.name like '%bach%' and p.year between 1700 and 1750 How would you mark this even using the suggested function syntax? The likelihood of "c.name like '%bach%' " being true depends on the order the query optimizer decides to evaluate the 2 predicates in, which in turn depends on the likelihood of the predicate! Predicate order doesn't matter; the optimizer will push both down to their respective source tables before the join, where they will execute independently of each other. The problem would come if the above join were input to some outer query and the optimizer needed to reason about the cardinality of the join based on the selectivity of the inputs: the inputs would be highly filtered but the join would impose unexpectedly little additional filtering due to the correlation between famous composes named Bach and the half century of interest. Ryan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Wed, Oct 2, 2013 at 1:40 AM, Nico Williamswrote: > On Tuesday, September 10, 2013, Richard Hipp wrote: > > > SURVEY QUESTION: > > > > The question for today is what to call this magic hint function: > > > > (1) unlikely(EXPR) > > (2) selective(EXPR) > > (3) seldom(EXPR) > > (4) seldom_true(EXPR) > > (5) usually_not_true(EXPR) > > > (1), on account of: it's already used in various programming languages. > However, because o your point below i prefer "likely(, > )", perhaps with the probability being required (can the > probability be a non-constant expression?). > In the current beta (at http://www.sqlite.org/download.html) the function is either unlikely(X) or likelihood(X,Y). In the second form, Y must be a floating point constant between 0.0 and 1.0, inclusive. The first form is equivalent to likelihood(X, 0.0625). Additional links: (1) http://www.sqlite.org/draft/releaselog/3_8_1.html (2) http://www.sqlite.org/draft/lang_corefunc.html#unlikely (3) http://www.sqlite.org/draft/lang_corefunc.html#likelihood -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Fri, Sep 13, 2013 at 5:29 AM, Ryan Johnsonwrote: > -- Join cardinality: Bach was a *very* prolific composer whose output > likely dwarfs the (surviving) output of his contemporaries > select p.title, c.name, p.year from composers c join pieces p on p.c_id = > c.id where c.name like '%bach%' and p.year between 1700 and 1750 > How would you mark this even using the suggested function syntax? The likelihood of "c.name like '%bach%' " being true depends on the order the query optimizer decides to evaluate the 2 predicates in, which in turn depends on the likelihood of the predicate! -- ˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Tuesday, September 10, 2013, Richard Hipp wrote: > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) (1), on account of: it's already used in various programming languages. However, because o your point below i prefer "likely(, )", perhaps with the probability being required (can the probability be a non-constant expression?). Please feel free to suggest other names if you think of any. > > ADDITIONAL INFORMATION: > > The current implementation allows a second argument which must be a > floating point constant between 0.0 and 1.0, inclusive. The second argument > is an estimate of the probability that the expression in the first argument > will be true. The default is 0.05. Names like "unlikely" or "seldom" work > well when this probability is small, but if the second argument is close to > 1.0, then those names seem backwards. I don't know if this matters. The > optional second argument is not guaranteed to make it into an actually > release. > Right, but this is less annoying when the keyword name denotes likeliness: the probability argument is quite clearly a modifier of likelihood. That's another possible name: "likelihood()". Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Dr Hipp I second Kyan's suggestion of a pseudo-comment hinting syntax: ... that the planner hint is not interleaved inside normal SQL > syntax. Instead I propose a special comment-like syntax instead, as > Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">": > Having had to switch between databases a couple of times, I would prefer something that keeps the SQL more portable. As for names I would vote for the 'seldom()' option as the 'likely()' option is already covered by the default action. John Gillespie On 10 September 2013 20:26, Richard Hippwrote: > There is a survey question at the bottom of this message. But first some > context... > > Over on the sqlite-dev mailing list, a debate has been going on about the > best way to provide some useful hints to the query planner. The query > under discussion looks like this: > > SELECT DISTINCT aname > FROM album, composer, track > WHERE cname LIKE '%bach%' >AND composer.cid=track.cid >AND album.aid=track.aid; > > Assuming that the schema has appropriate indices and ANALYZE has been run, > SQLite does a good job of selecting an efficient query plan for the above. > But the query planner lacks a key piece of information that could help it > to do a better job. In particular, the query planner does not know how > often the subexpression "cname LIKE '%bach%'" will be true. But, it turns > out, the best query plan depends critically on this one fact. > > By default, the query planner (in SQLite 3.8.0) assumes that a > subexpression that cannot use an index will always be true. Probably this > will be tweaked in 3.8.1 so that such subexpressions will be assumed to > usually, but not always, be true. Either way, it would be useful to be > able to convey to the query planner the other extreme - that a > subexpression is usually not true. > > (Pedantic detail: "not true" is not the same as "false" in SQL because > NULL is neither true nor false.) > > There is currently code in a branch that provides a hinting mechanism using > a magic "unlikely()" function. Subexpressions contained within > "unlikely()" are assumed to usually not be true. Other than this hint to > the query planner, the unlikely() function is a complete no-op and > optimized out of the VDBE code so that it does not consume any CPU cycles. > The only purpose of the unlikely() function is to let the query planner > know that the subexpression contained in its argument is not commonly > true. So, if an application developer knows that the string "bach" seldom > occurs in composer names, then she might rewrite the query like this: > > SELECT DISTINCT aname > FROM album, composer, track > WHERE unlikely(cname LIKE '%bach%') >AND composer.cid=track.cid >AND album.aid=track.aid; > > The query planner might use this "likelihood" hint to choose a different > query plan that works better when the subexpression is commonly false. Or > it might decide that the original query plan was good enough and ignore the > hint. The query planner gets to make that decision. The application > developer is not telling the query planner what to do. The application > developer has merely provided a small amount of meta-information about the > likelihood of the subexpression being true, meta-information which the > query planner may or may not use. > > Note that the subexpression does not have to be a LIKE operator. > PostgreSQL, to name one example, estimates how often a LIKE operator will > be true based on the pattern on its right-hand side, and adjust query plans > accordingly, and some have argued for this sort of thing in SQLite. But I > want a more general solution. Suppose the subexpression involves one or > more calls to application-defined functions about which the query planner > cannot possible know anything. A general mechanism for letting the query > planner know that subexpressions are commonly not true is what is desired - > not a technique for making LIKE operators more efficient. > > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > Please feel free to suggest other names if you think of any. > > ADDITIONAL INFORMATION: > > The current implementation allows a second argument which must be a > floating point constant between 0.0 and 1.0, inclusive. The second argument > is an estimate of the probability that the expression in the first argument > will be true. The default is 0.05. Names like "unlikely" or "seldom" work > well when this probability is small, but if the second argument is close to > 1.0, then those names seem backwards. I don't know if this matters. The > optional second argument is not guaranteed to make it into an actually > release. > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing
Re: [sqlite] Hints for the query planner
On 10.09.2013 21:26, Richard Hipp wrote: SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) A function call like proposed is much better than adjusting the sql gramer for implementation details. Therefore this is a better solution. (6) assume (EXPR, [%value%]) (7) approximate (EXPR, [%value%]) It would be nice to further have a pragma to adjust the probability expectations of the following query with text involved. I think people who have to deal with more than one database engine would prefer to use use such a pragma to adjust such implementation behaviour. pragma set_text_match_probability %value% pragma set_pattern_match_probability %value% Ibrahim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 12/09/2013 7:12 PM, Simon Slavin wrote: On 12 Sep 2013, at 11:35pm, Roger Binnswrote: On 12/09/13 05:03, Gabor Grothendieck wrote: Perhaps indexing the expression in question would be an alternative that would keep the performance info separate from the select. I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever it deems necessary to gather to make the query perform well in the future. I think I prefer Roger's idea to that of messing with a vital expression. Rather than modify a command you need to work, introduce a new command which, even if it entirely fails, doesn't interfere with the operation of the SELECT. If the ANALYZE fails the SELECT (or UPDATE, or whatever) will still work, albeit more slowly. And this causes fewer problems for users who have to write code which works with many SQL engines. But because ANALYZE already means something in SQLite, perhaps it might be better to introduce a new command Would it really be so confusing to extend ANALYZE that way? It reads so naturally, I bet the most likely source of list traffic would be people who tried to use it in older versions of sqlite3 and were surprised it's not there... STORE LIKELIHOOD test, probability or maybe REMEMBER LIKELIHOOD OF test AS probability which will store the fact that such-and-such a test has a certain probability as a new row in a table somewhere. Could be a new row in sqlite_stat3 (or sqlite_stat4), or could be in another sqlite_ table. Omitting the second parameter tells SQLite to do the evaluation itself (like ANALYZE does) and store the result. Curious users could dump the table just like people sometimes do "SELECT * FROM sqlite_stat3". I think it's pretty important to examine predicates in the context of specific queries (and to allow the same predicate to appear any number of such queries). The predicate "c.name like '%bach%'" is going to behave quite differently in these three queries, for example: -- Vanilla predicate: Bach isn't a very common name select c.name from composers c where c.name like '%bach%'; -- Join cardinality: Bach was a *very* prolific composer whose output likely dwarfs the (surviving) output of his contemporaries select p.title, c.name, p.year from composers c join pieces p on p.c_id = c.id where c.name like '%bach%' and p.year between 1700 and 1750; -- Correlated columns: Very few Brandenburg anythings were written by composers other than J.S. Bach select c.name, p.title from composers c join pieces p on p.c_id = c.id where c.name like '%bach%' and p.title like '%brandenburg%'; (110% agree that any new information that changes query plans needs to be in a stats table somewhere. It's a huge aid to performance debugging when you can repro a problematic query plan using only the schema, query, and a dump of the various stats tables.) Ryan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 12 Sep 2013, at 11:35pm, Roger Binnswrote: > On 12/09/13 05:03, Gabor Grothendieck wrote: >> Perhaps indexing the expression in question would be an alternative >> that would keep the performance info separate from the select. > > I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever > it deems necessary to gather to make the query perform well in the future. I think I prefer Roger's idea to that of messing with a vital expression. Rather than modify a command you need to work, introduce a new command which, even if it entirely fails, doesn't interfere with the operation of the SELECT. If the ANALYZE fails the SELECT (or UPDATE, or whatever) will still work, albeit more slowly. And this causes fewer problems for users who have to write code which works with many SQL engines. But because ANALYZE already means something in SQLite, perhaps it might be better to introduce a new command: STORE LIKELIHOOD test, probability or maybe REMEMBER LIKELIHOOD OF test AS probability which will store the fact that such-and-such a test has a certain probability as a new row in a table somewhere. Could be a new row in sqlite_stat3 (or sqlite_stat4), or could be in another sqlite_ table. Omitting the second parameter tells SQLite to do the evaluation itself (like ANALYZE does) and store the result. Curious users could dump the table just like people sometimes do "SELECT * FROM sqlite_stat3". Perhaps if such entries are already present when you do an ANALYZE it could update them. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/09/13 05:03, Gabor Grothendieck wrote: > Perhaps indexing the expression in question would be an alternative > that would keep the performance info separate from the select. I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever it deems necessary to gather to make the query perform well in the future. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.12 (GNU/Linux) iEYEARECAAYFAlIyQagACgkQmOOfHg372QT8jgCgtSROjcL1dyrHo+yP2leh1ffV xBEAoKEOTIVqz3vlrVrlVeJ130Wru/Mg =+8TU -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
PostgreSQL supports create index on ( ) Note that it allows an expression and not just a column name. See: http://www.postgresql.org/docs/9.1/static/sql-createindex.html Perhaps indexing the expression in question would be an alternative that would keep the performance info separate from the select. On Tue, Sep 10, 2013 at 3:26 PM, Richard Hippwrote: > There is a survey question at the bottom of this message. But first some > context... > > Over on the sqlite-dev mailing list, a debate has been going on about the > best way to provide some useful hints to the query planner. The query > under discussion looks like this: > > SELECT DISTINCT aname > FROM album, composer, track > WHERE cname LIKE '%bach%' >AND composer.cid=track.cid >AND album.aid=track.aid; > > Assuming that the schema has appropriate indices and ANALYZE has been run, > SQLite does a good job of selecting an efficient query plan for the above. > But the query planner lacks a key piece of information that could help it > to do a better job. In particular, the query planner does not know how > often the subexpression "cname LIKE '%bach%'" will be true. But, it turns > out, the best query plan depends critically on this one fact. > > By default, the query planner (in SQLite 3.8.0) assumes that a > subexpression that cannot use an index will always be true. Probably this > will be tweaked in 3.8.1 so that such subexpressions will be assumed to > usually, but not always, be true. Either way, it would be useful to be > able to convey to the query planner the other extreme - that a > subexpression is usually not true. > > (Pedantic detail: "not true" is not the same as "false" in SQL because > NULL is neither true nor false.) > > There is currently code in a branch that provides a hinting mechanism using > a magic "unlikely()" function. Subexpressions contained within > "unlikely()" are assumed to usually not be true. Other than this hint to > the query planner, the unlikely() function is a complete no-op and > optimized out of the VDBE code so that it does not consume any CPU cycles. > The only purpose of the unlikely() function is to let the query planner > know that the subexpression contained in its argument is not commonly > true. So, if an application developer knows that the string "bach" seldom > occurs in composer names, then she might rewrite the query like this: > > SELECT DISTINCT aname > FROM album, composer, track > WHERE unlikely(cname LIKE '%bach%') >AND composer.cid=track.cid >AND album.aid=track.aid; > > The query planner might use this "likelihood" hint to choose a different > query plan that works better when the subexpression is commonly false. Or > it might decide that the original query plan was good enough and ignore the > hint. The query planner gets to make that decision. The application > developer is not telling the query planner what to do. The application > developer has merely provided a small amount of meta-information about the > likelihood of the subexpression being true, meta-information which the > query planner may or may not use. > > Note that the subexpression does not have to be a LIKE operator. > PostgreSQL, to name one example, estimates how often a LIKE operator will > be true based on the pattern on its right-hand side, and adjust query plans > accordingly, and some have argued for this sort of thing in SQLite. But I > want a more general solution. Suppose the subexpression involves one or > more calls to application-defined functions about which the query planner > cannot possible know anything. A general mechanism for letting the query > planner know that subexpressions are commonly not true is what is desired - > not a technique for making LIKE operators more efficient. > > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > Please feel free to suggest other names if you think of any. > > ADDITIONAL INFORMATION: > > The current implementation allows a second argument which must be a > floating point constant between 0.0 and 1.0, inclusive. The second argument > is an estimate of the probability that the expression in the first argument > will be true. The default is 0.05. Names like "unlikely" or "seldom" work > well when this probability is small, but if the second argument is close to > 1.0, then those names seem backwards. I don't know if this matters. The > optional second argument is not guaranteed to make it into an actually > release. > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [sqlite] Hints for the query planner
Richard wrote: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. I dislike #4 and#5, but what about one of the following three: 1. Probability 2. Improbability 3. Probe Number 2 might be in honor of British writer Douglas Adams's masterpiece H2G2 Cordiali saluti/Vriendelijke groeten/Kind regards, Klaas "Z4us" V MetaDBA ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
probability(EXPR, value) - Would force the user to set their best guess. I would think that the users guess would be more accurate than a general guess? What's considered unlikely for the users data, 0.05? 0.20? I would prefer if the mechanism could be handled with pragmas (or something) for adding / removing hints, to keep special functions away from the SQL. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Sep 10, 2013, at 6:23 PM, Scott Robisonwrote: > I think I prefer something along the lines of "unlikely" or "likely". The > problem with a term like "selective" (at least in my brain) is that it > doesn't imply (for the single argument version) in what way it is being > selective. > > If a negative form of the magic function is used ("unlikely", "seldom", > etc) I would suggest considering inverting the optional second parameter. > In other words, 0.05 would become 0.95. In my opinion, that reads better: > "unlikely(COLUMN LIKE '%pattern%', 0.95)" reads "it is unlikely the > expression will be true 95% of the time". > > In like fashion, a positive form of the magic function would keep the > current meaning of the optional second parameter. This is the best suggestion. The pseudo-function names do not change the meaning of the query, and they are more clear with regard to the optional numeric argument. e ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Tue, 10 Sep 2013 15:26:51 -0400, Richard Hippwrote: >SURVEY QUESTION: > >The question for today is what to call this magic hint function: > >(1) unlikely(EXPR) >(2) selective(EXPR) >(3) seldom(EXPR) >(4) seldom_true(EXPR) >(5) usually_not_true(EXPR) > >Please feel free to suggest other names if you think of any. > >ADDITIONAL INFORMATION: > >The current implementation allows a second argument which must be a >floating point constant between 0.0 and 1.0, inclusive. The second argument >is an estimate of the probability that the expression in the first argument >will be true. The default is 0.05. Names like "unlikely" or "seldom" work >well when this probability is small, but if the second argument is close to >1.0, then those names seem backwards. I don't know if this matters. The >optional second argument is not guaranteed to make it into an actually >release. Use a word that connotes no value on a scale, a word that has no value implied by the word itself. These seem OK to me: confidence probability chance selective ned (I like this one especially) These do not: unlikely seldom usually -- Ned ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Wed, Sep 11, 2013 at 5:36 PM, Kevin Bensonwrote: > On Tue, Sep 10, 2013 at 3:26 PM, Richard Hipp wrote: > > SURVEY QUESTION: > > (1) unlikely(EXPR) > > (2) selective(EXPR) > > (3) seldom(EXPR) > > (4) seldom_true(EXPR) > > (5) usually_not_true(EXPR) > > > (6) nominal(EXPR) > (VII): prioritize() or priority() is neither negative nor positive in connotation, but i'm not 100% sure whether it's really indicative of what the op does. -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Tue, Sep 10, 2013 at 3:26 PM, Richard Hippwrote: > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) (6) nominal(EXPR) -- -- -- --Ô¿Ô-- K e V i N ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 11.09.2013 16:07, Ryan Johnson wrote: Perhaps you meant "demote" rather than "degrade" ? That would be a better fit (an external action that does not necessarily make the object worse or less useful), and less vague, but it still carries a negative connotation. "demote" sounds fine to me, especially since its antonym "promote" may be used for a function name to raise an expression's rank for the query planner rather than the 2nd argument. The negative connotation of both "degrade" and "demote" does not feel bad for me as a non native English speaker. Both, however, express an action rather than a quality which is more telling to me than "unlikely" or the other adjectives suggested so far. Maybe the function name could be prefixed by "qp_" (for query planner) or similar to clarify their functionality even more: "qp_demote" and "qp_promote"? Ralf ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
I suggest a verb to express what the function is actually doing, namely to reduce its argument in rank or degree for the query planner: DEGRADE 1. to reduce in worth, character, etc; disgrace; 2. to reduce in rank, status, or degree; remove from office; 3. to reduce in strength, quality, intensity, etc Source: http://www.collinsdictionary.com/dictionary/english/degrade On 10.09.2013 21:26, Richard Hipp wrote: Please feel free to suggest other names if you think of any. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
> -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf > Of Richard Hipp > Sent: Tuesday, September 10, 2013 2:27 PM > To: General Discussion of SQLite Database > Subject: [sqlite] Hints for the query planner > > There is a survey question at the bottom of this message. But first some > context... > > > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > Please feel free to suggest other names if you think of any. > > ADDITIONAL INFORMATION: > > The current implementation allows a second argument which must be a > floating point constant between 0.0 and 1.0, inclusive. The second argument > is an estimate of the probability that the expression in the first argument > will be true. The default is 0.05. Names like "unlikely" or "seldom" work > well when this probability is small, but if the second argument is close to > 1.0, then those names seem backwards. I don't know if this matters. The > optional second argument is not guaranteed to make it into an actually > release. All -- Since the optional second arg is not guaranteed to make it into a release, I like (3) - SELDOM( EXPR ) ... --- cut here -- SELECT DISTINCT aname FROM album, composer, track WHERE SELDOM( cname LIKE '%bach%' ) AND composer.cid=track.cid AND album.aid=track.aid ; --- cut there -- -- kjh ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
I also think it should not be directly in the SQL. I like the not-really-a-comment syntax. Another option might be a few PRAGMAs, something like PRAGMA hint("table1.col1 IN (1,2,5)", 0.05); PRAGMA hint("table1.col2 LIKE '%bach%'". 0.4); these would add the hints to an internal table. When preparing a query, the planner would check the hints table to see if any hints match the table/column/condition triplet, and if so optionally use the hint. Removing a hint and removing all hints would also be a couple of PRAGMAs. On Wed, Sep 11, 2013 at 3:53 AM, kyanwrote: > Hello Dr Hipp, > > First of all, I apologize for this rather off-topic suggestion knowing that > you may have already implemented the syntax you describe, but there is an > IMHO good reason for it, read ahead. > > On Tue, Sep 10, 2013 at 10:26 PM, Richard Hipp wrote: > > > SELECT DISTINCT aname > > FROM album, composer, track > > WHERE unlikely(cname LIKE '%bach%') > >AND composer.cid=track.cid > >AND album.aid=track.aid; > > > > I would prefer that the planner hint is not interleaved inside normal SQL > syntax. Instead I propose a special comment-like syntax instead, as > Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">": > > SELECT DISTINCT aname > > FROM album, composer, track > > WHERE cname LIKE '%bach%' > > /*> unlikely */ > > AND composer.cid=track.cid AND album.aid=track.aid; > > > > or: > > SELECT DISTINCT aname > > FROM album, composer, track > > WHERE cname LIKE '%bach%' > > --> unlikely > >AND composer.cid=track.cid > >AND album.aid=track.aid; > > > If the hint is to be applied to an expression that combines many column > predicates with AND (I am not sure if this actually makes sense): > > SELECT DISTINCT aname > > FROM album, composer, track > > WHERE unlikely(cname LIKE '%bach%' > >AND composer.cid=track.cid) > >AND album.aid=track.aid; > > > > then a -normally redundant- pair of parentheses can be used to specify the > scope of the hint: > > SELECT DISTINCT aname > > FROM album, composer, track > > WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /*> unlikely */ > >AND album.aid=track.aid; > > > > The SQLite SQL parser will have to look for exactly "/*>" or "-->" without > whitespace between the characters, so it can easily tell a planner hint > from a plain comment with a single character read-ahead. Also, the fact > that hints are "transparent" to the SQL syntax will allow the query parser > to handle them in an "orthogonal" way (e.g. a small separate parser for > hints) to normal SQL parsing, IMO making handling of any future hints > easier to add. > > The main reason for this proposal is that the planner hint will be ignored > by default by other SQL parsers without the need to modify them, which in > some cases may not even be possible. For instance it will allow someone to > write SQL that is valid in databases of alternative DB vendors and still > provide planner hints when the DB vendor is SQLite (that is why I replaced > "+" with ">", to avoid conflicts with a hypothetical alternate Oracle query > optimizer) without having to modify the SQL in the application code to > remove the hints. This is a property of the Oracle optimizer hint syntax I > have always appreciated when writing SQL that is to be executed in > databases of alternative DB vendors with the same schema, for applications > where the user chooses the database vendor from a list of supported ones. > > For more on Oracle optimizer hints see > http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm. > > As for the name of the hint itself I would propose: > > --> PROBABLY(True) -- the current default > --> PROBABLY(False) > --> PROBABLY(False, 0.7) > --> PROBABLY(False, 0.6, 0.3) --re "pedantic detail", the second value if > for True, the remainder for NULL. > > Kind regards, > > Constantine Yannakopoulos > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- ˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
(6) maybe(EXPR) -Ursprüngliche Nachricht- Von: Richard Hipp [mailto:d...@sqlite.org] Gesendet: Dienstag, 10. September 2013 21:27 An: General Discussion of SQLite Database Betreff: [sqlite] Hints for the query planner There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression "cname LIKE '%bach%'" will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: "not true" is not the same as "false" in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic "unlikely()" function. Subexpressions contained within "unlikely()" are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string "bach" seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this "likelihood" hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like "unlikely" or "seldom" work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users -- Gunter Hick Software Engineer Scientific Games International GmbH Klitschgasse 2 – 4, A - 1130 Vienna, Austria FN 157284 a, HG Wien Tel: +43 1 80100 0 E-Mail: h...@scigames.at This e-mail is confidential and may well also be legally privileged. If you have received it in error, you are on notice as to its status and accordingly please
Re: [sqlite] Hints for the query planner
How about: maybe(COLUMN LIKE '%pattern%',.95) or (as percent using integer value in 0..100) maybe(COLUMN LIKE '%pattern%',95) with a default value of (possibly) 50% (or .5) for the optional second arg? -Original Message- From: Richard Hipp Sent: Tuesday, September 10, 2013 10:26 PM To: General Discussion of SQLite Database Subject: [sqlite] Hints for the query planner The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On Sep 10, 2013, at 12:26 PM, Richard Hippwrote: > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > Please feel free to suggest other names if you think of any. I like the optional second parameter. Apart from the obvious change to likelihood, which somebody else suggested, but which is less self documenting in the one-argument case, I'd suggest that you actually add two words: likely and unlikely, and whose second parameters are the inverse of each other, crossing at 0.5. So an expression could then, in the simplest case, be labeled LIKELY or UNLIKELY, with second parameter defaulting to 1.0 (or to whatever value you feel is appropriate), but allowing the user to specify a lesser likelihood by lowering that value. LIKELY(expr, 0) would mean the same as UNLIKELY(expr) and UNLIKELY(expr, 1), and UNLIKELY(expr, 0) would be the same as LIKELY(expr, 1), etc. James ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
As I was reading this, I said to myself, "what they really need is a confidence value." Then I read the end and, there it was! A confidence value. Ok.. not exactly confidence, but I think you get my meaning. It seems to me that you're allowing the query writer to substitute personal knowledge of the DB for knowledge based on ANALYZE or other statistical indexes. So, I'm all in favor of allowing that second argument. If so, I would suggest "confidence(exp, confidence_value)". Or, perhaps, "likelihood(..)" Likely is fine, or you might even establish several names with built-in defaults... e.g. "likely(xxx)" might be "confidence(xxx, .75)" and "unlikely(xxx)" might be "confidence(xxx, .25)" You've got "rarely," "mostly," and a whole suite of other synonyms. This email and any attachments are only for use by the intended recipient(s) and may contain legally privileged, confidential, proprietary or otherwise private information. Any unauthorized use, reproduction, dissemination, distribution or other disclosure of the contents of this e-mail or its attachments is strictly prohibited. If you have received this email in error, please notify the sender immediately and delete the original. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Hello Dr Hipp, First of all, I apologize for this rather off-topic suggestion knowing that you may have already implemented the syntax you describe, but there is an IMHO good reason for it, read ahead. On Tue, Sep 10, 2013 at 10:26 PM, Richard Hippwrote: > SELECT DISTINCT aname > FROM album, composer, track > WHERE unlikely(cname LIKE '%bach%') >AND composer.cid=track.cid >AND album.aid=track.aid; > I would prefer that the planner hint is not interleaved inside normal SQL syntax. Instead I propose a special comment-like syntax instead, as Oracle's /*+ */ or --+, but replacing "+" with another symbol, e.g. ">": SELECT DISTINCT aname > FROM album, composer, track > WHERE cname LIKE '%bach%' > /*> unlikely */ > AND composer.cid=track.cid AND album.aid=track.aid; > or: SELECT DISTINCT aname > FROM album, composer, track > WHERE cname LIKE '%bach%' > --> unlikely >AND composer.cid=track.cid >AND album.aid=track.aid; If the hint is to be applied to an expression that combines many column predicates with AND (I am not sure if this actually makes sense): SELECT DISTINCT aname > FROM album, composer, track > WHERE unlikely(cname LIKE '%bach%' >AND composer.cid=track.cid) >AND album.aid=track.aid; > then a -normally redundant- pair of parentheses can be used to specify the scope of the hint: SELECT DISTINCT aname > FROM album, composer, track > WHERE (cname LIKE '%bach%' AND composer.cid=track.cid) /*> unlikely */ >AND album.aid=track.aid; > The SQLite SQL parser will have to look for exactly "/*>" or "-->" without whitespace between the characters, so it can easily tell a planner hint from a plain comment with a single character read-ahead. Also, the fact that hints are "transparent" to the SQL syntax will allow the query parser to handle them in an "orthogonal" way (e.g. a small separate parser for hints) to normal SQL parsing, IMO making handling of any future hints easier to add. The main reason for this proposal is that the planner hint will be ignored by default by other SQL parsers without the need to modify them, which in some cases may not even be possible. For instance it will allow someone to write SQL that is valid in databases of alternative DB vendors and still provide planner hints when the DB vendor is SQLite (that is why I replaced "+" with ">", to avoid conflicts with a hypothetical alternate Oracle query optimizer) without having to modify the SQL in the application code to remove the hints. This is a property of the Oracle optimizer hint syntax I have always appreciated when writing SQL that is to be executed in databases of alternative DB vendors with the same schema, for applications where the user chooses the database vendor from a list of supported ones. For more on Oracle optimizer hints see http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm. As for the name of the hint itself I would propose: --> PROBABLY(True) -- the current default --> PROBABLY(False) --> PROBABLY(False, 0.7) --> PROBABLY(False, 0.6, 0.3) --re "pedantic detail", the second value if for True, the remainder for NULL. Kind regards, Constantine Yannakopoulos ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
How about three, where two are just overloaded, or rather syntactic sugar, for the main declaration: unlikely(expr) likely(expr) likelihood(expr, rate) Where unlikely(expr) -> likelihood(expr, 0.05) likely(expr) -> likelihood(expr, 0.95) I would presume that the rate is the expected selection rate of expr over the entire table, so the correctly computed value of the rate for a table A of a given expr, for example: select * from A where col LIKE '%bach%' would be select * from A where likelihood(col like '%bach%', (select count(*) from a where col like '%bach%')/(select count(*) from a)) and is not related to the application of other conditionals not included in the expr itself. So, if there are usable statistics available, should the likelihood given be ignored; or, should likelihood completely override the statistical input to the optimizer? I vote that the hint should only be used if no other statistical data is available. > Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05) > > (sent from a mobile device - please excuse brevity, typos, and top- > posting) > - stephan beal > http://wanderinghorse.net > On Sep 10, 2013 11:49 PM, "Tim Streater"wrote: > > > On 10 Sep 2013 at 20:26, Richard Hipp wrote: > > > > > SURVEY QUESTION: > > > > > > The question for today is what to call this magic hint function: > > > > > > (1) unlikely(EXPR) > > > (2) selective(EXPR) > > > (3) seldom(EXPR) > > > (4) seldom_true(EXPR) > > > (5) usually_not_true(EXPR) > > > > > > Please feel free to suggest other names if you think of any. > > > > likelihood (EXPR, value) > > > > > > > > > > -- > > Cheers -- Tim > > > > ___ > > sqlite-users mailing list > > sqlite-users@sqlite.org > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 10 Sep 2013, at 10:48pm, Tim Streaterwrote: > likelihood (EXPR, value) Best I've seen so far. I know it makes no sense without the second parameter but I think if you're going to make use of a special non-standard optimisation system you can be expected to know exactly what it means. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Or... unlikely(expr (, prob=0.05)) ==> likelihood (...) (sent from a mobile device - please excuse brevity, typos, and top-posting) - stephan beal http://wanderinghorse.net On Sep 11, 2013 12:51 AM, "Stephan Beal"wrote: > Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05) > > (sent from a mobile device - please excuse brevity, typos, and > top-posting) > - stephan beal > http://wanderinghorse.net > On Sep 10, 2013 11:49 PM, "Tim Streater" wrote: > >> On 10 Sep 2013 at 20:26, Richard Hipp wrote: >> >> > SURVEY QUESTION: >> > >> > The question for today is what to call this magic hint function: >> > >> > (1) unlikely(EXPR) >> > (2) selective(EXPR) >> > (3) seldom(EXPR) >> > (4) seldom_true(EXPR) >> > (5) usually_not_true(EXPR) >> > >> > Please feel free to suggest other names if you think of any. >> >> likelihood (EXPR, value) >> >> >> >> >> -- >> Cheers -- Tim >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Seconded. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin Sent: Tuesday, September 10, 2013 3:46 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Hints for the query planner On 10 Sep 2013, at 10:48pm, Tim Streater <t...@clothears.org.uk> wrote: > likelihood (EXPR, value) Best I've seen so far. I know it makes no sense without the second parameter but I think if you're going to make use of a special non-standard optimisation system you can be expected to know exactly what it means. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Plus an overload: unlikely(expr) ==> likelihood (expr, 0.05) (sent from a mobile device - please excuse brevity, typos, and top-posting) - stephan beal http://wanderinghorse.net On Sep 10, 2013 11:49 PM, "Tim Streater"wrote: > On 10 Sep 2013 at 20:26, Richard Hipp wrote: > > > SURVEY QUESTION: > > > > The question for today is what to call this magic hint function: > > > > (1) unlikely(EXPR) > > (2) selective(EXPR) > > (3) seldom(EXPR) > > (4) seldom_true(EXPR) > > (5) usually_not_true(EXPR) > > > > Please feel free to suggest other names if you think of any. > > likelihood (EXPR, value) > > > > > -- > Cheers -- Tim > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
On 10 Sep 2013 at 20:26, Richard Hippwrote: > SURVEY QUESTION: > > The question for today is what to call this magic hint function: > > (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > Please feel free to suggest other names if you think of any. likelihood (EXPR, value) -- Cheers -- Tim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
I think I prefer something along the lines of "unlikely" or "likely". The problem with a term like "selective" (at least in my brain) is that it doesn't imply (for the single argument version) in what way it is being selective. If a negative form of the magic function is used ("unlikely", "seldom", etc) I would suggest considering inverting the optional second parameter. In other words, 0.05 would become 0.95. In my opinion, that reads better: "unlikely(COLUMN LIKE '%pattern%', 0.95)" reads "it is unlikely the expression will be true 95% of the time". In like fashion, a positive form of the magic function would keep the current meaning of the optional second parameter. Just ideas, I'm not married to them. Thanks for the discussion. SDR On Tue, Sep 10, 2013 at 4:17 PM, Eric Minbiolewrote: > > (1) unlikely(EXPR) > > (2) selective(EXPR) > > (3) seldom(EXPR) > > (4) seldom_true(EXPR) > > (5) usually_not_true(EXPR) > > > > > I quite like (2) "selective". I think it's reasonably descriptive on its > own, and also works well with the optional second argument. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
> (1) unlikely(EXPR) > (2) selective(EXPR) > (3) seldom(EXPR) > (4) seldom_true(EXPR) > (5) usually_not_true(EXPR) > > I quite like (2) "selective". I think it's reasonably descriptive on its own, and also works well with the optional second argument. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
In Bayesian statistics there is a term "prior", prior probability. From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Richard Hipp [d...@sqlite.org] Sent: Tuesday, September 10, 2013 3:26 PM To: General Discussion of SQLite Database Subject: [sqlite] Hints for the query planner There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression "cname LIKE '%bach%'" will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: "not true" is not the same as "false" in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic "unlikely()" function. Subexpressions contained within "unlikely()" are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string "bach" seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this "likelihood" hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like "unlikely" or "seldom" work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Hints for the query planner
Hi Richard, What about "probability" or "likelyhood"? This works in both the case where the likelyhood is great as well as when it is low. From the list you provided, I would pick "unlikely". Kind regards, Philip Bennefall - Original Message - From: "Richard Hipp"To: "General Discussion of SQLite Database" Sent: Tuesday, September 10, 2013 9:26 PM Subject: [sqlite] Hints for the query planner There is a survey question at the bottom of this message. But first some context... Over on the sqlite-dev mailing list, a debate has been going on about the best way to provide some useful hints to the query planner. The query under discussion looks like this: SELECT DISTINCT aname FROM album, composer, track WHERE cname LIKE '%bach%' AND composer.cid=track.cid AND album.aid=track.aid; Assuming that the schema has appropriate indices and ANALYZE has been run, SQLite does a good job of selecting an efficient query plan for the above. But the query planner lacks a key piece of information that could help it to do a better job. In particular, the query planner does not know how often the subexpression "cname LIKE '%bach%'" will be true. But, it turns out, the best query plan depends critically on this one fact. By default, the query planner (in SQLite 3.8.0) assumes that a subexpression that cannot use an index will always be true. Probably this will be tweaked in 3.8.1 so that such subexpressions will be assumed to usually, but not always, be true. Either way, it would be useful to be able to convey to the query planner the other extreme - that a subexpression is usually not true. (Pedantic detail: "not true" is not the same as "false" in SQL because NULL is neither true nor false.) There is currently code in a branch that provides a hinting mechanism using a magic "unlikely()" function. Subexpressions contained within "unlikely()" are assumed to usually not be true. Other than this hint to the query planner, the unlikely() function is a complete no-op and optimized out of the VDBE code so that it does not consume any CPU cycles. The only purpose of the unlikely() function is to let the query planner know that the subexpression contained in its argument is not commonly true. So, if an application developer knows that the string "bach" seldom occurs in composer names, then she might rewrite the query like this: SELECT DISTINCT aname FROM album, composer, track WHERE unlikely(cname LIKE '%bach%') AND composer.cid=track.cid AND album.aid=track.aid; The query planner might use this "likelihood" hint to choose a different query plan that works better when the subexpression is commonly false. Or it might decide that the original query plan was good enough and ignore the hint. The query planner gets to make that decision. The application developer is not telling the query planner what to do. The application developer has merely provided a small amount of meta-information about the likelihood of the subexpression being true, meta-information which the query planner may or may not use. Note that the subexpression does not have to be a LIKE operator. PostgreSQL, to name one example, estimates how often a LIKE operator will be true based on the pattern on its right-hand side, and adjust query plans accordingly, and some have argued for this sort of thing in SQLite. But I want a more general solution. Suppose the subexpression involves one or more calls to application-defined functions about which the query planner cannot possible know anything. A general mechanism for letting the query planner know that subexpressions are commonly not true is what is desired - not a technique for making LIKE operators more efficient. SURVEY QUESTION: The question for today is what to call this magic hint function: (1) unlikely(EXPR) (2) selective(EXPR) (3) seldom(EXPR) (4) seldom_true(EXPR) (5) usually_not_true(EXPR) Please feel free to suggest other names if you think of any. ADDITIONAL INFORMATION: The current implementation allows a second argument which must be a floating point constant between 0.0 and 1.0, inclusive. The second argument is an estimate of the probability that the expression in the first argument will be true. The default is 0.05. Names like "unlikely" or "seldom" work well when this probability is small, but if the second argument is close to 1.0, then those names seem backwards. I don't know if this matters. The optional second argument is not guaranteed to make it into an actually release. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users