Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
On Tue, Mar 28, 2017 at 11:26 AM, Max Vlasovwrote: > > > So, what is the maximum reasonable value of estimatedCost that will not > turn sqlite into possible overflow errors while telling at the same time > that I consider some variant very, very expensive? Or maybe changing cheap > from 1 to 0 will do the trick? > > Hi again, replying to myself since I noticed a more straightforward case explaining estimatedCost peculiarities Sqlite 3.17.0 My comma-list virtual table implementation (vtcommalist) reported the table structure as CREATE TABLE [xxx] ([CommaList] TEXT HIDDEN, [Value] TEXT) Basically it is only functional when CommaList is provided so it can produce Value in this case based on coming CommaList. Hidden column here to allow table-valued functions syntax. Database: The virtual table create virtual table [cmlist] Using VtCommaList a simple test table with lists data create table [lists] ([list] TEXT) populated with 1 rows insert into lists (list) values ('1, 2, 3, 4') The query in question: Select distinct trim(value) From lists, cmlist(list) Sqlite asks for index evaluation twice: 1. Suggesting constraint for CommaList field only, my code reported "cheap" (value 1) 2. Suggesting no constraint at all, my code reported "expensive" (value 10) But Sqlite still went for full-scan in this case (second choice) so I had to report error leading to "SQL Logic error" Only when I increased "expensive" to 10*10 = 100, Sqlite accepted my cheap/expensive ratio as worth considering with following filter with commalist field constraint. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
You can always use CROSS JOIN to force a specific join order as in: SELECT ... FROM mytable m CROSS_JOIN split s ON (s.input = m.string_field) JOIN anothertable a ON (a.field = s.output) ... -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Max Vlasov Gesendet: Dienstag, 28. März 2017 13:37 An: SQLite mailing list <sqlite-users@mailinglists.sqlite.org> Betreff: Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables) On Tue, Mar 28, 2017 at 12:51 PM, Dominique Devienne <ddevie...@gmail.com> wrote: > On Tue, Mar 28, 2017 at 11:00 AM, Hick Gunter <h...@scigames.at> wrote: > > > > > The "estimated cost" is described as "how many disk IO operations > > are expected". Version higher than 3.8.2 allow setting an "estimatedRows" > > (default: 25) and versions above 3.9.0 allow setting a flag that > indicates > > that 0 or 1 rows will be returned (i.e. the constraints form a > > "unique index"). > > > > Thanks for the RTFM. That helps. Looks like I have about 3 years worth > of catch-up to do on sqlite3_index_info. > > Thanks, Dominique, Hick I looked at the additional fields of sqlite3_index_info and probably they're of no use for me, at least they don't provide some enforcing of my particular index. As for my problem queries, they usually involve "order by" or "group by", that's when the planner (despite the obvious bias from me about the particular index) decides to provide only one of my two required fields. Also interesting that actually what I called input/output approach, sqlite implemented calling it table-valued functions (probably starting 3.9.0) ( https://sqlite.org/vtab.html#tabfunc2). To make my tables compatible with this syntax, I just had to append HIDDEN to the "input" fileds of the table definition. But seems like changing to this definition doesn't affect the planner. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
On Tue, Mar 28, 2017 at 12:51 PM, Dominique Deviennewrote: > On Tue, Mar 28, 2017 at 11:00 AM, Hick Gunter wrote: > > > > > The "estimated cost" is described as "how many disk IO operations are > > expected". Version higher than 3.8.2 allow setting an "estimatedRows" > > (default: 25) and versions above 3.9.0 allow setting a flag that > indicates > > that 0 or 1 rows will be returned (i.e. the constraints form a "unique > > index"). > > > > Thanks for the RTFM. That helps. Looks like I have about 3 years worth of > catch-up to do on sqlite3_index_info. > > Thanks, Dominique, Hick I looked at the additional fields of sqlite3_index_info and probably they're of no use for me, at least they don't provide some enforcing of my particular index. As for my problem queries, they usually involve "order by" or "group by", that's when the planner (despite the obvious bias from me about the particular index) decides to provide only one of my two required fields. Also interesting that actually what I called input/output approach, sqlite implemented calling it table-valued functions (probably starting 3.9.0) ( https://sqlite.org/vtab.html#tabfunc2). To make my tables compatible with this syntax, I just had to append HIDDEN to the "input" fileds of the table definition. But seems like changing to this definition doesn't affect the planner. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
On Tue, Mar 28, 2017 at 11:00 AM, Hick Gunterwrote: > >FWIW, I've often wondered about the cost estimates of real tables versus > virtual tables, especially since many vtables implementations don't involve > real IO but pure in-memory computations. There's >very little advice or > documentation on this important subject, and Max's email reveals that > empirical testing leading to ad-hoc heuristics are vulnerable to breaking > when SQLite itself evolves. More >guidance and perhaps even some "normative" > >documentation is needed IMHO. Thanks, --DD __ > _ > > The "estimated cost" is described as "how many disk IO operations are > expected". Version higher than 3.8.2 allow setting an "estimatedRows" > (default: 25) and versions above 3.9.0 allow setting a flag that indicates > that 0 or 1 rows will be returned (i.e. the constraints form a "unique > index"). > Thanks for the RTFM. That helps. Looks like I have about 3 years worth of catch-up to do on sqlite3_index_info. Sorry for hijacking Max's thread, which is why his costs seems to be ignored. --DD /* Fields below are only available in SQLite 3.8.2 and later */ sqlite3_int64 estimatedRows;/* Estimated number of rows returned */ /* Fields below are only available in SQLite 3.9.0 and later */ int idxFlags; /* Mask of SQLITE_INDEX_SCAN_* flags */ /* Fields below are only available in SQLite 3.10.0 and later */ sqlite3_uint64 colUsed;/* Input: Mask of columns used by statement */ ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
>FWIW, I've often wondered about the cost estimates of real tables versus >virtual tables, especially since many vtables implementations don't involve >real IO but pure in-memory computations. There's >very little advice or >documentation on this important subject, and Max's email reveals that >empirical testing leading to ad-hoc heuristics are vulnerable to breaking when >SQLite itself evolves. More >guidance and perhaps even some "normative" >documentation is needed IMHO. Thanks, --DD >___ The "estimated cost" is described as "how many disk IO operations are expected". Version higher than 3.8.2 allow setting an "estimatedRows" (default: 25) and versions above 3.9.0 allow setting a flag that indicates that 0 or 1 rows will be returned (i.e. the constraints form a "unique index"). ___ Gunter Hick Software Engineer Scientific Games International GmbH FN 157284 a, HG Wien Klitschgasse 2-4, A-1130 Vienna, Austria Tel: +43 1 80100 0 E-Mail: h...@scigames.at This communication (including any attachments) is intended for the use of the intended recipient(s) only and may contain information that is confidential, privileged or legally protected. Any unauthorized use or dissemination of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender by return e-mail message and delete all copies of the original communication. Thank you for your cooperation. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
On Tue, Mar 28, 2017 at 10:26 AM, Max Vlasovwrote: > I sometimes use virtual tables to implement some kind of one-to-many > output. One of examples mentioned previously was the comma list virtual > table when a field containing comma-separated values might be used to > output rows of values from this list. Other example - performing > regexp-like query against a text and outputting the result columns. But > this involves assuming that some columns of the virtual table are required > "input" parameters while the other is "output". For example, for comma > list, the required parameter is the list, the output - extracted values. > The consequence of this trick is that this virtual table is not fully > functional one, so you can't query select * from it, it's useful only by > providing where or "join .. on" clause containing the required "input" > parameters. > > I usually encouraged sqlite to provide all required input parameters by > reporting a "cheap" value of estimatedCost when I recognized my "input" > parameters in xBestIndex call and providing an "expensive" values for any > other cases. This usually worked for simple and complex cases until > 3.8.0.0. It seems the version introduced Next-Generation Query Planner and > I noticed that in some complex cases my cheap-expensive recommendations > were ignored, so even when I noticed in the debugger that cheap value was > provided for the right index and expensive for all other cases, the > following xFilter provided not all values required probably deciding it > knew better :). Before this I used cheap value 1 and expensive value 1. > Testing (3.17.0) revealed that in some cases multiplying expensive by x100 > helped, but other cases required increasing this value even more. > > So, what is the maximum reasonable value of estimatedCost that will not > turn sqlite into possible overflow errors while telling at the same time > that I consider some variant very, very expensive? Or maybe changing cheap > from 1 to 0 will do the trick? > FWIW, I've often wondered about the cost estimates of real tables versus virtual tables, especially since many vtables implementations don't involve real IO but pure in-memory computations. There's very little advice or documentation on this important subject, and Max's email reveals that empirical testing leading to ad-hoc heuristics are vulnerable to breaking when SQLite itself evolves. More guidance and perhaps even some "normative" documentation is needed IMHO. Thanks, --DD ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] All versions compatible "very big" estimatedCost (Virtual tables)
Hi, I sometimes use virtual tables to implement some kind of one-to-many output. One of examples mentioned previously was the comma list virtual table when a field containing comma-separated values might be used to output rows of values from this list. Other example - performing regexp-like query against a text and outputting the result columns. But this involves assuming that some columns of the virtual table are required "input" parameters while the other is "output". For example, for comma list, the required parameter is the list, the output - extracted values. The consequence of this trick is that this virtual table is not fully functional one, so you can't query select * from it, it's useful only by providing where or "join .. on" clause containing the required "input" parameters. I usually encouraged sqlite to provide all required input parameters by reporting a "cheap" value of estimatedCost when I recognized my "input" parameters in xBestIndex call and providing an "expensive" values for any other cases. This usually worked for simple and complex cases until 3.8.0.0. It seems the version introduced Next-Generation Query Planner and I noticed that in some complex cases my cheap-expensive recommendations were ignored, so even when I noticed in the debugger that cheap value was provided for the right index and expensive for all other cases, the following xFilter provided not all values required probably deciding it knew better :). Before this I used cheap value 1 and expensive value 1. Testing (3.17.0) revealed that in some cases multiplying expensive by x100 helped, but other cases required increasing this value even more. So, what is the maximum reasonable value of estimatedCost that will not turn sqlite into possible overflow errors while telling at the same time that I consider some variant very, very expensive? Or maybe changing cheap from 1 to 0 will do the trick? Thanks Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users