Re: [sqlite] Use cases for sqlite3_value_frombind()?
On 4/17/19, Peter da Silva wrote: > whether it's a > bound parameter or a constant in the query string doesn't tell you if it's > from a trusted source or not. How do you get an SQL injection vulnerability to call sqlite3_bind()? Worse case is that the attacker can make use of an existing parameter value that was already available in the application. So, for example, if the application were written in TCL, then the attacker could inject a parameter that is the name of any global TCL variable and cause the value of that TCL variable to be bound. But probably the attacker could not control the value of that TCL variable - he has to make due with what is already there. So if an application-defined SQL function requires that one or more of its input parameters be from a parameter, that restricts what an attacker can do with that function using an SQL injection. It does not completely prevent mischief, but it makes it more difficult. Without the sqlite3_value_frombind() restriction, an attacker that can do SQL injection can trivially inject whatever value he desires into the function parameter. With the sqlite3_value_fromblob() restriction, the attackers choices are quite a much more restricted, and may be an empty set, depending on the system. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
I don't think I would use this as a security indicator, whether it's a bound parameter or a constant in the query string doesn't tell you if it's from a trusted source or not. This is more an indicator that this value is likely to change in subsequent queries. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
Besides which, of course, you example is useless. The sqlite3_value_frombind would be used to find out if the parameter came from a binding or not as in, within the SomeFunction function, to determine if the parameter were a "bound" parameter or not. Example: select SomeFunction(?); select SomeFunction(42); Then the implementation of SomeFunction can test whether or not the argument it received was a "bound" parameter (the first case) or not (the second case). --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf >Sent: Wednesday, 17 April, 2019 12:38 >To: SQLite mailing list >Subject: Re: [sqlite] Use cases for sqlite3_value_frombind()? > > >Simon, > >There are fields (columns) in your invoices table named 1.23 and >7524? Why did you do this (or did you just use the wrong quotes >around text strings?) > > >--- >The fact that there's a Highway to Hell but only a Stairway to Heaven >says a lot about anticipated traffic volume. > > >>-Original Message- >>From: sqlite-users [mailto:sqlite-users- >>boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin >>Sent: Wednesday, 17 April, 2019 12:22 >>To: SQLite mailing list >>Subject: Re: [sqlite] Use cases for sqlite3_value_frombind()? >> >>On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski >>wrote: >> >>> What measures the trustworthiness? At what point would the >running >>> application be notified that the statement was bound or injection >>avenue? >> >>You can include parameters as text in your SQL command: >> >>UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524" >> >>If someone is attacking your server using SQL injection on a whole >>statement, that's what they'd do. And sqlite3_value_frombind() >would >>return FALSE. Of course, to detect this the application does need >to >>call sqlite3_value_frombind() on each parameter it cares about. >>___ >>sqlite-users mailing list >>sqlite-users@mailinglists.sqlite.org >>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
On 17 Apr 2019, at 7:37pm, Keith Medcalf wrote: > There are fields (columns) in your invoices table named 1.23 and 7524? Why > did you do this (or did you just use the wrong quotes around text strings?) I used the wrong quotes. Sorry, I've been doing things in other languages recently. Keith is right, those " signs should have been ' . ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
Simon, There are fields (columns) in your invoices table named 1.23 and 7524? Why did you do this (or did you just use the wrong quotes around text strings?) --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin >Sent: Wednesday, 17 April, 2019 12:22 >To: SQLite mailing list >Subject: Re: [sqlite] Use cases for sqlite3_value_frombind()? > >On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski >wrote: > >> What measures the trustworthiness? At what point would the running >> application be notified that the statement was bound or injection >avenue? > >You can include parameters as text in your SQL command: > >UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524" > >If someone is attacking your server using SQL injection on a whole >statement, that's what they'd do. And sqlite3_value_frombind() would >return FALSE. Of course, to detect this the application does need to >call sqlite3_value_frombind() on each parameter it cares about. >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
So it becomes the responsibility of the application to acknowledge when binding is allowed or not. Got it. Clear now. Thanks. I was thinking small in this case. My own app, I know what's being fed, I already bind most things, but I absolutely can see the use of this. Time to see if I can add this particular function to my wrapper... .. one of these days. On Wed, Apr 17, 2019 at 2:22 PM Simon Slavin wrote: > On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski > wrote: > > > What measures the trustworthiness? At what point would the running > > application be notified that the statement was bound or injection avenue? > > You can include parameters as text in your SQL command: > > UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524" > > If someone is attacking your server using SQL injection on a whole > statement, that's what they'd do. And sqlite3_value_frombind() would > return FALSE. Of course, to detect this the application does need to call > sqlite3_value_frombind() on each parameter it cares about. > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski wrote: > What measures the trustworthiness? At what point would the running > application be notified that the statement was bound or injection avenue? You can include parameters as text in your SQL command: UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524" If someone is attacking your server using SQL injection on a whole statement, that's what they'd do. And sqlite3_value_frombind() would return FALSE. Of course, to detect this the application does need to call sqlite3_value_frombind() on each parameter it cares about. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] slow join, fast subselect
Your made up plans are intriguing. The plan you show for the latter query omit to join a and b. Are you just making things up? sqlite> select a.rowid from a, b where a.ref=7 and a.rowid in (select rowid from b); QUERY PLAN |--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows) |--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR `--SCAN TABLE b (~1048576 rows) However, assuming that you typo'd the query (please learn about this new-fangled thing called cut-n-paste to avoid that error in the future) you get this if you specify "FROM a" rather than "FROM a, b". sqlite> select rowid from a where ref=7 and rowid in (select rowid from b); QUERY PLAN |--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows) `--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR In any case, upon closer examination the VDBE code for the two queries "solves" the query quite differently: select a.rowid from a, b where a.rowid == b.rowid and a.ref == ? places the index constrained by ref = ? in the outer loop, and then if a.rowid exists in table b, returns a result row. This means that the number of times the outer loop is executed is dependent on the constraint ref == ? on the index. Since there can be no possible statistic for the a.rowid == b.rowid condition, table a (or rather the index) will and must always be in the outer loop, even if the constraint ref == ? selects all rows in table a and there is only one row in table b ... select rowid from a where ref == ? and rowid in (select rowid from b) however scans table b in the outer loop and then does an index lookup on (ref, rowid) into the index and returns a result row whenever it is found. This means that the number of times the outer loop is executed is dependent on the number of rows in table b (only). Since the QP can determine the number of rows in a where ref == 7 and the number of rows in b, the plan will probably be optimized by having statistics available. Do you have statistics available? Or do you just by happenstance have less rows in b than in a constrained by ref == ?. Do you have statistics? Have you run ANALYZE? --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Poor Yorick >Sent: Wednesday, 17 April, 2019 08:48 >To: SQLite mailing list >Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect > >On Wed, Apr 17, 2019 at 01:24:11PM +, David Raymond wrote: >> Would you post what those explain query plans results are? All the >other replies not withstanding I'm still curious as to why #2 would >be faster (assuming "rowid" is indeed the actual rowid anyway) >> >> Also, is that a typo in #2, if you're not using b, why would you >include it in the from clause? Wouldn't that introduce a whole bunch >of duplicates? As in a copy of a.rowid for every single record in b? >(Maybe my brain just hasn't finished warming up this morning) >> >> #1 >> select a.rowid >> from a join b on a.rowid = b.rowid >> where a.ref = $x >> >> #2 >> select a.rowid >> from a,b >> where a.ref = $x and a.rowid in (select rowid from b) >> >> > >3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)} >8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)} > >2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND >rowid=?)} >7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR} > >-- >Poor Yorick >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
What measures the trustworthiness? At what point would the running application be notified that the statement was bound or injection avenue? On Wed, Apr 17, 2019 at 12:40 PM Richard Hipp wrote: > On 4/17/19, Jens Alfke wrote: > > The new sqlite3_value_frombind() function sounds intriguing — "True if > value > > originated from a bound parameter > > — but I’m drawing a blank thinking of use cases for it. Optimizations? > > Security? What was the rationale for adding it? > > This facilities additional security measures. If a value comes from a > bind, then (at least in most systems) that means it did not come from > an SQL injection from an attacker, and hence the value is more > trustworthy. > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
On 4/17/19, Jens Alfke wrote: > The new sqlite3_value_frombind() function sounds intriguing — "True if value > originated from a bound parameter > — but I’m drawing a blank thinking of use cases for it. Optimizations? > Security? What was the rationale for adding it? This facilities additional security measures. If a value comes from a bind, then (at least in most systems) that means it did not come from an SQL injection from an attacker, and hence the value is more trustworthy. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Use cases for sqlite3_value_frombind()?
The commit that added it ( https://www3.sqlite.org/cgi/src/info/b3f2c3205a28dc21) says that it's used to improve fts3_tokenizer(), so maybe look at the diffs of that function to see how it's used there to get an idea for what it's intended for? On Wed, Apr 17, 2019, 9:22 AM Jens Alfke wrote: > The new sqlite3_value_frombind() function sounds intriguing — "True if > value originated from a bound parameter > — but I’m drawing a blank thinking of use cases for it. Optimizations? > Security? What was the rationale for adding it? > > —Jens > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Use cases for sqlite3_value_frombind()?
The new sqlite3_value_frombind() function sounds intriguing — "True if value originated from a bound parameter — but I’m drawing a blank thinking of use cases for it. Optimizations? Security? What was the rationale for adding it? —Jens ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Wed, 17 Apr 2019 at 15:08, Hick Gunter wrote: > > SLEECT contents from textfiles where search_path = 'mypath' and > name_pattern IN ('*.txt','*.csv'); > > It's interesting, I implemented the mask and decided to give such a query a try (having the same cost adjust I explained in the first post). With newest versions and even older 3.15.1, it works ok with two cursors one after another (two xOpen calls) providing constraints with two masks. Version 3.8.3 visits constraints with optional parameters, but chooses a different route without filemask in constraint (so the tables outputs with a default mask). 3.6.10 both doesn't visit constraints with optional parameters and also gives results with default mask. Looks like similar to what you mentioned in your first reply. The usage of IN allows many expressive applications to such queries, but I should probably have in mind that sometimes it is safe to implement a data piece in a single entity so file mask may acquire something like comma-list (or semocolon) "*.txt;*.csv" Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] slow join, fast subselect
On Wed, Apr 17, 2019 at 01:24:11PM +, David Raymond wrote: > Would you post what those explain query plans results are? All the other > replies not withstanding I'm still curious as to why #2 would be faster > (assuming "rowid" is indeed the actual rowid anyway) > > Also, is that a typo in #2, if you're not using b, why would you include it > in the from clause? Wouldn't that introduce a whole bunch of duplicates? As > in a copy of a.rowid for every single record in b? (Maybe my brain just > hasn't finished warming up this morning) > > #1 > select a.rowid > from a join b on a.rowid = b.rowid > where a.ref = $x > > #2 > select a.rowid > from a,b > where a.ref = $x and a.rowid in (select rowid from b) > > 3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)} 8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)} 2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND rowid=?)} 7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR} -- Poor Yorick ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slow join, fast subselect
Second select is not correct. You should have "from a" only not "from a,b". R.Antloga On 17.04.2019 10:55, Poor Yorick wrote: I've used the following two test queries in a version of sqlite built against a recent checkout of trunk, and also another recent version of sqlite. a.ref is indexed. The subselect query is faster than the join query -- two orders of magnitude faster on a larger dataset. Is sqlite missing some easy optimisation opportunity here? select a.rowid from a join b on a.rowid = b.rowid where a.ref = $x select a.rowid from a,b where a.ref = $x and a.rowid in (select rowid from b) ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] slow join, fast subselect
Would you post what those explain query plans results are? All the other replies not withstanding I'm still curious as to why #2 would be faster (assuming "rowid" is indeed the actual rowid anyway) Also, is that a typo in #2, if you're not using b, why would you include it in the from clause? Wouldn't that introduce a whole bunch of duplicates? As in a copy of a.rowid for every single record in b? (Maybe my brain just hasn't finished warming up this morning) #1 select a.rowid from a join b on a.rowid = b.rowid where a.ref = $x #2 select a.rowid from a,b where a.ref = $x and a.rowid in (select rowid from b) -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Poor Yorick Sent: Wednesday, April 17, 2019 6:32 AM To: SQLite mailing list Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect On Wed, Apr 17, 2019 at 10:15:31AM +, Hick Gunter wrote: > Try EXPLAIN QUERY PLAN or even EXPLAIN to see what is going on > in each case. I already have, of course. The question is, how much effort would it be to get sqlite choose the better query plan in the "join" case as well? -- Poor Yorick ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Wed, 17 Apr 2019 at 15:08, Hick Gunter wrote: > ... > > SELECT contents from textfiles( 'mypath', NULL, 0); > SELECT contents from textfiles where search_path = 'mypath' and > is_recursive = 1; > SLEECT contents from textfiles where search_path = 'mypath' and > name_pattern IN ('*.txt','*.csv'); > > With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) > respectively. > > And you want to know how to make SQLite always call xFilter with > ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only > ('mypath') and attempting to retrieve the undefined/empty name_pattern. Is > this correct? > > > Yes, you're correct. The name_pattern is also a good illustrative extension to the case. And indeed, IN operator is really a tricky case ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slow join, fast subselect
Also, let me just add (in case it sounded different) - We definitely do not wish to document the "why an optimization opportunity might not be feasible" in any way that would discourage anyone from submitting such a possible optimization opportunity. That would work against the axiomatic premise of this forum, sqlite and open source communities in general. On 2019/04/17 1:23 PM, Poor Yorick wrote: ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slow join, fast subselect
On 2019/04/17 1:23 PM, Poor Yorick wrote: That's an apt and accessible description of the issue, but at the denotational level the meanings of the queries are in fact identical under the conditions you enumerated. Ideally sqlite would notice and adjust its query plan accordingly. Ideally yes, but the heart of the matter, I would like to reiterate, is that the query in question has to be just so for the optimization to work, anything you change (using b.anything anywhere, having duplicate b values or in any way not using rowids or at least unique columns of some flavour, using a collation, etc.) would render the optimization opportunity void. The opportunity hangs by the tiniest of threads, or put another way: the decision tree + checks to arrive at this optimization opportunity is very long, and I doubt a similar query is encountered by the QP more than one in a few million times (with a strong possibility that that should read "in a few billion times") ever - and when it does come up, the programmer most probably would opt for the latter example since it more clearly describes the plot. (The fact that it is also faster is just lucky). If on the other hand a situation presents itself where the obvious better plot description (i.e. the second query) was /slower/ - then the optimization (to rather do it like the faster first query) would be a more worthy cause. The fact that we can engineer a query that happens to be equivalent mathematically and is slower does not sufficiently call for an effort to improve the planner. In fact, we can engineer many such queries, and these kinds of things come up quite regularly on the forum. To be sure, they often do get optimized IF the slower query is the one that is semantically more sensible and the effort to implement plus the added code-weight and cpu cycles are justified by the gain in the general case. If the cost of doing so doesn't justify the effort, that could be documented. As good as the sqlite documentation is, it currently lacks this sort of higher-level guidance. I agree, perhaps some general description might be useful, though it's hard to imagine it mentioning specific scenarios. Consider that the amount of semantically-different-but-functionally-equivalent-yet-slower queries that can be engineered must be legion (the forum produces new ones almost monthly). It's hard to fathom documenting all of them - plus some of them disappear with improvements over time. Maybe you could volunteer a paragraph of documentation that would have adequately satisfied your question in this regard and in general - the devs often do amend documentation based on suggestions here. Cheers, Ryan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
Ok so lets assume your xConnect function declares something similar to Create table textfiles( result_path text, result_name text, contents blob, search_path text hidden, name_pattern text hidden, is_recursive hidden); So that you can SELECT contents from textfiles( 'mypath', NULL, 0); SELECT contents from textfiles where search_path = 'mypath' and is_recursive = 1; SLEECT contents from textfiles where search_path = 'mypath' and name_pattern IN ('*.txt','*.csv'); With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) respectively. And you want to know how to make SQLite always call xFilter with ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only ('mypath') and attempting to retrieve the undefined/empty name_pattern. Is this correct? -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Max Vlasov Gesendet: Mittwoch, 17. April 2019 13:03 An: SQLite mailing list Betreff: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables On Wed, 17 Apr 2019 at 12:52, Hick Gunter wrote: > Your xBestIndex function should be returning a cost that is > proportional to the "effort required to fulfill the query". My own VT > implementations have been returning the total number of records for > queries with no constraints and assuming a constant fan-out factor for > each key field provided as a constraint (eg. 25 for a full table scan, > 5 if only the first of two key fields is provided and 1 if both are provided). > I suspect that you talk more about tables that possess and outputs data regardless of "incoming" parameters involved. I understand that in this case it is mostly about performance-wise issues when the worst-case scenario is a time penalty but still valid results. I'm here more about table-valued tables when for some of them not providing incoming parameters means not valid data at all. One of my examples is a virtual table outputting text files from a folder. Folder is a required parameter, but there's also a binary flag regarding whether the scan processes subfolders. If the flag is provided, the implementation wants it to be used in the constraint chosen. Otherwise the intention from the query will not be transferred and the scan will be folder-recursive despite the fact that the flag disabling it appeared in the query. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slow join, fast subselect
On Wed, Apr 17, 2019 at 11:36:18AM +0200, R Smith wrote: > On 2019/04/17 10:55 AM, Poor Yorick wrote: [SNIP] > > In your above example you really wish to know all the a's which have an > entry in b. The first query asks to join and list all b's found for every a > (which works mathematically in this case by virtue of rowid uniqueness, but > isn't the real question and forces a lot of "join" algorithm checking on the > QP), the correct question is the second query: Show every a which can also > be found in b. It releases the QP of a lot of responsibility and let's it > follow a plan that is much faster. > > > Hope that makes sense :) > > Ryan > That's an apt and accessible description of the issue, but at the denotational level the meanings of the queries are in fact identical under the conditions you enumerated. Ideally sqlite would notice and adjust its query plan accordingly. If the cost of doing so doesn't justify the effort, that could be documented. As good as the sqlite documentation is, it currently lacks this sort of higher-level guidance. -- Poor Yorick ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
On Wed, 17 Apr 2019 at 12:52, Hick Gunter wrote: > Your xBestIndex function should be returning a cost that is proportional > to the "effort required to fulfill the query". My own VT implementations > have been returning the total number of records for queries with no > constraints and assuming a constant fan-out factor for each key field > provided as a constraint (eg. 25 for a full table scan, 5 if only the first > of two key fields is provided and 1 if both are provided). > I suspect that you talk more about tables that possess and outputs data regardless of "incoming" parameters involved. I understand that in this case it is mostly about performance-wise issues when the worst-case scenario is a time penalty but still valid results. I'm here more about table-valued tables when for some of them not providing incoming parameters means not valid data at all. One of my examples is a virtual table outputting text files from a folder. Folder is a required parameter, but there's also a binary flag regarding whether the scan processes subfolders. If the flag is provided, the implementation wants it to be used in the constraint chosen. Otherwise the intention from the query will not be transferred and the scan will be folder-recursive despite the fact that the flag disabling it appeared in the query. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Opening a DataBase file with a custom "user data" parameter
On 4/16/19, Esenthel wrote: > > I never got any reply for this You have to sign up for the mailing list in order to get email replies. If you do not sign up, you have to poll for responses on one of the archives, such as https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/thrd5.html#114065 -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] slow join, fast subselect
On Wed, Apr 17, 2019 at 10:15:31AM +, Hick Gunter wrote: > Try EXPLAIN QUERY PLAN or even EXPLAIN to see what is going on > in each case. I already have, of course. The question is, how much effort would it be to get sqlite choose the better query plan in the "join" case as well? -- Poor Yorick ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Opening a DataBase file with a custom "user data" parameter
Hi, I never got any reply for this On Thu, 21 Feb 2019 at 10:31, Esenthel wrote: > I'm trying to use "sqlite3_open_v2" with a custom VFS (file system), and > pass a pointer "void *user_data" to the sqlite3_open_v2 function, so it > gets passed down to the "sqlite3_vfs::xOpen" > However there's no option for that, so how to do that? > > The reason is that for opening files in the custom callback, I need to use > a pointer to some helper "cipher" class object, to allow my own > encryption/decryption. > > However xOpen accepts only "const char *zName". > > I have one custom global VFS, and I want to use different ciphers / user > data for each database I use. > > What would solve the problem: > add new "void *user_data" parameter to functions "sqlite3_open_v2", and > "sqlite3_vfs::xOpen". > > Thanks, > > Greg > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] slow join, fast subselect
Try EXPLAIN QUERY PLAN or even EXPLAIN to see what is going on in each case. -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Poor Yorick Gesendet: Mittwoch, 17. April 2019 10:56 An: sqlite-users@mailinglists.sqlite.org Betreff: [EXTERNAL] [sqlite] slow join, fast subselect I've used the following two test queries in a version of sqlite built against a recent checkout of trunk, and also another recent version of sqlite. a.ref is indexed. The subselect query is faster than the join query -- two orders of magnitude faster on a larger dataset. Is sqlite missing some easy optimisation opportunity here? select a.rowid from a join b on a.rowid = b.rowid where a.ref = $x select a.rowid from a,b where a.ref = $x and a.rowid in (select rowid from b) -- Poor Yorick ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables
Your xBestIndex function should be returning a cost that is proportional to the "effort required to fulfill the query". My own VT implementations have been returning the total number of records for queries with no constraints and assuming a constant fan-out factor for each key field provided as a constraint (eg. 25 for a full table scan, 5 if only the first of two key fields is provided and 1 if both are provided). Newer releases of SQLite handle IN expressions differently. Older releases simply created an indexed ephemeral table and an Affinity opcode. Newer releases call xBestIndex twice; once with each IN transformed into an equality constraint, and once with (all of) these constraints disabled. Depending on the cost, the non-indexed ephemeral table would be placed on either side of the JOIN operation. I found it was necessary to increase the cost by at least 2.5% to discourage SQLite from putting the ephemeral table on the LHS (outer loop) of the join. -Ursprüngliche Nachricht- Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im Auftrag von Max Vlasov Gesendet: Mittwoch, 17. April 2019 10:44 An: SQLite mailing list Betreff: [EXTERNAL] [sqlite] "Optional" incoming parameters for virtual tables Hi, I'm trying to implement "optional incoming" parameters for my virtual tables. It's when there are columns required, but also there are some fine-tuning columns that should be noticed when provided in the query and assumed some default if not. The system that I tried seems to work, but I would like an opinion whether it is future-proof. - When the required parameters are not provided, xBestIndex returns a large cost for earlier sqlite version and additionallly SQLITE_CONSTRAINT for 3.26.0 onward. - When the required parameters provided and all optional provided the cost is 1. - For every optional parameter not provided, the cost is increased by 1. So if there are 3 optional parameters and no provided, the cost is 4, if one optional provided - the cost is 2. Should this always work as expected or sqlite might not always choose the lowest cost index amongst the ones with small values? I tested it with a couple of vt implementations, but the number of columns in both required/optional pool are low and I expect some trouble when the number will go up. Thanks, Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0 May be privileged. May be confidential. Please delete if not the addressee. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] sqlite3_db_config() documentation issue
The sqlite3_db_config() function is a variable argument one, taking a database handle, a config option, and then what the documentation says is 'Subsequent arguments vary depending on the configuration verb.'. Aside from SQLITE_DBCONFIG_MAINDBNAME and SQLITE_DBCONFIG_LOOKASIDE, they all require an int and an int* for those arguments. The descriptions of SQLITE_DBCONFIG_RESET_DATABASE and SQLITE_DBCONFIG_DEFENSIVE don't mention the need for the second argument (As I just found out the hard way with the latter option). ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slow join, fast subselect
On 2019/04/17 10:55 AM, Poor Yorick wrote: I've used the following two test queries in a version of sqlite built against a recent checkout of trunk, and also another recent version of sqlite. a.ref is indexed. The subselect query is faster than the join query -- two orders of magnitude faster on a larger dataset. Is sqlite missing some easy optimisation opportunity here? select a.rowid from a join b on a.rowid = b.rowid where a.ref = $x select a.rowid from a,b where a.ref = $x and a.rowid in (select rowid from b) These queries have vastly different meanings and as such must have vastly different execution plans. I won't go into that because it's obvious, so I will just mention the assumptions. For the query engine to optimize this according to your suggestion (i.e. for it to really do the second thing when you've asked for the first thing) it has to assume the following: - 1. You do not need any of the fields from b in any part of the query (which IS the case here and can be deduced, but is an extremely small likelihood for JOINs in general queries), - 2. there are no special Collations on either of the columns (again, this can be deduced here), and - 3. it must assume that every b.rowid appears ONLY once in table b (I know this can be assumed for rowid's, but is something that cannot be easily known for any other field, where it could be true with or without a unique constraint). And, these are only the obvious outsider-viewpoint assumptions, I have no insight in what other checks might be needed internally from the QP. This means that this optimization carries a cost deficit in the general case. i.e. Wasting CPU cycles "looking" for this optimization opportunity in the general case, considering the high unlikelihood of finding it viable, will slow down many more queries than it will help. That said, this is the very kind of thing where we as engineering programmers or database queryers could optimize by asking our questions right, mind you - I am not advocating thinking for the Query Planner, but do state your question optimally. If you really want to know how many rabbits have been eaten by foxes, do not ask for a list of all rabbit names alphabetically and exclude every one that was NOT eaten by a fox, simply ask for the count of rabbits where eaten by fox = true. In your above example you really wish to know all the a's which have an entry in b. The first query asks to join and list all b's found for every a (which works mathematically in this case by virtue of rowid uniqueness, but isn't the real question and forces a lot of "join" algorithm checking on the QP), the correct question is the second query: Show every a which can also be found in b. It releases the QP of a lot of responsibility and let's it follow a plan that is much faster. Hope that makes sense :) Ryan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] slow join, fast subselect
I've used the following two test queries in a version of sqlite built against a recent checkout of trunk, and also another recent version of sqlite. a.ref is indexed. The subselect query is faster than the join query -- two orders of magnitude faster on a larger dataset. Is sqlite missing some easy optimisation opportunity here? select a.rowid from a join b on a.rowid = b.rowid where a.ref = $x select a.rowid from a,b where a.ref = $x and a.rowid in (select rowid from b) -- Poor Yorick ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] "Optional" incoming parameters for virtual tables
Hi, I'm trying to implement "optional incoming" parameters for my virtual tables. It's when there are columns required, but also there are some fine-tuning columns that should be noticed when provided in the query and assumed some default if not. The system that I tried seems to work, but I would like an opinion whether it is future-proof. - When the required parameters are not provided, xBestIndex returns a large cost for earlier sqlite version and additionallly SQLITE_CONSTRAINT for 3.26.0 onward. - When the required parameters provided and all optional provided the cost is 1. - For every optional parameter not provided, the cost is increased by 1. So if there are 3 optional parameters and no provided, the cost is 4, if one optional provided - the cost is 2. Should this always work as expected or sqlite might not always choose the lowest cost index amongst the ones with small values? I tested it with a couple of vt implementations, but the number of columns in both required/optional pool are low and I expect some trouble when the number will go up. Thanks, Max ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users