Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Richard Hipp
On 4/17/19, Peter da Silva  wrote:
>  whether it's a
> bound parameter or a constant in the query string doesn't tell you if it's
> from a trusted source or not.

How do you get an SQL injection vulnerability to call sqlite3_bind()?
Worse case is that the attacker can make use of an existing parameter
value that was already available in the application.  So, for example,
if the application were written in TCL, then the attacker could inject
a parameter that is the name of any global TCL variable and cause the
value of that TCL variable to be bound.  But probably the attacker
could not control the value of that TCL variable - he has to make due
with what is already there.

So if an application-defined SQL function requires that one or more of
its input parameters be from a parameter, that restricts what an
attacker can do with that function using an SQL injection.  It does
not completely prevent mischief, but it makes it more difficult.
Without the sqlite3_value_frombind() restriction, an attacker that can
do SQL injection can trivially inject whatever value he desires into
the function parameter. With the sqlite3_value_fromblob() restriction,
the attackers choices are quite a much more restricted, and may be an
empty set, depending on the system.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Peter da Silva
I don't think I would use this as a security indicator, whether it's a
bound parameter or a constant in the query string doesn't tell you if it's
from a trusted source or not. This is more an indicator that this value is
likely to change in subsequent queries.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Keith Medcalf

Besides which, of course, you example is useless.  The sqlite3_value_frombind 
would be used to find out if the parameter came from a binding or not as in, 
within the SomeFunction function, to determine if the parameter were a "bound" 
parameter or not.  Example:

select SomeFunction(?);
select SomeFunction(42);


Then the implementation of SomeFunction can test whether or not the argument it 
received was a "bound" parameter (the first case) or not (the second case).

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Keith Medcalf
>Sent: Wednesday, 17 April, 2019 12:38
>To: SQLite mailing list
>Subject: Re: [sqlite] Use cases for sqlite3_value_frombind()?
>
>
>Simon,
>
>There are fields (columns) in your invoices table named 1.23 and
>7524?  Why did you do this (or did you just use the wrong quotes
>around text strings?)
>
>
>---
>The fact that there's a Highway to Hell but only a Stairway to Heaven
>says a lot about anticipated traffic volume.
>
>
>>-Original Message-
>>From: sqlite-users [mailto:sqlite-users-
>>boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin
>>Sent: Wednesday, 17 April, 2019 12:22
>>To: SQLite mailing list
>>Subject: Re: [sqlite] Use cases for sqlite3_value_frombind()?
>>
>>On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski 
>>wrote:
>>
>>> What measures the trustworthiness?  At what point would the
>running
>>> application be notified that the statement was bound or injection
>>avenue?
>>
>>You can include parameters as text in your SQL command:
>>
>>UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524"
>>
>>If someone is attacking your server using SQL injection on a whole
>>statement, that's what they'd do.  And sqlite3_value_frombind()
>would
>>return FALSE.  Of course, to detect this the application does need
>to
>>call sqlite3_value_frombind() on each parameter it cares about.
>>___
>>sqlite-users mailing list
>>sqlite-users@mailinglists.sqlite.org
>>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Simon Slavin
On 17 Apr 2019, at 7:37pm, Keith Medcalf  wrote:

> There are fields (columns) in your invoices table named 1.23 and 7524?  Why 
> did you do this (or did you just use the wrong quotes around text strings?)

I used the wrong quotes.  Sorry, I've been doing things in other languages 
recently.  Keith is right, those " signs should have been ' .
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Keith Medcalf

Simon,

There are fields (columns) in your invoices table named 1.23 and 7524?  Why did 
you do this (or did you just use the wrong quotes around text strings?)


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin
>Sent: Wednesday, 17 April, 2019 12:22
>To: SQLite mailing list
>Subject: Re: [sqlite] Use cases for sqlite3_value_frombind()?
>
>On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski 
>wrote:
>
>> What measures the trustworthiness?  At what point would the running
>> application be notified that the statement was bound or injection
>avenue?
>
>You can include parameters as text in your SQL command:
>
>UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524"
>
>If someone is attacking your server using SQL injection on a whole
>statement, that's what they'd do.  And sqlite3_value_frombind() would
>return FALSE.  Of course, to detect this the application does need to
>call sqlite3_value_frombind() on each parameter it cares about.
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Stephen Chrzanowski
So it becomes the responsibility of the application to acknowledge when
binding is allowed or not.  Got it.  Clear now.  Thanks.

I was thinking small in this case.  My own app, I know what's being fed, I
already bind most things, but I absolutely can see the use of this.  Time
to see if I can add this particular function to my wrapper... .. one of
these days.

On Wed, Apr 17, 2019 at 2:22 PM Simon Slavin  wrote:

> On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski 
> wrote:
>
> > What measures the trustworthiness?  At what point would the running
> > application be notified that the statement was bound or injection avenue?
>
> You can include parameters as text in your SQL command:
>
> UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524"
>
> If someone is attacking your server using SQL injection on a whole
> statement, that's what they'd do.  And sqlite3_value_frombind() would
> return FALSE.  Of course, to detect this the application does need to call
> sqlite3_value_frombind() on each parameter it cares about.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Simon Slavin
On 17 Apr 2019, at 6:37pm, Stephen Chrzanowski  wrote:

> What measures the trustworthiness?  At what point would the running
> application be notified that the statement was bound or injection avenue?

You can include parameters as text in your SQL command:

UPDATE invoices SET toBePaid="1.23" WHERE customerId="7524"

If someone is attacking your server using SQL injection on a whole statement, 
that's what they'd do.  And sqlite3_value_frombind() would return FALSE.  Of 
course, to detect this the application does need to call 
sqlite3_value_frombind() on each parameter it cares about.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Keith Medcalf

Your made up plans are intriguing.  The plan you show for the latter query omit 
to join a and b.  Are you just making things up?

sqlite> select a.rowid from a, b where a.ref=7 and a.rowid in (select rowid 
from b);
QUERY PLAN
|--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows)
|--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR
`--SCAN TABLE b (~1048576 rows)

However, assuming that you typo'd the query (please learn about this 
new-fangled thing called cut-n-paste to avoid that error in the future) you get 
this if you specify "FROM a" rather than "FROM a, b".

sqlite> select rowid from a where ref=7 and rowid in (select rowid from b);
QUERY PLAN
|--SEARCH TABLE a USING COVERING INDEX aa (ref=? AND rowid=?) (~8 rows)
`--USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR

In any case, upon closer examination the VDBE code for the two queries "solves" 
the query quite differently:

select a.rowid from a, b where a.rowid == b.rowid and a.ref == ?

places the index constrained by ref = ? in the outer loop, and then if a.rowid 
exists in table b, returns a result row.  This means that the number of times 
the outer loop is executed is dependent on the constraint ref == ? on the 
index.  Since there can be no possible statistic for the a.rowid == b.rowid 
condition, table a (or rather the index) will and must always be in the outer 
loop, even if the constraint ref == ? selects all rows in table a and there is 
only one row in table b ...

select rowid from a where ref == ? and rowid in (select rowid from b)

however scans table b in the outer loop and then does an index lookup on (ref, 
rowid) into the index and returns a result row whenever it is found.  This 
means that the number of times the outer loop is executed is dependent on the 
number of rows in table b (only).  Since the QP can determine the number of 
rows in a where ref == 7 and the number of rows in b, the plan will probably be 
optimized by having statistics available.  Do you have statistics available?  
Or do you just by happenstance have less rows in b than in a constrained by ref 
== ?.  Do you have statistics?  Have you run ANALYZE?

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Poor Yorick
>Sent: Wednesday, 17 April, 2019 08:48
>To: SQLite mailing list
>Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect
>
>On Wed, Apr 17, 2019 at 01:24:11PM +, David Raymond wrote:
>> Would you post what those explain query plans results are? All the
>other replies not withstanding I'm still curious as to why #2 would
>be faster (assuming "rowid" is indeed the actual rowid anyway)
>>
>> Also, is that a typo in #2, if you're not using b, why would you
>include it in the from clause? Wouldn't that introduce a whole bunch
>of duplicates? As in a copy of a.rowid for every single record in b?
>(Maybe my brain just hasn't finished warming up this morning)
>>
>> #1
>> select a.rowid
>> from a join b on a.rowid = b.rowid
>> where a.ref = $x
>>
>> #2
>> select a.rowid
>> from a,b
>> where a.ref = $x and a.rowid in (select rowid from b)
>>
>>
>
>3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)}
>8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}
>
>2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND
>rowid=?)}
>7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR}
>
>--
>Poor Yorick
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Stephen Chrzanowski
What measures the trustworthiness?  At what point would the running
application be notified that the statement was bound or injection avenue?

On Wed, Apr 17, 2019 at 12:40 PM Richard Hipp  wrote:

> On 4/17/19, Jens Alfke  wrote:
> > The new sqlite3_value_frombind() function sounds intriguing — "True if
> value
> > originated from a bound parameter
> >  — but I’m drawing a blank thinking of use cases for it. Optimizations?
> > Security? What was the rationale for adding it?
>
> This facilities additional security measures.  If a value comes from a
> bind, then (at least in most systems) that means it did not come from
> an SQL injection from an attacker, and hence the value is more
> trustworthy.
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Richard Hipp
On 4/17/19, Jens Alfke  wrote:
> The new sqlite3_value_frombind() function sounds intriguing — "True if value
> originated from a bound parameter
>  — but I’m drawing a blank thinking of use cases for it. Optimizations?
> Security? What was the rationale for adding it?

This facilities additional security measures.  If a value comes from a
bind, then (at least in most systems) that means it did not come from
an SQL injection from an attacker, and hence the value is more
trustworthy.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Shawn Wagner
The commit that added it (
https://www3.sqlite.org/cgi/src/info/b3f2c3205a28dc21) says that it's used
to improve fts3_tokenizer(), so maybe look at the diffs of that function to
see how it's used there to get an idea for what it's intended for?

On Wed, Apr 17, 2019, 9:22 AM Jens Alfke  wrote:

> The new sqlite3_value_frombind() function sounds intriguing — "True if
> value originated from a bound parameter
>  — but I’m drawing a blank thinking of use cases for it. Optimizations?
> Security? What was the rationale for adding it?
>
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Use cases for sqlite3_value_frombind()?

2019-04-17 Thread Jens Alfke
The new sqlite3_value_frombind() function sounds intriguing — "True if value 
originated from a bound parameter
 — but I’m drawing a blank thinking of use cases for it. Optimizations? 
Security? What was the rationale for adding it?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
On Wed, 17 Apr 2019 at 15:08, Hick Gunter  wrote:

>
> SLEECT contents from textfiles where search_path = 'mypath' and
> name_pattern IN ('*.txt','*.csv');
>
>
It's interesting, I implemented the mask and decided to give such a query a
try (having the same cost adjust I explained in the first post). With
newest versions and even older 3.15.1, it works ok with two cursors one
after another (two xOpen calls) providing constraints with two masks.
Version 3.8.3 visits constraints with optional parameters, but chooses a
different route without filemask in constraint (so the tables outputs with
a default mask). 3.6.10 both doesn't visit constraints with optional
parameters and also gives results with default mask. Looks like similar to
what you mentioned in your first reply.

The usage of IN allows many expressive applications to such queries, but I
should probably have in mind that sometimes it is safe to implement a data
piece in a single entity so file mask may acquire something like comma-list
(or semocolon) "*.txt;*.csv"

Max
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Poor Yorick
On Wed, Apr 17, 2019 at 01:24:11PM +, David Raymond wrote:
> Would you post what those explain query plans results are? All the other 
> replies not withstanding I'm still curious as to why #2 would be faster 
> (assuming "rowid" is indeed the actual rowid anyway)
> 
> Also, is that a typo in #2, if you're not using b, why would you include it 
> in the from clause? Wouldn't that introduce a whole bunch of duplicates? As 
> in a copy of a.rowid for every single record in b? (Maybe my brain just 
> hasn't finished warming up this morning)
> 
> #1
> select a.rowid
> from a join b on a.rowid = b.rowid
> where a.ref = $x 
> 
> #2
> select a.rowid
> from a,b
> where a.ref = $x and a.rowid in (select rowid from b)
> 
> 

3 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=?)}
8 0 0 {SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?)}

2 0 0 {SEARCH TABLE a USING COVERING INDEX idx_ref (ref=? AND rowid=?)}
7 0 0 {USING ROWID SEARCH ON TABLE b FOR IN-OPERATOR}

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slow join, fast subselect

2019-04-17 Thread radovan5

Second select is not correct. You should have "from a" only not "from a,b".

R.Antloga

On 17.04.2019 10:55, Poor Yorick wrote:

I've used the following two test queries in a version of sqlite built against a
recent checkout of trunk, and also another recent version of sqlite.  a.ref is
indexed.  The subselect query is faster than the join query -- two orders of
magnitude faster on a larger dataset.  Is sqlite missing some easy optimisation
opportunity here?


select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x


select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread David Raymond
Would you post what those explain query plans results are? All the other 
replies not withstanding I'm still curious as to why #2 would be faster 
(assuming "rowid" is indeed the actual rowid anyway)

Also, is that a typo in #2, if you're not using b, why would you include it in 
the from clause? Wouldn't that introduce a whole bunch of duplicates? As in a 
copy of a.rowid for every single record in b? (Maybe my brain just hasn't 
finished warming up this morning)

#1
select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x 

#2
select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)



-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Poor Yorick
Sent: Wednesday, April 17, 2019 6:32 AM
To: SQLite mailing list
Subject: Re: [sqlite] [EXTERNAL] slow join, fast subselect


On Wed, Apr 17, 2019 at 10:15:31AM +, Hick Gunter wrote:
> Try EXPLAIN QUERY PLAN  or even EXPLAIN  to see what is going on 
> in each case.


I already have, of course.  The question is, how much effort would it be to get
sqlite choose the better query plan in the "join" case as well?

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
On Wed, 17 Apr 2019 at 15:08, Hick Gunter  wrote:

> ...
>
> SELECT contents from textfiles( 'mypath', NULL, 0);
> SELECT contents from textfiles where search_path = 'mypath' and
> is_recursive = 1;
> SLEECT contents from textfiles where search_path = 'mypath' and
> name_pattern IN ('*.txt','*.csv');
>
> With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN)
> respectively.
>
> And you want to know how to make SQLite always call xFilter with
> ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only
> ('mypath') and attempting to retrieve the undefined/empty name_pattern. Is
> this correct?
>
>
>
Yes, you're correct. The name_pattern is also a good illustrative extension
to the case. And indeed, IN operator is really a tricky case
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slow join, fast subselect

2019-04-17 Thread R Smith
Also, let me just add (in case it sounded different) - We definitely do 
not wish to document the "why an optimization opportunity might not be 
feasible" in any way that would discourage anyone from submitting such a 
possible optimization opportunity. That would work against the axiomatic 
premise of this forum, sqlite and open source communities in general.



On 2019/04/17 1:23 PM, Poor Yorick wrote:





___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slow join, fast subselect

2019-04-17 Thread R Smith

On 2019/04/17 1:23 PM, Poor Yorick wrote:


That's an apt and accessible description of the issue, but at the denotational
level the meanings of the queries are in fact identical under the conditions
you enumerated.  Ideally sqlite would notice and adjust its query plan
accordingly.


Ideally yes, but the heart of the matter, I would like to reiterate, is 
that the query in question has to be just so for the optimization to 
work, anything you change (using b.anything anywhere, having duplicate b 
values or in any way not using rowids or at least unique columns of some 
flavour, using a collation, etc.) would render the optimization 
opportunity void. The opportunity hangs by the tiniest of threads, or 
put another way: the decision tree + checks to arrive at this 
optimization opportunity is very long, and I doubt a similar query is 
encountered by the QP more than one in a few million times (with a 
strong possibility that that should read "in a few billion times") ever 
- and when it does come up, the programmer most probably would opt for 
the latter example since it more clearly describes the plot. (The fact 
that it is also faster is just lucky).


If on the other hand a situation presents itself where the obvious 
better plot description (i.e. the second query) was /slower/  - then the 
optimization (to rather do it like the faster first query) would be a 
more worthy cause.


The fact that we can engineer a query that happens to be equivalent 
mathematically and is slower does not sufficiently call for an effort to 
improve the planner. In fact, we can engineer many such queries, and 
these kinds of things come up quite regularly on the forum. To be sure, 
they often do get optimized IF the slower query is the one that is 
semantically more sensible and the effort to implement plus the added 
code-weight and cpu cycles are justified by the gain in the general case.




   If the cost of doing so doesn't justify the effort, that could be
documented.  As good as the sqlite documentation is, it currently lacks this
sort of higher-level guidance.


I agree, perhaps some general description might be useful, though it's 
hard to imagine it mentioning specific scenarios. Consider that the 
amount of semantically-different-but-functionally-equivalent-yet-slower 
queries that can be engineered must be legion (the forum produces new 
ones almost monthly). It's hard to fathom documenting all of them - plus 
some of them disappear with improvements over time.


Maybe you could volunteer a paragraph of documentation that would have 
adequately satisfied your question in this regard and in general - the 
devs often do amend documentation based on suggestions here.



Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Hick Gunter
Ok so lets assume your xConnect function declares something similar to

Create table textfiles( result_path text, result_name text, contents blob, 
search_path text hidden, name_pattern text hidden, is_recursive hidden);

So that you can

SELECT contents from textfiles( 'mypath', NULL, 0);
SELECT contents from textfiles where search_path = 'mypath' and is_recursive = 
1;
SLEECT contents from textfiles where search_path = 'mypath' and name_pattern IN 
('*.txt','*.csv');

With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) 
respectively.

And you want to know how to make SQLite always call xFilter with 
('mypath','*.txt') and ('mypath','*.csv') instead of just once with only 
('mypath') and attempting to retrieve the undefined/empty name_pattern. Is this 
correct?

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Max Vlasov
Gesendet: Mittwoch, 17. April 2019 13:03
An: SQLite mailing list 
Betreff: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual 
tables

On Wed, 17 Apr 2019 at 12:52, Hick Gunter  wrote:

> Your xBestIndex function should be returning a cost that is
> proportional to the "effort required to fulfill the query". My own VT
> implementations have been returning the total number of records for
> queries with no constraints and assuming a constant fan-out factor for
> each key field provided as a constraint (eg. 25 for a full table scan,
> 5 if only the first of two key fields is provided and 1 if both are provided).
>

I suspect that you talk more about tables that possess and outputs data 
regardless of "incoming" parameters involved. I understand that in this case it 
is mostly about performance-wise issues when the worst-case scenario is a time 
penalty but still valid results. I'm here more about table-valued tables when 
for some of them not providing incoming parameters means not valid data at all. 
One of my examples is a virtual table outputting text files from a folder. 
Folder is a required parameter, but there's also a binary flag regarding 
whether the scan processes subfolders.
If the flag is provided, the implementation wants it to be used in the 
constraint chosen. Otherwise the intention from the query will not be 
transferred and the scan will be folder-recursive despite the fact that the 
flag disabling it appeared in the query.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slow join, fast subselect

2019-04-17 Thread Poor Yorick
On Wed, Apr 17, 2019 at 11:36:18AM +0200, R Smith wrote:
> On 2019/04/17 10:55 AM, Poor Yorick wrote:
[SNIP]
> 
> In your above example you really wish to know all the a's which have an
> entry in b. The first query asks to join and list all b's found for every a
> (which works mathematically in this case by virtue of rowid uniqueness, but
> isn't the real question and forces a lot of "join" algorithm checking on the
> QP), the correct question is the second query: Show every a which can also
> be found in b. It releases the QP of a lot of responsibility and let's it
> follow a plan that is much faster.
> 
> 
> Hope that makes sense :)
> 
> Ryan
> 

That's an apt and accessible description of the issue, but at the denotational
level the meanings of the queries are in fact identical under the conditions
you enumerated.  Ideally sqlite would notice and adjust its query plan
accordingly.  If the cost of doing so doesn't justify the effort, that could be
documented.  As good as the sqlite documentation is, it currently lacks this
sort of higher-level guidance.

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
On Wed, 17 Apr 2019 at 12:52, Hick Gunter  wrote:

> Your xBestIndex function should be returning a cost that is proportional
> to the "effort required to fulfill the query". My own VT implementations
> have been returning the total number of records for queries with no
> constraints and assuming a constant fan-out factor for each key field
> provided as a constraint (eg. 25 for a full table scan, 5 if only the first
> of two key fields is provided and 1 if both are provided).
>

I suspect that you talk more about tables that possess and outputs data
regardless of "incoming" parameters involved. I understand that in this
case it is mostly about performance-wise issues when the worst-case
scenario is a time penalty but still valid results. I'm here more about
table-valued tables when for some of them not providing incoming parameters
means not valid data at all. One of my examples is a virtual table
outputting text files from a folder. Folder is a required parameter, but
there's also a binary flag regarding whether the scan processes subfolders.
If the flag is provided, the implementation wants it to be used in the
constraint chosen. Otherwise the intention from the query will not be
transferred and the scan will be folder-recursive despite the fact that the
flag disabling it appeared in the query.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Opening a DataBase file with a custom "user data" parameter

2019-04-17 Thread Richard Hipp
On 4/16/19, Esenthel  wrote:
>
> I never got any reply for this

You have to sign up for the mailing list in order to get email
replies.  If you do not sign up, you have to poll for responses on one
of the archives, such as
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/thrd5.html#114065

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Poor Yorick

On Wed, Apr 17, 2019 at 10:15:31AM +, Hick Gunter wrote:
> Try EXPLAIN QUERY PLAN  or even EXPLAIN  to see what is going on 
> in each case.


I already have, of course.  The question is, how much effort would it be to get
sqlite choose the better query plan in the "join" case as well?

-- 
Poor Yorick
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Opening a DataBase file with a custom "user data" parameter

2019-04-17 Thread Esenthel
Hi,

I never got any reply for this

On Thu, 21 Feb 2019 at 10:31, Esenthel  wrote:

> I'm trying to use "sqlite3_open_v2" with a custom VFS (file system), and
> pass a pointer "void *user_data" to the sqlite3_open_v2 function, so it
> gets passed down to the "sqlite3_vfs::xOpen"
> However there's no option for that, so how to do that?
>
> The reason is that for opening files in the custom callback, I need to use
> a pointer to some helper "cipher" class object, to allow my own
> encryption/decryption.
>
> However xOpen accepts only "const char *zName".
>
> I have one custom global VFS, and I want to use different ciphers / user
> data for each database I use.
>
> What would solve the problem:
> add new "void *user_data" parameter to functions "sqlite3_open_v2", and
> "sqlite3_vfs::xOpen".
>
> Thanks,
>
> Greg
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] slow join, fast subselect

2019-04-17 Thread Hick Gunter
Try EXPLAIN QUERY PLAN  or even EXPLAIN  to see what is going on in 
each case.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Poor Yorick
Gesendet: Mittwoch, 17. April 2019 10:56
An: sqlite-users@mailinglists.sqlite.org
Betreff: [EXTERNAL] [sqlite] slow join, fast subselect

I've used the following two test queries in a version of sqlite built against a 
recent checkout of trunk, and also another recent version of sqlite.  a.ref is 
indexed.  The subselect query is faster than the join query -- two orders of 
magnitude faster on a larger dataset.  Is sqlite missing some easy optimisation 
opportunity here?


select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x


select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)


--
Poor Yorick

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Hick Gunter
Your xBestIndex function should be returning a cost that is proportional to the 
"effort required to fulfill the query". My own VT implementations have been 
returning the total number of records for queries with no constraints and 
assuming a constant fan-out factor for each key field provided as a constraint 
(eg. 25 for a full table scan, 5 if only the first of two key fields is 
provided and 1 if both are provided).

Newer releases of SQLite handle IN expressions differently. Older releases 
simply created an indexed ephemeral table and an Affinity opcode. Newer 
releases call xBestIndex twice; once with each IN transformed into an equality 
constraint, and once with (all of) these constraints disabled. Depending on the 
cost, the non-indexed ephemeral table would be placed on either side of the 
JOIN operation. I found it was necessary to increase the cost by at least 2.5% 
to discourage SQLite from putting the ephemeral table on the LHS (outer loop) 
of the join.


-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Max Vlasov
Gesendet: Mittwoch, 17. April 2019 10:44
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] "Optional" incoming parameters for virtual tables

Hi,

I'm trying to implement "optional incoming" parameters for my virtual tables. 
It's when there are columns required, but also there are some fine-tuning 
columns that should be noticed when provided in the query and assumed some 
default if not. The system that I tried seems to work, but I would like an 
opinion whether it is future-proof.
- When the required parameters are not provided, xBestIndex returns a large 
cost for earlier sqlite version and additionallly SQLITE_CONSTRAINT for
3.26.0 onward.
- When the required parameters provided and all optional provided the cost is 1.
- For every optional parameter not provided, the cost is increased by 1. So if 
there are 3 optional parameters and no provided, the cost is 4, if one optional 
provided - the cost is 2.

Should this always work as expected or sqlite might not always choose the 
lowest cost index amongst the ones with small values? I tested it with a couple 
of vt implementations, but the  number of columns in both required/optional 
pool are low and I expect some trouble when the number will go up.

Thanks,

Max
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] sqlite3_db_config() documentation issue

2019-04-17 Thread Shawn Wagner
The sqlite3_db_config() function is a variable argument one, taking a
database handle, a config option, and then what the documentation says
is 'Subsequent
arguments vary depending on the configuration verb.'.

Aside from SQLITE_DBCONFIG_MAINDBNAME and SQLITE_DBCONFIG_LOOKASIDE, they
all require an int and an int* for those arguments. The descriptions
of SQLITE_DBCONFIG_RESET_DATABASE
and SQLITE_DBCONFIG_DEFENSIVE don't mention the need for the second
argument (As I just found out the hard way with the latter option).
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slow join, fast subselect

2019-04-17 Thread R Smith

On 2019/04/17 10:55 AM, Poor Yorick wrote:

I've used the following two test queries in a version of sqlite built against a
recent checkout of trunk, and also another recent version of sqlite.  a.ref is
indexed.  The subselect query is faster than the join query -- two orders of
magnitude faster on a larger dataset.  Is sqlite missing some easy optimisation
opportunity here?


select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x


select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)



These queries have vastly different meanings and as such must have 
vastly different execution plans. I won't go into that because it's 
obvious, so I will just mention the assumptions.


For the query engine to optimize this according to your suggestion (i.e. 
for it to really do the second thing when you've asked for the first 
thing) it has to assume the following:


- 1. You do not need any of the fields from b in any part of the query 
(which IS the case here and can be deduced, but is an extremely small 
likelihood for JOINs in general queries),
- 2. there are no special Collations on either of the columns (again, 
this can be deduced here), and
- 3. it must assume that every b.rowid appears ONLY once in table b (I 
know this can be assumed for rowid's, but is something that cannot be 
easily known for any other field, where it could be true with or without 
a unique constraint).


And, these are only the obvious outsider-viewpoint assumptions, I have 
no insight in what other checks might be needed internally from the QP. 
This means that this optimization carries a cost deficit in the general 
case. i.e. Wasting CPU cycles "looking" for this optimization 
opportunity in the general case, considering the high unlikelihood of 
finding it viable, will slow down many more queries than it will help.


That said, this is the very kind of thing where we as engineering 
programmers or database queryers could optimize by asking our questions 
right, mind you - I am not advocating thinking for the Query Planner, 
but do state your question optimally. If you really want to know how 
many rabbits have been eaten by foxes, do not ask for a list of all 
rabbit names alphabetically and exclude every one that was NOT eaten by 
a fox, simply ask for the count of rabbits where eaten by fox = true.


In your above example you really wish to know all the a's which have an 
entry in b. The first query asks to join and list all b's found for 
every a (which works mathematically in this case by virtue of rowid 
uniqueness, but isn't the real question and forces a lot of "join" 
algorithm checking on the QP), the correct question is the second query: 
Show every a which can also be found in b. It releases the QP of a lot 
of responsibility and let's it follow a plan that is much faster.



Hope that makes sense :)

Ryan


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] slow join, fast subselect

2019-04-17 Thread Poor Yorick
I've used the following two test queries in a version of sqlite built against a
recent checkout of trunk, and also another recent version of sqlite.  a.ref is
indexed.  The subselect query is faster than the join query -- two orders of
magnitude faster on a larger dataset.  Is sqlite missing some easy optimisation
opportunity here?


select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x 


select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)


-- 
Poor Yorick

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
Hi,

I'm trying to implement "optional incoming" parameters for my virtual
tables. It's when there are columns required, but also there are some
fine-tuning columns that should be noticed when provided in the query and
assumed some default if not. The system that I tried seems to work, but I
would like an opinion whether it is future-proof.
- When the required parameters are not provided, xBestIndex returns a large
cost for earlier sqlite version and additionallly SQLITE_CONSTRAINT for
3.26.0 onward.
- When the required parameters provided and all optional provided the cost
is 1.
- For every optional parameter not provided, the cost is increased by 1. So
if there are 3 optional parameters and no provided, the cost is 4, if one
optional provided - the cost is 2.

Should this always work as expected or sqlite might not always choose the
lowest cost index amongst the ones with small values? I tested it with a
couple of vt implementations, but the  number of columns in both
required/optional pool are low and I expect some trouble when the number
will go up.

Thanks,

Max
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users