Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-18 Thread Max Vlasov
On Thu, 18 Apr 2019 at 09:48, Hick Gunter  wrote:

> So the QP is attemopting to determine which cost product is less:
>
> My guess is it will probably choose the undesired, invalid plan. A linear
> cost penalty for desired but optional inputs is probably not going to cut
> it. In this case, the returned cost needs to be greater than 4 (more if
> there are more records in the IN set). I feel there needs to be a
> multiplicative penalty of at least 10.
>
>
>
As I mentioned in other post, the direct test with my current
implementation and IN operator gave a good result, but I understand this
can't be a rule. I see your points about cost adjustments so will give them
a try.

I also decided to look at the sources, estimatedCost as usable input only
mentioned once, it goes to rRun field being translated with
sqlite3LogEstFromDouble call. The LogEst value is nicely explained in the
sources (below is the compacted version of the comment in the sources)

Estimated quantities used for query planning are stored as 16-bit
> logarithms.  For quantity X, the value stored is 10*log2(X).  This  gives a
> possible range of values of approximately 1.0e986 to 1e-986. But the
> allowed values are "grainy".  Not every value is representable.  For
> example, quantities 16 and 17 are both represented by a LogEst of 40.
> However, since LogEst quantities are suppose to be estimates, not exact
> values, this imprecision is not a problem.
>  "LogEst" is short for "Logarithmic Estimate".
>  Examples:
>   1 -> 0  20 -> 43  1 -> 132
>   2 -> 10 25 -> 46  25000 -> 146
>   3 -> 16100 -> 66100 -> 199
>   4 -> 20   1000 -> 991048576 -> 200
>  10 -> 33   1024 -> 1004294967296 -> 320
>  The LogEst can be negative to indicate fractional values.
>  Examples:
> 0.5 -> -10   0.1 -> -330.0625 -> -40
>

It seems that as a general rule it's not good to have close estimatedCost
values since they will sometimes yield the same LogEst  values.
rRun evaluations in the sources are too complex to analyze

Max
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-18 Thread Hick Gunter
So the QP is attemopting to determine which cost product is less:

Outer loop IN table (2 records, assuming cost 2) * Inner loop VT (2) = 4

Outer loop VT (3) * inner loop IN lookup (1 record, assuming cost 1) = 3

My guess is it will probably choose the undesired, invalid plan. A linear cost 
penalty for desired but optional inputs is probably not going to cut it. In 
this case, the returned cost needs to be greater than 4 (more if there are more 
records in the IN set). I feel there needs to be a multiplicative penalty of at 
least 10.

Also, you could check the colUsed mask to check if the name_pattern field is 
referenced and react accordingly.

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Max Vlasov
Gesendet: Mittwoch, 17. April 2019 14:56
An: SQLite mailing list 
Betreff: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual 
tables

On Wed, 17 Apr 2019 at 15:08, Hick Gunter  wrote:

> ...
>
> SELECT contents from textfiles( 'mypath', NULL, 0); SELECT contents
> from textfiles where search_path = 'mypath' and is_recursive = 1;
> SLEECT contents from textfiles where search_path = 'mypath' and
> name_pattern IN ('*.txt','*.csv');
>
> With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN)
> respectively.
>
> And you want to know how to make SQLite always call xFilter with
> ('mypath','*.txt') and ('mypath','*.csv') instead of just once with
> only
> ('mypath') and attempting to retrieve the undefined/empty
> name_pattern. Is this correct?
>
>
>
Yes, you're correct. The name_pattern is also a good illustrative extension to 
the case. And indeed, IN operator is really a tricky case 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
On Wed, 17 Apr 2019 at 15:08, Hick Gunter  wrote:

>
> SLEECT contents from textfiles where search_path = 'mypath' and
> name_pattern IN ('*.txt','*.csv');
>
>
It's interesting, I implemented the mask and decided to give such a query a
try (having the same cost adjust I explained in the first post). With
newest versions and even older 3.15.1, it works ok with two cursors one
after another (two xOpen calls) providing constraints with two masks.
Version 3.8.3 visits constraints with optional parameters, but chooses a
different route without filemask in constraint (so the tables outputs with
a default mask). 3.6.10 both doesn't visit constraints with optional
parameters and also gives results with default mask. Looks like similar to
what you mentioned in your first reply.

The usage of IN allows many expressive applications to such queries, but I
should probably have in mind that sometimes it is safe to implement a data
piece in a single entity so file mask may acquire something like comma-list
(or semocolon) "*.txt;*.csv"

Max
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
On Wed, 17 Apr 2019 at 15:08, Hick Gunter  wrote:

> ...
>
> SELECT contents from textfiles( 'mypath', NULL, 0);
> SELECT contents from textfiles where search_path = 'mypath' and
> is_recursive = 1;
> SLEECT contents from textfiles where search_path = 'mypath' and
> name_pattern IN ('*.txt','*.csv');
>
> With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN)
> respectively.
>
> And you want to know how to make SQLite always call xFilter with
> ('mypath','*.txt') and ('mypath','*.csv') instead of just once with only
> ('mypath') and attempting to retrieve the undefined/empty name_pattern. Is
> this correct?
>
>
>
Yes, you're correct. The name_pattern is also a good illustrative extension
to the case. And indeed, IN operator is really a tricky case
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Hick Gunter
Ok so lets assume your xConnect function declares something similar to

Create table textfiles( result_path text, result_name text, contents blob, 
search_path text hidden, name_pattern text hidden, is_recursive hidden);

So that you can

SELECT contents from textfiles( 'mypath', NULL, 0);
SELECT contents from textfiles where search_path = 'mypath' and is_recursive = 
1;
SLEECT contents from textfiles where search_path = 'mypath' and name_pattern IN 
('*.txt','*.csv');

With xBestIndex returning costs 1, 2, and 2/3 (with IN and without IN) 
respectively.

And you want to know how to make SQLite always call xFilter with 
('mypath','*.txt') and ('mypath','*.csv') instead of just once with only 
('mypath') and attempting to retrieve the undefined/empty name_pattern. Is this 
correct?

-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Max Vlasov
Gesendet: Mittwoch, 17. April 2019 13:03
An: SQLite mailing list 
Betreff: Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual 
tables

On Wed, 17 Apr 2019 at 12:52, Hick Gunter  wrote:

> Your xBestIndex function should be returning a cost that is
> proportional to the "effort required to fulfill the query". My own VT
> implementations have been returning the total number of records for
> queries with no constraints and assuming a constant fan-out factor for
> each key field provided as a constraint (eg. 25 for a full table scan,
> 5 if only the first of two key fields is provided and 1 if both are provided).
>

I suspect that you talk more about tables that possess and outputs data 
regardless of "incoming" parameters involved. I understand that in this case it 
is mostly about performance-wise issues when the worst-case scenario is a time 
penalty but still valid results. I'm here more about table-valued tables when 
for some of them not providing incoming parameters means not valid data at all. 
One of my examples is a virtual table outputting text files from a folder. 
Folder is a required parameter, but there's also a binary flag regarding 
whether the scan processes subfolders.
If the flag is provided, the implementation wants it to be used in the 
constraint chosen. Otherwise the intention from the query will not be 
transferred and the scan will be folder-recursive despite the fact that the 
flag disabling it appeared in the query.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Max Vlasov
On Wed, 17 Apr 2019 at 12:52, Hick Gunter  wrote:

> Your xBestIndex function should be returning a cost that is proportional
> to the "effort required to fulfill the query". My own VT implementations
> have been returning the total number of records for queries with no
> constraints and assuming a constant fan-out factor for each key field
> provided as a constraint (eg. 25 for a full table scan, 5 if only the first
> of two key fields is provided and 1 if both are provided).
>

I suspect that you talk more about tables that possess and outputs data
regardless of "incoming" parameters involved. I understand that in this
case it is mostly about performance-wise issues when the worst-case
scenario is a time penalty but still valid results. I'm here more about
table-valued tables when for some of them not providing incoming parameters
means not valid data at all. One of my examples is a virtual table
outputting text files from a folder. Folder is a required parameter, but
there's also a binary flag regarding whether the scan processes subfolders.
If the flag is provided, the implementation wants it to be used in the
constraint chosen. Otherwise the intention from the query will not be
transferred and the scan will be folder-recursive despite the fact that the
flag disabling it appeared in the query.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [EXTERNAL] "Optional" incoming parameters for virtual tables

2019-04-17 Thread Hick Gunter
Your xBestIndex function should be returning a cost that is proportional to the 
"effort required to fulfill the query". My own VT implementations have been 
returning the total number of records for queries with no constraints and 
assuming a constant fan-out factor for each key field provided as a constraint 
(eg. 25 for a full table scan, 5 if only the first of two key fields is 
provided and 1 if both are provided).

Newer releases of SQLite handle IN expressions differently. Older releases 
simply created an indexed ephemeral table and an Affinity opcode. Newer 
releases call xBestIndex twice; once with each IN transformed into an equality 
constraint, and once with (all of) these constraints disabled. Depending on the 
cost, the non-indexed ephemeral table would be placed on either side of the 
JOIN operation. I found it was necessary to increase the cost by at least 2.5% 
to discourage SQLite from putting the ephemeral table on the LHS (outer loop) 
of the join.


-Ursprüngliche Nachricht-
Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
Auftrag von Max Vlasov
Gesendet: Mittwoch, 17. April 2019 10:44
An: SQLite mailing list 
Betreff: [EXTERNAL] [sqlite] "Optional" incoming parameters for virtual tables

Hi,

I'm trying to implement "optional incoming" parameters for my virtual tables. 
It's when there are columns required, but also there are some fine-tuning 
columns that should be noticed when provided in the query and assumed some 
default if not. The system that I tried seems to work, but I would like an 
opinion whether it is future-proof.
- When the required parameters are not provided, xBestIndex returns a large 
cost for earlier sqlite version and additionallly SQLITE_CONSTRAINT for
3.26.0 onward.
- When the required parameters provided and all optional provided the cost is 1.
- For every optional parameter not provided, the cost is increased by 1. So if 
there are 3 optional parameters and no provided, the cost is 4, if one optional 
provided - the cost is 2.

Should this always work as expected or sqlite might not always choose the 
lowest cost index amongst the ones with small values? I tested it with a couple 
of vt implementations, but the  number of columns in both required/optional 
pool are low and I expect some trouble when the number will go up.

Thanks,

Max
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users