Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-09 Thread Petite Abeille

On Dec 9, 2013, at 8:01 PM, Warren Young  wrote:

>> I remember reading an essay by a user of controlled substances
> 
> Your next reading assignment is a book[3] on a functional programming 
> language,

So… do you make your functions wear a purity ring? To keep them, hmmm, chaste?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-09 Thread Warren Young

On 12/7/2013 12:53, James K. Lowden wrote:

On Thu, 05 Dec 2013 17:52:47 -0700
Warren Young  wrote:


To prove my point, I decided to divide the SQLite commands[1] into
those that modify the DB and those that do not:


Oh, let me help you out here: these aren't functions.


I was careful to call them commands, and to treat what SQL calls 
"functions" separately.


Nevertheless, I think you're trying to draw a dictionary-based line here 
instead of looking at fundamental concepts.  A proper mathematical 
function takes N arguments and returns a single constant result for 
those arguments.  If you have a static SQLite DB file, any SELECT 
statement against it involving only tables and the pure SQL functions 
qualifies as a pure function itself.


The point of this exercise was to dig down to this conceptual level, 
bypassing the fuzzy terminology.


SQL is a mixed bag of true functions and non-functional [1] elements. 
This thread is about one confusion that can result when these two 
aspects of SQL intermix[2] in unexpected ways.



I remember reading an essay by a user of controlled substances


Your next reading assignment is a book[3] on a functional programming 
language, preferably one with immutable-by-default values.  Haskell is 
the current hotness, but Erlang would work just as well.


There are less pure FP languages that can teach the same lessons, if you 
diligently avoid the impure bits: the ML family[4], Scala, Scheme...


The rest of your post I answered indirectly in my reply to your other 
message in this thread.





Footnotes:

[1] In the mathematical sense.  I.e. not meaning "broken".

[2] e.g. "SELECT ... date('now')"

[3] Free online FP books:

http://learnyouahaskell.com/chapters
http://learnyousomeerlang.com/content
http://ocaml.org/learn/books.html
https://en.wikibooks.org/wiki/F_Sharp_Programming
http://www.scala-lang.org/documentation/books.html
http://www.scheme.com/tspl4/
https://mitpress.mit.edu/sicp/

[4] OCaml and F# are the most-used flavors of ML in practice currently. 
 Academia still has a lot of Standard ML holdouts.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-09 Thread Warren Young

On 12/7/2013 12:15, James K. Lowden wrote:

On Wed, 04 Dec 2013 12:04:07 -0700
Warren Young  wrote:


Determinism is a property of a function; there is no such
thing as a function that is sometimes deterministic and sometimes
not.


databases are about as far from side-effect-free as you can get.


I'm not sure what you're referring to.


I think your sense of the term "side effect" comes from the everyday 
use, which is most influenced by medical side effects.  i.e., something 
bad and unintended.


The term means something rather different in CS:

https://en.wikipedia.org/wiki/Side_effect_%28computer_science%29

Specifically here, I mean that most SQL statements other than SELECT 
modify global state: the SQLite DB file.  Any statement that modifies 
the DB file has the potential to change the result from *any* SQL 
statement, including SELECT.


Example:

SELECT * FROM foo WHERE id=42;
UPDATE foo SET bar='qux' where id=42;
SELECT * FROM foo WHERE id=42;

The first and third statements return different results, even though 
they are side effect free, because UPDATE is not side effect free.


Consider also that the UPDATE could come from another process, at an 
indeterminate time.  This is why concerns over side effects -- in the CS 
sense -- matter.


SQLite offers many ways to *control* this indeterminacy, features 
generally grouped under the acronym ACID, but you can't say "DBMS X is 
ACID compliant therefore it will never surprise me with unexpected results."

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-07 Thread James K. Lowden
On Thu, 05 Dec 2013 17:52:47 -0700
Warren Young  wrote:

> To prove my point, I decided to divide the SQLite commands[1] into
> those that modify the DB and those that do not:
> 
>  Has side effects Limited side effects   No side effects
>  ~~      ~~~
>  ALTER TABLE  ATTACH DATABASEANALYZE
>  CREATE TRIGGER   CREATE TABLE   CREATE INDEX
>  DELETE   CREATE VIEWDROP INDEX
>  DETACH DATABASE[2]   CREATE VIRTUAL TABLE   EXPLAIN
>  DROP TABLE   DROP TRIGGER[3]PRAGMA
>  DROP VIEW   REINDEX
>  INSERT  SELECT
>  REPLACE VACUUM
>  UPDATE

Oh, let me help you out here: these aren't functions.  I suppose you
could think of them as functions, in the sense that strcpy(3) and 
unlink(2) are functions.  But they're more commonly called verbs or
commands or operations, components of a query or statement.  

I remember reading an essay by a user of controlled substances in which
he observed that when we speak of drugs having effects and side-effects
the distinction is wholly subjective: the side-effect is an *effect*,
just not the desired one.  

ISTM that e.g. INSERT doesn't have a "side effect", but rather just an
effect, to the extent it does what it says on the tin.  If it also
prints a message saying your father smells of elderberries, that would
be a side effect, I would think.  

> Commands in the first column clearly have side effects: they can
> affect the results of another process's SELECT queries.

That's not supposed to be the case.  An INSERT, say, might change the
results of the same SELECT query executed before and after it, but it's
not supposed to affect a SELECT while it's in progress.  

--jkl

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-07 Thread James K. Lowden
On Wed, 04 Dec 2013 12:04:07 -0700
Warren Young  wrote:

> > Determinism is a property of a function; there is no such
> > thing as a function that is sometimes deterministic and sometimes
> > not.
> 
> databases are about as far from side-effect-free as you can get.

I'm not sure what you're referring to.  The functions I'm thinking of
are things like COUNT and DATE.  One is deterministic, the other not.
Their determinism is not affected by how they are used.  

Unless you have an example of a function whose determinism is, um,
syntatically determined, why introduce syntax around it?  

Regarding side-effects, every standard SQL function is side-effect
free.  True, in SQLite a user-defined function could send flowers on
Valentine's day.  But at that point it's surely wandered off the
reservation.  

> Anyway, all this arguing over how SQLite *should* behave seems
> misguided to me.  What matters is how SQL is specified.  SQLite
> should follow the spec in areas like this.

Yes.  Better to color inside the lines.  

--jkl


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-05 Thread Warren Young

On 12/5/2013 14:45, Klaas V wrote:

Warren wrote 4 dec 2013:

| There are tiny corners of the programming world (FP) where this is not the 
case, but then you get  into
|   questions of purity, and databases are about as far from side-effect-free 
as you can get.

That's a wee bit exaggerated,


To prove my point, I decided to divide the SQLite commands[1] into those 
that modify the DB and those that do not:


Has side effects Limited side effects   No side effects
~~      ~~~
ALTER TABLE  ATTACH DATABASEANALYZE
CREATE TRIGGER   CREATE TABLE   CREATE INDEX
DELETE   CREATE VIEWDROP INDEX
DETACH DATABASE[2]   CREATE VIRTUAL TABLE   EXPLAIN
DROP TABLE   DROP TRIGGER[3]PRAGMA
DROP VIEW   REINDEX
INSERT  SELECT
REPLACE VACUUM
UPDATE

Commands in the first column clearly have side effects: they can affect 
the results of another process's SELECT queries.


The second column is for commands that are unlikely to affect another 
process's queries, but they do alter the user-visible DB schema, so it's 
hard to argue that they're side-effect-free.


The third column looks longer than it really is.  You can look at it as 
SELECT plus a bunch of "DBA commands."  Those latter commands merely 
affect how well the SQLite DB engine runs, and they're unlikely to be 
used very often once the DB is set up and working well.


If you strip away the setup, DBA, and rarely-used commands from the 
other columns, too, you're still left with SELECT on one side vs about 
half a dozen SQL commands with side effects on the other.  That's what I 
was getting at with my quoted comment above.


We should also consider SQLite's "functions".

Most of the core functions[4] are pure, but there are several impure 
ones: changes(), last_insert_rowid(), random(), randomblob(), and 
total_changes().


The date and time functions[5] are pure, unless you pass 'now', which is 
what all the argument here is about, of course.


The aggregate functions[6] are all pure.

I think my point stands: SQL is awfully impure, from an FP/mathematical 
standpoint.






-

Footnotes:

[1] From https://www.sqlite.org/lang.html but leaving out the keywords 
that aren't independent commands.  I also left out the TRANSACTION and 
SAVEPOINT commands, since they just group other commands.


[2] I put the DETACH and DROP commands in the first column even though 
their corresponding ATTACH and CREATE commands are in the second because 
there is nothing stopping these destructive commands from affecting 
another process's queries.


[3] DROP TRIGGER is interesting: From the perspective of figuring out 
whether it could affect another process through side effects, this 
command actually /stops/ future side effects from occurring, even though 
it modifies the DB file to do so.  Thus, it goes into the second column, 
not the first, where all the other DROP commands are.


[4] https://www.sqlite.org/lang_corefunc.html

[5] https://www.sqlite.org/lang_datefunc.html

[6] https://www.sqlite.org/lang_aggfunc.html
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-05 Thread Klaas V
Warren wrote 4 dec 2013:

| There are tiny corners of the programming world (FP) where this is not the 
case, but then you get  into 
|   questions of purity, and databases are about as far from side-effect-free 
as you can get.

That's a wee bit exaggerated, but who am I to start a discussion; it's a valid, 
even respectable opinion

and
| Anyway, all this arguing over how SQLite *should* behave seems misguided to 
me.  What 
|   matters is how SQL is specified.  SQLite should follow the spec in areas 
like this.

Not even this me seems. The developers of SQLite can choose not to "obey" 
rules, advices and guidelines of SQL standard at will if one of the three guys, 
one of us users feels it's appropriate and fits their philosophy better or for 
whatever reason even rewrite (some of) the standard and create a new or adapted 
version = skipping, changibg adding things =  let's call it SQL13 (14 etc) or a 
less prozaic name. What about iSQL or SQLNT?  ;-) 

Klaas `Z4us` V

The fun with standards is there are so many to choose from
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-04 Thread Warren Young

On 12/3/2013 17:29, James K. Lowden wrote:


Determinism is a property of a function; there is no such
thing as a function that is sometimes deterministic and sometimes not.


Unless you're new to this computing thing, you must have noticed that 
software developers almost never mean the same thing as mathematicians 
when we use the word "function".


There are tiny corners of the programming world (FP) where this is not 
the case, but then you get into questions of purity, and databases are 
about as far from side-effect-free as you can get.


Anyway, all this arguing over how SQLite *should* behave seems misguided 
to me.  What matters is how SQL is specified.  SQLite should follow the 
spec in areas like this.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-12-03 Thread James K. Lowden
On Tue, 3 Dec 2013 11:29:03 -0800
Scott Hess  wrote:

> Probably we should add a new API that allows the application
> > to state auxiliary properties about application-defined functions
> > (such as whether or not it is "constant", whether or not it can
> > return NULL, whether or not it might change the encoding of its
> > input parameters, etc.)  But that has not been done yet.
> 
> 
> IMHO, the default should be that a given function with given
> parameters should be assumed to return a single value which is good
> across the entire statement [...].
> 
> That said, I'm a little nervous about having that be an attribute of
> the function rather than the statement.  

Why nervous?  Determinism is a property of a function; there is no such
thing as a function that is sometimes deterministic and sometimes not.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-27 Thread Luuk

On 27-11-2013 03:55, James K. Lowden wrote:

On Mon, 25 Nov 2013 19:44:15 +0100
Luuk  wrote:


On 25-11-2013 13:41, Simon Slavin wrote:

I'm wondering whether there's an argument that it should be
evaluated just once for a transaction.


I'm still thinking about this question,

i can think of some benefits if its evaluated just once per
transaction.

but i hope anyone has some examples why its better to evalutate it
once per statement.


The easy answer is: that's how SQL is defined.

A more complicated answer is that, inside a user-defined transaction,
you have control over the meaning of "now".  You can insert "now" into
a table and re-use it as often as you like, without fear of it being
changed by another process.  If, on the other hand, you'd like to
execute several statements in a transaction, you might also like to
know when "now" is for each one.  The per-statement definition of "now"
supports that use.

There's actually nothing special about "now".  Other functions, e.g.
changes(), may vary between statements in a transaction.  (It happens
SQLite doesn't have many such.)  You really just want to preserve the
function's defintion as "returns correct output each time it's
called".

HTH.



ok, thanks for the answer.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-26 Thread James K. Lowden
On Mon, 25 Nov 2013 19:44:15 +0100
Luuk  wrote:

> On 25-11-2013 13:41, Simon Slavin wrote:
> > I'm wondering whether there's an argument that it should be
> > evaluated just once for a transaction.
> 
> I'm still thinking about this question,
> 
> i can think of some benefits if its evaluated just once per
> transaction.
> 
> but i hope anyone has some examples why its better to evalutate it
> once per statement.

The easy answer is: that's how SQL is defined.  

A more complicated answer is that, inside a user-defined transaction,
you have control over the meaning of "now".  You can insert "now" into
a table and re-use it as often as you like, without fear of it being
changed by another process.  If, on the other hand, you'd like to
execute several statements in a transaction, you might also like to
know when "now" is for each one.  The per-statement definition of "now"
supports that use.  

There's actually nothing special about "now".  Other functions, e.g.
changes(), may vary between statements in a transaction.  (It happens
SQLite doesn't have many such.)  You really just want to preserve the
function's defintion as "returns correct output each time it's
called".  

HTH.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-25 Thread Luuk

On 25-11-2013 13:41, Simon Slavin wrote:

I'm wondering whether there's an argument that it should be evaluated just once 
for a transaction.


I'm still thinking about this question,

i can think of some benefits if its evaluated just once per transaction.

but i hope anyone has some examples why its better to evalutate it once 
per statement.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-25 Thread Nico Williams
If a function is deterministic for then it can be executed once and
its result can be memoized for the rest of the statement (or
transaction even).

If a function is idempotent for a statement then it can be executed
once per-statement, and its result(s) can be memoized and reused for
the life of that statement.

If a function is neither idempotent (even if it is deterministic in
the sense of not being random, for example, a monotonically increasing
counter) nor deterministic (e.g., a true random()) then it should be
called exactly as many times as the statement seems to imply, no more
and no fewer (e.g., once per-row of a correlated sub-query, ...).

Can idempotence and determinism be considered synonymous for the
purposes of an RDBMS engine?  I think not quite: deterministic should
imply that a memoization cache can be used always, across many
statements and even transactions, whereas idempotence might imply only
that memoization is permitted (but not required) on a per-statement
basis.

Another desirable attribute might be whether the function is fast or
slow: for fast enough deterministic functions there will be no point
in having a large memoization cache, or even any memoization cache.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-25 Thread Simon Slavin

On 25 Nov 2013, at 11:58am, Luuk  wrote:

> On 24-11-2013 19:36, Petite Abeille wrote:
>> 
>> On Nov 24, 2013, at 7:10 PM, Valentin Davydov  wrote:
>> 
>>> Wait a second... and you'll get different value of datetime('now'). In this
>>> sense datetime() is as deterministic as random(): it may give the same 
>>> result
>>> next invocation or may not, dependng on various circumstances not related to
>>> the function itself.
>> 
>> Nah… in SQL… ‘now’ is supposed to be computed once per statement. And that’s 
>> all.
> 
> Then 3.7.11, on windows is wrong, luckily this works OK on 3.8.1

Yes.  This characteristic ('now' is evaluated once per statement) was discussed 
and corrected recently.  I too was surprise that SQL was specific about it 
being evaluated just once for a statement.  Though now I know that that, I'm 
wondering whether there's an argument that it should be evaluated just once for 
a transaction.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-25 Thread Luuk

On 24-11-2013 19:36, Petite Abeille wrote:


On Nov 24, 2013, at 7:10 PM, Valentin Davydov  wrote:


Wait a second... and you'll get different value of datetime('now'). In this
sense datetime() is as deterministic as random(): it may give the same result
next invocation or may not, dependng on various circumstances not related to
the function itself.


Nah… in SQL… ‘now’ is supposed to be computed once per statement. And that’s 
all.




Then 3.7.11, on windows is wrong, luckily this works OK on 3.8.1

C:\temp>\util\sqlite3 test.sqlite3
SQLite version 3.7.11 2012-03-20 11:35:50
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> update test set d=datetime('now');
sqlite> select d, count(*) from test group by d;
2013-11-25 11:53:40|94941
2013-11-25 11:53:41|164850
2013-11-25 11:53:42|152478
2013-11-25 11:53:43|156193
2013-11-25 11:53:44|160673
2013-11-25 11:53:45|172547
2013-11-25 11:53:46|163959
2013-11-25 11:53:47|166997
2013-11-25 11:53:48|170299
2013-11-25 11:53:49|170700
2013-11-25 11:53:50|173197
2013-11-25 11:53:51|171121
2013-11-25 11:53:52|113143
sqlite> .quit

C:\temp>sqlite3 test.sqlite3
SQLite version 3.8.1 2013-10-17 12:57:35
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> update test set d=datetime('now');
sqlite> select d, count(*) from test group by d;
2013-11-25 11:56:22|2031098
sqlite>


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread James K. Lowden
On Sun, 24 Nov 2013 16:53:01 +0200
RSmith  wrote:

> Similarly if one was to add a function which returns a date-dependant
> value, such as 'dayOfWeek(x)' and mark it as deterministic for inside
> a single query, that would make sense, even though it would be very
> indeterministic (or referentially opaque, if you will) between
> queries.  This is all dandy unless you have queries (as seen on this
> forum) that run for three days or more... would it matter then?

A SQL statement is atomic irrespective of how long it runs.  Your 
wday()  function would return to the day of the week as of the time the
statement was evaluated, even if it took a month of Sundays to
execute.  ;-)

Another way to think about it: a deterministic function always returns
the same output for a given input.  In an SQL statement, a function
is provided input only once, regardless of how long it runs.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread James K. Lowden
On Sun, 24 Nov 2013 09:17:20 -0500
Doug Currie  wrote:

> in computer science we have referential transparency
> 
> http://en.wikipedia.org/wiki/Referential_transparency_
> (computer_science)
> 
> and pure functions
> 
> http://en.wikipedia.org/wiki/Pure_function

https://en.wikipedia.org/wiki/Deterministic_algorithm

In computer science we also have deterministic and nondeterministic.
Those are also IIRC the terms used in the SQL standard.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Petite Abeille

On Nov 24, 2013, at 7:10 PM, Valentin Davydov  wrote:

> Wait a second... and you'll get different value of datetime('now'). In this 
> sense datetime() is as deterministic as random(): it may give the same result
> next invocation or may not, dependng on various circumstances not related to
> the function itself.

Nah… in SQL… ‘now’ is supposed to be computed once per statement. And that’s 
all.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Valentin Davydov
On Sat, Nov 23, 2013 at 06:18:29AM -0500, Richard Hipp wrote:
> On Sat, Nov 23, 2013 at 5:26 AM, Pepijn Van Eeckhoudt <
> pep...@vaneeckhoudt.net> wrote:
> 
> > Is datetime special in thuis context or will constant expression hoisting
> > like this happen for any function?
> 
> SQLite must know that the function always gives the same output given the
> same inputs.  No every function works that way.  Counterexamples include
> random() and last_insert_rowid().  But most built-in functions are
> factorable in the same way that datetime() is.

Wait a second... and you'll get different value of datetime('now'). In this 
sense datetime() is as deterministic as random(): it may give the same result
next invocation or may not, dependng on various circumstances not related to
the function itself.

> Currently there is no API to designate an application-defined function as
> being "constant" in the sense that it always generates the same output
> given the same inputs.  Hence, SQLite assumes the worst about
> application-defined functions and never tries to factor them out of the
> inner loop.  Probably we should add a new API that allows the application
> to state auxiliary properties about application-defined functions (such as
> whether or not it is "constant", whether or not it can return NULL, whether
> or not it might change the encoding of its input parameters, etc.)  But
> that has not been done yet.

To my opinion, the most general solution is to let to the application 
programmer to decide whether to calcucale the function once (say, at 
the beginning of a transaction), store the result and then access 
the stored value, or to make the new call to the function each 
iteration, depending on the application semantics.

Valentin Davydov.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Constantine Yannakopoulos
On Sun, Nov 24, 2013 at 4:30 PM, Petite Abeille wrote:

>
> On Nov 24, 2013, at 3:17 PM, Doug Currie  wrote:
>
> > There is value in compatibility, but those adjectives are awful.
>
> FWIW, DETERMINISTIC is what Oracle uses:
>
>
> http://docs.oracle.com/cd/E11882_01/appdev.112/e25519/function.htm#LNPLS1183
>

There's also this:
http://docs.oracle.com/cd/B13789_01/appdev.101/b10807/13_elems039.htm that
specifies various levels of "purity", i.e. if the package function reads
from or writes to the database or global state. Each level allows for more
aggressive optimization when the function takes part in a query.

It's been a while since I've coded in PL/SQL but I think that if the
RESTRICT_REFERENCES pragma is not specified for a package function the SQL
engine will not accept calls to it from an SQL statement. Also, if the
implementation of a function violates its RESTRICT_REFERENCES pragma the
PL/SQL compiler will not compile it.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread RSmith

Ugh, my last thought was not well-formed - apologies.

When I said:
"...can add a function to replace an SQL function to improve it many times for the 
specific purpose".

This would of course hardly matter if the SQL (or SQLite specifically) function was already deterministic (read: cached). My 
thinking was more towards enforcing determinism on something that isn't usually, if it does not affect your query's outcome (i.e 
specific purpose).


Hope that is more clear - thanks.
Ryan


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread RSmith
Agreed - also some functions might not be intrinsically deterministic, but it may well be so for the duration of a query. There may 
need to be some thinking on this.


I refer back to a discussion earlier (and subsequent SQLite adaption) which made a date-time reference deterministic within a single 
query for transactional integrity.  Similarly if one was to add a function which returns a date-dependant value, such as 
'dayOfWeek(x)' and mark it as deterministic for inside a single query, that would make sense, even though it would be very 
indeterministic (or referentially opaque, if you will) between queries.  This is all dandy unless you have queries (as seen on this 
forum) that run for three days or more... would it matter then?


I for one would very much like the ability to specify added functions as deterministic or not (or whatever term would indicate 
'cacheability of the result') as this might be an enormous efficiency improvement in itself, and also allow you to now add things to 
improve speed in some standard queries.  Think of the people who have queries running for ages and can add a function to replace an 
SQL function to improve it many times for the specific purpose.*


Actually, thinking about it, this was always achievable with some clever programming and managing the result cache yourself - but I 
still think this addition would be an improvement.


That's my 2 cents.
Have a great day all!
Ryan


*This is not to say the SQL way is not good, but it often has to cater for a wide variety of things where a user-added function 
might be very tuned to a specific purpose - ridding a lot of CPU-time fat.



On 2013/11/24 16:30, Petite Abeille wrote:

On Nov 24, 2013, at 3:17 PM, Doug Currie  wrote:


There is value in compatibility, but those adjectives are awful.

FWIW, DETERMINISTIC is what Oracle uses:

http://docs.oracle.com/cd/E11882_01/appdev.112/e25519/function.htm#LNPLS1183

I would personally stick to that if such functionality was ever introduced in 
SQLite.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Petite Abeille

On Nov 24, 2013, at 3:17 PM, Doug Currie  wrote:

> There is value in compatibility, but those adjectives are awful. 

FWIW, DETERMINISTIC is what Oracle uses:

http://docs.oracle.com/cd/E11882_01/appdev.112/e25519/function.htm#LNPLS1183

I would personally stick to that if such functionality was ever introduced in 
SQLite.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Doug Currie

On Nov 24, 2013, at 6:47 AM, Alek Paunov  wrote:
> 
> BTW, I see the term "deterministic" in the SQL99 BNFs:
> …
> but different in PostgreSQL ("immutable", "stable", etc):


There is value in compatibility, but those adjectives are awful. In computer 
science we have referential transparency

http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)

and pure functions

http://en.wikipedia.org/wiki/Pure_function

e


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Alek Paunov

On 23.11.2013 13:18, Richard Hipp wrote:

SQLite must know that the function always gives the same output given the
same inputs.  No every function works that way.  Counterexamples include
random() and last_insert_rowid().  But most built-in functions are
factorable in the same way that datetime() is.


BTW, I see the term "deterministic" in the SQL99 BNFs:

http://savage.net.au/SQL/sql-99.bnf.html#deterministic%20characteristic
http://savage.net.au/SQL/sql-2003-2.bnf.html#deterministic%20characteristic

aslo found in MySQL:

http://dev.mysql.com/doc/refman/5.6/en/create-procedure.html

but different in PostgreSQL ("immutable", "stable", etc):

http://www.postgresql.org/docs/9.3/static/sql-createfunction.html

I think "deterministic" is used also in the Prolog, whit the same meaning.

Kind Regards,
Alek

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-23 Thread Richard Hipp
On Sat, Nov 23, 2013 at 5:26 AM, Pepijn Van Eeckhoudt <
pep...@vaneeckhoudt.net> wrote:

> Is datetime special in thuis context or will constant expression hoisting
> like this happen for any function?
>

SQLite must know that the function always gives the same output given the
same inputs.  No every function works that way.  Counterexamples include
random() and last_insert_rowid().  But most built-in functions are
factorable in the same way that datetime() is.

Currently there is no API to designate an application-defined function as
being "constant" in the sense that it always generates the same output
given the same inputs.  Hence, SQLite assumes the worst about
application-defined functions and never tries to factor them out of the
inner loop.  Probably we should add a new API that allows the application
to state auxiliary properties about application-defined functions (such as
whether or not it is "constant", whether or not it can return NULL, whether
or not it might change the encoding of its input parameters, etc.)  But
that has not been done yet.


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-23 Thread Pepijn Van Eeckhoudt
Is datetime special in thuis context or will constant expression hoisting like 
this happen for any function?

Pepijn

> Op 22-nov.-2013 om 15:35 heeft Richard Hipp  het volgende 
> geschreven:
> 
> The www.sqlite.org server logs are stored in an SQLite database (of
> course).  We have a script that is run daily on that database with a dozen
> or so queries that look something like this:
> 
> .print
> .print  Downloads in the past 24 hours 
> .mode column
> .width -6 -6 100
> .header on
> SELECT count(*) AS count, count(distinct ip) AS IPs, url
>  FROM log
> WHERE date>datetime('now','-24 hours')
>AND (url GLOB '*/2013/*'
> OR url GLOB '*/snapshot/*'
> OR url GLOB '*/xfer'
> OR url GLOB '*/blob/*')
> GROUP BY url
> ORDER BY count DESC;
> 
> Today's database contains 6 days worth of data, is 590MB in size, and is
> just short of 2 million records.  There are no indices, so each query is a
> full table scan.  Using SQLite 3.8.1, the query above took 1.793 seconds on
> a recent test run.  But the code on trunk (and the latest snapshots at
> http://www.sqlite.org/download.html) took only 0.686 seconds.  The
> difference is that the string constants and the datetime() function call
> are factored out of the inner loop in 3.8.2.
> 
> To be fair:  Our production script does not contain exactly the SQL shown
> above. Rather than using the bare datetime() call, the real script says:
> "(SELECT datetime('now','-24 hours'))".  Putting the datetime() call inside
> a subquery is a trick that causes the datetime function to only be invoked
> once even without the new constant-function factoring optimization of
> 3.8.2.  And with that trick, the performance difference is not nearly so
> dramatic (though 3.8.2 is still faster by about 10%).  Also, the
> performance difference here would only apply to a full table scan that
> spends a lot of time looping inside of the virtual machine.  There would
> not be nearly as dramatic a difference if the database held an index on the
> "log.date" column.
> 
> Nevertheless, we suspect that queries like the above are common in the
> wild, and so we hope that the upcoming 3.8.2 release will make a big
> difference for some people.
> 
> If you try the 3.8.2 pre-release snapshot in your application and see a
> performance improvement, we'd appreciate hearing from you.
> 
> For testing and comparison purposes, the optimization that doubles the
> performance of the query above can be disabled using the following API:
> 
>sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 8);
> 
> where "db" is the database connection create by sqlite3_open(), or if you
> are using the command-line shell:
> 
>.testctrl optimizations 8
> 
> -- 
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-22 Thread Richard Hipp
The www.sqlite.org server logs are stored in an SQLite database (of
course).  We have a script that is run daily on that database with a dozen
or so queries that look something like this:

.print
.print  Downloads in the past 24 hours 
.mode column
.width -6 -6 100
.header on
SELECT count(*) AS count, count(distinct ip) AS IPs, url
  FROM log
 WHERE date>datetime('now','-24 hours')
AND (url GLOB '*/2013/*'
 OR url GLOB '*/snapshot/*'
 OR url GLOB '*/xfer'
 OR url GLOB '*/blob/*')
 GROUP BY url
 ORDER BY count DESC;

Today's database contains 6 days worth of data, is 590MB in size, and is
just short of 2 million records.  There are no indices, so each query is a
full table scan.  Using SQLite 3.8.1, the query above took 1.793 seconds on
a recent test run.  But the code on trunk (and the latest snapshots at
http://www.sqlite.org/download.html) took only 0.686 seconds.  The
difference is that the string constants and the datetime() function call
are factored out of the inner loop in 3.8.2.

To be fair:  Our production script does not contain exactly the SQL shown
above. Rather than using the bare datetime() call, the real script says:
"(SELECT datetime('now','-24 hours'))".  Putting the datetime() call inside
a subquery is a trick that causes the datetime function to only be invoked
once even without the new constant-function factoring optimization of
3.8.2.  And with that trick, the performance difference is not nearly so
dramatic (though 3.8.2 is still faster by about 10%).  Also, the
performance difference here would only apply to a full table scan that
spends a lot of time looping inside of the virtual machine.  There would
not be nearly as dramatic a difference if the database held an index on the
"log.date" column.

Nevertheless, we suspect that queries like the above are common in the
wild, and so we hope that the upcoming 3.8.2 release will make a big
difference for some people.

If you try the 3.8.2 pre-release snapshot in your application and see a
performance improvement, we'd appreciate hearing from you.

For testing and comparison purposes, the optimization that doubles the
performance of the query above can be disabled using the following API:

sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS, db, 8);

where "db" is the database connection create by sqlite3_open(), or if you
are using the command-line shell:

.testctrl optimizations 8

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users