[sqlite] order by not working in combination with random()

Scott Hess Thu, 27 Aug 2015 07:50:50 -0700

I keep thinking I remember a thread from years ago where a lot of this was
hashed out, but I cannot find it.

I seem to remember one point which made sense was that while most functions
with no parameters were reasonably considered static across the entire
statement's execution, RANDOM() needed to be a special case to be useful
for much.  For instance:
   SELECT RANDOM();
would be an obvious way to select a random value, with no ambiguity, while
in either of these two cases:
   SELECT *, RANDOM() FROM t;
   SELECT * FROM t ORDER BY RANDOM() LIMIT 1;
the statements make no sense if RANDOM() is lifted as constant across the
entire execution.  But only in the second case is SQLite really in a
privileged position - in the first case, the client code could trivially
augment the result set with random values, while in the second case SQLite
can internally optimize to return a random subset efficiently.

Now, something like:
   SELECT RANDOM(), * FROM t ORDER BY 1 LIMIT 1;
is ambiguous.  On the one hand, the author's intent seems obvious, to pick
a random row and keep the value used to pick the row.  On the other hand,
you have to stretch a bit to construct a logical reason why the client code
cares what random values were being used by SQLite for this.  Likewise for
using RANDOM() in the WHERE clause - what possible difference could it make
to the client?

Where I think that ends up is a principle-of-least-surprise position.  If a
statement uses RANDOM() in multiple places, each of those should be
independent each time within a row, and also each time across rows.  But
RANDOM() AS r should allow r to be logically consistent when used in a
WHERE clause or ORDER BY.

There is already some precedent for this, because ORDER BY RANDOM() must
internally be holding the random values used fixed during the course of the
sort, rather than regenerating them each time it looks the value up.

-scott

On Thu, Aug 27, 2015 at 4:11 AM, R.Smith <rsmith at rsweb.co.za> wrote:

>
>
> On 2015-08-27 03:03 AM, James K. Lowden wrote:
>
>> On Wed, 26 Aug 2015 13:39:09 +0100
>> Simon Slavin <slavins at bigfraud.org> wrote:
>>
>> On 26 Aug 2015, at 11:27am, tonyp at acm.org wrote:
>>>
>>> Plus, it apparently recognizes if the random() expression in the
>>>> ORDER BY is the same as the SELECT one and again sort correctly
>>>> (without re-evaluating) and without needing an alias.
>>>>
>>> Ah, but I would call /that/ a bug !
>>>
>> I think I can convince you Postgres is right and SQLite is not.  :-)
>> I'll give you 4 reasons:
>>
>> 1.  Theory
>> 2.  Authority
>> 3.  Atomicity
>> 4.  Consistency
>>
>
> Thanks for the informative post, and I do agree with it except for
> specifically the Random() function.
>
> Random() is not a function you use when you expect reproducible results.
> The very nature of the word suggests the opposite.
>
> Further to this, among other posts there seem to be some confusion about
> the purpose of aliasing - especially since people think in programming
> terms. An alias isn't a variable or an array, it doesn't magically hold a
> state - it simply is a nick-name for a function.
>
> This statement:
>   SELECT Random() AS rr ORDER BY rr
>
> cannot ever be functionally different to:
>   SELECT Random() ORDER BY Random()
>
> To use programming terms: rr is not a variable - it does not "hold" a
> value for the Random() function, it simply is a alphanumeric "pointer" to
> the the Random() function (in this case).
>
> I wouldn't go so far as to call the behavior in Postgres a "bug" but just
> an implementation detail - I mean the values do end up being random, do
> they not? They just chose to make the sorting possible by keeping the state
> information - a good thing since it produces what most people (especially
> programmers) seem to expect, but in no way (I think) an absolute adherence
> to a standard.
>
> By James' standard, the Query:
> SELECT Random() FROM t;  -- imagine table t hold 10 tuples
>
> Should provide:
> Random()
> --------------
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
>
> And I happen to agree if we do apply strict relational theory. The thing
> is though, when people invoke the Random() function, they really do not
> wish to see the above (as opposed to when the reference the Now() or
> similar functions, the above would be exactly what they wish to see). So
> SQLite, like PostGres, offer an alternate implementation detail in this
> regard.
>
> The debate on whether either is absolutely "right" or "wrong" is just a
> silly one.  Postgres has the luxury of not being "Lite" nor needing to run
> on a phone, so the added code and memory to deal with saving a state of
> randomness for an entire result-set is not really problematic in Postgres,
> and not really in the design goals of SQLite.
>
>
> FWIW,
> Ryan
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] order by not working in combination with random()

Reply via email to