On 2016-05-15 10:50 PM, J Decker wrote:
> On Sun, May 15, 2016 at 10:02 PM, Darren Duncan <darren at darrenduncan.net> 
> wrote:
>> On 2016-05-15 9:56 PM, J Decker wrote:
>>>
>>> On Sun, May 15, 2016 at 9:29 PM, Darren Duncan <darren at darrenduncan.net>
>>> wrote:
>>>>
>>>> On 2016-05-15 12:35 AM, Simon Slavin wrote:
>>>>>
>>>>> All true.  But it brings up a question.  Suppose the following:
>>>>>
>>>>> first   second
>>>>> -----   ------
>>>>> Mark    Spark
>>>>> Emily   Spark
>>>>> Mary    Soper
>>>>> Brian   Soper
>>>>>
>>>>> SELECT first,second FROM members ORDER BY second LIMIT 3
>>>>>
>>>>> Without looking up either a standards document for SQL or the
>>>>> documentation for your favourite implementations of SQL, answer this
>>>>> question:
>>>>>
>>>>> Does the documentation for your favourite implementation of SQL state
>>>>> that
>>>>> you'll get the same rows every time you execute the above "SELECT" ?
>>>>
>>>>
>>>> I think a proper solution for this then is to treat the LIMIT as
>>>> approximate
>>>> rather than exact; it indicates a desire rather than a promise.
>>>>
>>>> In the scenario you describe, the query should return either 2 rows or 4
>>>> rows, so that ALL of the rows whose second field value of "Spark" are, or
>>>> are not, returned.  Projecting this to there not being an ORDER BY
>>>> clause,
>>>> either all rows are returned or zero rows are returned.  Thus the result
>>>> is
>>>> deterministic.
>>>
>>> even if it did for 'spark' rows (which either case you suggest would
>>> be horrible) 'soper' would still be non-deterministic, and rebuilding
>>> indexes could reorder the results.
>>
>> No, it is still deterministic.
>>
>> The ORDER BY clause specified a partial order of the results, not a total
>> order.
>>
>> What I specified returns only complete groups of rows where within each
>> group the rows are unordered but the groups as a whole are ordered relative
>> to each other.
>>
>> The fact this is deterministic would probably be more clear if the result
>> rows were nested, one outer row per "group" that I mentioned.  But even if
>> not, the caller knew that they were only ordering by second but selecting
>> first, so if they see multiple rows with the same second value, they know
>> that those rows are not sorted between themselves, only that rows with
>> different second values are sorted relative to each other.
>>
>> So fully deterministic.
>
> 'SELECT first,second FROM members ORDER BY second LIMIT 3' (that's
> mysql format right?)
>
> I don't see a full set as a requirement (such that the output would be
> 2 or 4 records and not the 3 I asked for...) .  the query implies 3
> rows, not 3 sets.

I never said interpret it as 3 sets, I said round the number of rows to the 
nearest whole set (in either the up or down direction).

> SELECT first,second FROM members ORDER BY second LIMIT 3,3 (for the
> next 3 lines I'm displaying on a form for instance)
>
> and specifying that the result set includes a first name, the result
> sets taken as a hole are not guaranteed equal (procedurally and in
> practice they may be, but pessimistically...).

If someone is doing pagination, then at least one of these scenarios should be 
true:

1.  The ORDER BY is being done on a superkey, which guarantees a total order.

2.  Row groups of > 1 row that cross a boundary are returned by BOTH queries, 
eg 
"ORDER BY second LIMIT 3 OFFSET 0" and "ORDER BY second LIMIT 3 OFFSET 3" would 
both return both "Spark" rows, thus ensuring there are no holes.

Clearly, option 1 is better, but failing that, having pages that overlap is NOT 
a bad thing and even has precedent (assuming users are made aware pages can 
overlap), such as often happens in say a GUI scrolling list.

On a further note...

While this behavior isn't something easily portable across DBMSs, SQLite can 
give deterministic results for itself, at least for "with rowid" tables, if it 
implicitly sorts by the rowid after the explicit terms, as if the user had 
written "rowid" at the end of the ORDER BY list.  Since the rowid is an actual 
table column exposed to the user and not just a hidden implementation detail, 
this is consistent with the relational model.

-- Darren Duncan

Reply via email to