Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-03 Thread Dominique Devienne
On Mon, Mar 2, 2020 at 6:35 PM Keith Medcalf  wrote:
> Well, in theory an order by in a nested select means that the result of the 
> operation is an ordered projection and not merely a set of rows.
> For this particular case (a nested select with an order by and the outer 
> query with an aggregate) the query will not be flattened (#16)

OK. I was more trying to find out whether such nested "ordered"
projections were a standard-SQL thing or not.

> select x,y from (select x, y from t order by y) order by x;
> will do two order-by sorts to obtain the result even though the query could 
> be (in this particular case) re-written as "select x, y from t order by x, y"

That's assuming the sort is "stable" :)  Stable-sort is typically
slower than non-stable-sort, that's why the STL has std::sort and
std::stable_sort.

> This is why putting an "order by" in a view will usually preclude query 
> flattening because the view is not merely producing a "set of rows" it is 
> producing an "ordered projection" and the ordering must be significant else 
> it would not be there.

I would actually prefer these nested order-by to be ignored, and the
"set of rows" being assumed, forcing the outer query to do its own
ordering.
The very notion of "ordered projection" for nested query sounds more
like an implementation detail, to word-around the lack of window
functions,
than something "official" from the SQL standard or relational theory.

I'm not disputing how SQLite implements things, for historical or
practical reasons, I just want to understand whether such "ordered
projection"
is an official concept from SQL or just an SQLite thing. --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-02 Thread Keith Medcalf

On Monday, 2 March, 2020 09:20, Dominique Devienne  wrote:

>On Mon, Mar 2, 2020 at 5:09 PM Keith Medcalf  wrote:

>> select group_concat(value) from (select distinct value from test order by 
>> value);

>But is that guaranteed to be ordered correctly "forever" instead of by
>"happenstance" from current implementation details? 

>My point was that the Window Function version is ordered "by design", 
>and not an implementation detail (as I think
>the simpler version is).

>Your subquery returns rows in a given order too, but "who" says
>they'll be processed in that order?

>Tables are just "sets of rows" after all, and the relational model is
>about set-theory, no? order by in subquery therefore make little to no 
>sense in nested SQL (in theory...). --DD

Well, in theory an order by in a nested select means that the result of the 
operation is an ordered projection and not merely a set of rows.  For this 
particular case (a nested select with an order by and the outer query with an 
aggregate) the query will not be flattened (#16)

https://sqlite.org/optoverview.html#flattening

Consider that

create table t (x,y);
insert into t values (1,1),(2,2),(3,3),(4,1),(4,2),(4,3),(4,4);
select x,y from (select x, y from t order by y) order by x;

will do two order-by sorts to obtain the result even though the query could be 
(in this particular case) re-written as "select x, y from t order by x, y" 
because a query with an order-by in both the outer and nested query cannot be 
flattened (#11).

This is why putting an "order by" in a view will usually preclude query 
flattening because the view is not merely producing a "set of rows" it is 
producing an "ordered projection" and the ordering must be significant else it 
would not be there.

Of course in the case of the original statement:

select group_concat(distinct value) from test order by value;

The "order by value" might in fact be used to select the use of an index on 
value to scan, rather than the table test, if that results in less I/O that 
scanning the table.  However, the Query Planner doesn't believe aggregate 
functions (including group_concat) results are non-commutative so determines 
that the "order by value" clause is superfluous (perhaps this is an error, 
since other functions such as sum(), total(), avg() can also be non-commutative 
in certain pathological cases and have varying results depending on the 
ordering of the data sent to them, especially avg() since it merely returns 
sum()/count() rather than a successive approximation to the mean, though 
successive approximation still has pathological cases for non-sorted input, 
they are fewer than the sum()/count() method).

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-02 Thread Dominique Devienne
On Mon, Mar 2, 2020 at 5:09 PM Keith Medcalf  wrote:
> select group_concat(value) from (select distinct value from test order by 
> value);

But is that guaranteed to be ordered correctly "forever" instead of by
"happenstance"
from current implementation details? My point was that the Window
Function version
is ordered "by design", and not an implementation detail (as I think
the simpler version is).

Your subquery returns rows in a given order too, but "who" says
they'll be processed in that order?
Tables are just "sets of rows" after all, and the relational model is
about set-theory, no? order by
in subquery therefore make little to no sense in nested SQL (in theory...). --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-02 Thread Keith Medcalf

You mean like:

select group_concat(value) over (order by value rows between unbounded 
preceding and unbounded following) from (select distinct value from test) limit 
1;
and
select group_concat(value) over (order by value desc rows between unbounded 
preceding and unbounded following) from (select distinct value from test) limit 
1;

which seems far more convoluted than just:

select group_concat(value) from (select distinct value from test order by 
value);
and
select group_concat(value) from (select distinct value from test order by value 
desc);


-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Dominique Devienne
>Sent: Monday, 2 March, 2020 08:02
>To: SQLite mailing list 
>Subject: Re: [sqlite] How to enforce a specific order of group_concat?
>
>On Sun, Mar 1, 2020 at 10:58 PM mailing lists 
>wrote:
>> Are there any other solutions / possibilities?
>
>I thought someone more knowledgeable than I about Window Functions [1]
>would answer,
>but since nobody mentioned them so far, I'll do it, as I believe this
>is the "SQL native" way
>to achieve what you want (modulo DISTINCT perhaps). Notably (from the
>doc):
>
>Every aggregate window function can also work as a ordinary aggregate
>function,
>simply by omitting the OVER and FILTER clauses. Furthermore, all of
>the built-in aggregate
>functions of SQLite can be used as an aggregate window function by
>adding an appropriate OVER clause
>
>[2] has an example with group_concat() and OVER (ORDER BY ...). I
>assume that's what you need,
>someone better at Window Functions then me (not difficult!) can
>confirm or not that. --DD
>
>[1] https://www.sqlite.org/windowfunctions.html
>[2] https://www.sqlite.org/windowfunctions.html#aggwinfunc
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-02 Thread Dominique Devienne
On Sun, Mar 1, 2020 at 10:58 PM mailing lists  wrote:
> Are there any other solutions / possibilities?

I thought someone more knowledgeable than I about Window Functions [1]
would answer,
but since nobody mentioned them so far, I'll do it, as I believe this
is the "SQL native" way
to achieve what you want (modulo DISTINCT perhaps). Notably (from the doc):

Every aggregate window function can also work as a ordinary aggregate function,
simply by omitting the OVER and FILTER clauses. Furthermore, all of
the built-in aggregate
functions of SQLite can be used as an aggregate window function by
adding an appropriate OVER clause

[2] has an example with group_concat() and OVER (ORDER BY ...). I
assume that's what you need,
someone better at Window Functions then me (not difficult!) can
confirm or not that. --DD

[1] https://www.sqlite.org/windowfunctions.html
[2] https://www.sqlite.org/windowfunctions.html#aggwinfunc
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-02 Thread Jean-Luc Hainaut

On 1/03/2020 22:57, mailing lists wrote:

Assume I create the following table:

CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
INSERT INTO Test (Value) VALUES('Alpha');
INSERT INTO Test (Value) VALUES('Beta');
INSERT INTO Test (Value) VALUES('Beta');
INSERT INTO Test (Value) VALUES('Alpha');

According to the documentation of group_concat the order is undefined, indeed:

SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;

Both queries result in Alpha,Beta.

Changing the queries to

WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC) SELECT 
group_concat(x) FROM Result;
WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC) SELECT 
group_concat(x) FROM Result;

leads to the results Alpha,Beta, respectively Beta,Alpha.

Is this a coincidence or is this guaranteed to work?

Are there any other solutions / possibilities?


I also sometimes need a deterministic version of group_concat(). For 
such cases, I have written the short UDF below (in Python, but I guess 
the algorithm can easily be translated in other languages):


JLH

class group_concat2:
# Rewriting of "group_concat" of SQLite to simulate that of MySQL.
# Implements "distinct", "order by", "descending" and "separator".
# Interprets "null" values "intuitively"
#
# Format: group_concat2(value,distinct,sortkey,direction,separator)
#   value: char or numeric SQL expression; if numeric, 
converted into char;

#  the next value to concatenate;
#  discarded if None (Python translation of SQL null).
#   distinct:  numeric or char SQL expression; if char, converted 
into integer;

#  uniqueness indicator;
#  if 1, duplicates ignored; if 0, duplicates allowed.
#   sortkey:   char or numeric SQL expression (no conversion);
#  the order key value for the current "value" instance;
#  If None or u'', the current "value" instance is used 
instead.
#   direction: numeric or char SQL expression; if char, converted 
into integer;

#  ordering direction (1 = asc; 2 = desc).
#   sep:   char or numeric SQL expression; if numeric, 
converted into char;

#  value separator;
#  If None, = default u','.
# Example:
#   select City,group_concat2(lower(CustID),1,Account,'2','; ') as 
Customers

#   from   CUSTOMER group by City;

def __init__(self):
# Initialize
self.number  = 0# number of values added
self.valList = []   # List of values to concatenate
self.orderby = []   # list of values of the order key
self.distinct = 0   # whether "valList" values must be unique 
(0 = no; 1 = yes)

self.direction = 1  # ordering direction (1 = asc; 2 = desc)
self.sep = u',' # separator

def step(self,value,distinct,sortkey,direction,sep):
# Adding a new value to concatenate.
# Each call of this method may specify different values of
#  (distinct,sortkey,direction,sep) parameters.
# However, only those specified by the call of the first 
"value" instance

# will be considered, the others being ignored.

import numbers

self.number += 1

# Initialize user values of "distinct", "direction" and "sep"
if self.number == 1:
if distinct in [1,u'1']:
self.distinct = 1
if direction in [1,2,u'1',u'2']:
self.direction = int(direction)
if sep is not None:
if isinstance(sep,numbers.Number):
self.sep = unicode(sep)
else:
self.sep = sep

if sortkey is None:
sortkey = value if value is not None else 1
elif sortkey == u'':
sortkey = value if value is not None else 1

if value is not None:
if isinstance(value,numbers.Number):
value = unicode(value)

if self.distinct:
if value not in self.valList:
self.valList.append(value)
self.orderby.append(sortkey)
else:
self.valList.append(value)
self.orderby.append(sortkey)
else:
# value discarded
pass

def finalize(self):
if self.direction == 1:
self.valList = [y for x,y in 
sorted(zip(self.orderby,self.valList),reverse=False)]

else:
self.valList = [y for x,y in 
sorted(zip(self.orderby,self.valList),reverse=True)]

return self.sep.join(self.valList)








___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-02 Thread mailing lists
Hi Keith,

thanks for the explanation.

PS: I used a CTE because official examples (e.g. Mandelbrot) also used CTEs in 
combination with group_concat. Although the incorporation of group_concat was 
not the primary reason to use CTEs.
PPS: Is it possible to rephrase the documentation for group_concat in the sense 
that it does not mention that the order is (always) arbitrary? 

Regards,
Hardy

> Am 2020-03-02 um 04:46 schrieb Keith Medcalf  >:
> 
> 
> On Sunday, 1 March, 2020 14:58, mailing lists  > wrote:
> 
>> Assume I create the following table:
> 
>> CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
>> INSERT INTO Test (Value) VALUES('Alpha');
>> INSERT INTO Test (Value) VALUES('Beta');
>> INSERT INTO Test (Value) VALUES('Beta');
>> INSERT INTO Test (Value) VALUES('Alpha');
> 
>> According to the documentation of group_concat the order is undefined,
>> indeed:
> 
>> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
>> SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;
> 
>> Both queries result in Alpha,Beta.
> 
>> Changing the queries to
> 
>> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC)
>> SELECT group_concat(x) FROM Result;
>> WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC)
>> SELECT group_concat(x) FROM Result;
> 
>> leads to the results Alpha,Beta, respectively Beta,Alpha.
> 
>> Is this a coincidence or is this guaranteed to work?
> 
>> Are there any other solutions / possibilities?
> 
> group_concat builds a "group" by concatenating the values sent to it in the 
> order they are sent.  If you do not know this order then for all intents and 
> purposes the order is "undefined" because it is defined as the order in which 
> the query planner decides to visit the rows forming the group.  SQLite3 
> believes that all aggregate functions are commutative and that the order in 
> which rows are fed into them is immaterial to the result and there 
> (presently) is no way to specify that this is not the case.
> 
> So in the rare case where the aggregate is not commutative and you depend on 
> the presentation order, then you must specify it.  The only built-in 
> aggregate that is not commutative is the group_concat function.  If you were 
> to write another non-commutative aggregate function, lets say SHA1(...), that 
> computed the SHA1 hash of the values fed into it, you would also have to 
> control the presentation order or the result would be "undefined".
> 
> select group_concat(value) from (select distinct value from test order by 
> value desc); will do that.  (rephrasing as a CTE makes no difference)
> 
> This works because the query as phrased cannot be flattened since the outer 
> query contains an aggregate and the inner query contains an order by.
> 
> Moving the distinct does not alter the fact that the query cannot be 
> flattened.
> 
> select group_concat(distinct value) from (select value from test order by 
> value desc);
> 
> Whether the query planner will always not flatten a query where the outer 
> query contains an aggregate and the inner query contains an order by is 
> something on which I cannot comment other than to say that is does not 
> flatten such a query up to now.
> 
> -- 
> The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
> lot about anticipated traffic volume.
> 
> 
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org 
> 
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 
> 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to enforce a specific order of group_concat?

2020-03-01 Thread Keith Medcalf

On Sunday, 1 March, 2020 14:58, mailing lists  wrote:

>Assume I create the following table:

>CREATE TABLE Test (ID INTEGER PRIMARY KEY, Value TEXT);
>INSERT INTO Test (Value) VALUES('Alpha');
>INSERT INTO Test (Value) VALUES('Beta');
>INSERT INTO Test (Value) VALUES('Beta');
>INSERT INTO Test (Value) VALUES('Alpha');

>According to the documentation of group_concat the order is undefined,
>indeed:

>SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value ASC;
>SELECT group_concat(DISTINCT Value) FROM Test ORDER BY Value DESC;

>Both queries result in Alpha,Beta.

>Changing the queries to

>WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value ASC)
>SELECT group_concat(x) FROM Result;
>WITH Result(x) AS (SELECT DISTINCT Value FROM Test ORDER BY Value DESC)
>SELECT group_concat(x) FROM Result;

>leads to the results Alpha,Beta, respectively Beta,Alpha.

>Is this a coincidence or is this guaranteed to work?

>Are there any other solutions / possibilities?

group_concat builds a "group" by concatenating the values sent to it in the 
order they are sent.  If you do not know this order then for all intents and 
purposes the order is "undefined" because it is defined as the order in which 
the query planner decides to visit the rows forming the group.  SQLite3 
believes that all aggregate functions are commutative and that the order in 
which rows are fed into them is immaterial to the result and there (presently) 
is no way to specify that this is not the case.

So in the rare case where the aggregate is not commutative and you depend on 
the presentation order, then you must specify it.  The only built-in aggregate 
that is not commutative is the group_concat function.  If you were to write 
another non-commutative aggregate function, lets say SHA1(...), that computed 
the SHA1 hash of the values fed into it, you would also have to control the 
presentation order or the result would be "undefined".

select group_concat(value) from (select distinct value from test order by value 
desc); will do that.  (rephrasing as a CTE makes no difference)

This works because the query as phrased cannot be flattened since the outer 
query contains an aggregate and the inner query contains an order by.

Moving the distinct does not alter the fact that the query cannot be flattened.

select group_concat(distinct value) from (select value from test order by value 
desc);

Whether the query planner will always not flatten a query where the outer query 
contains an aggregate and the inner query contains an order by is something on 
which I cannot comment other than to say that is does not flatten such a query 
up to now.

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users