Re: JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread Thomas Kellerer

Alexander Farber wrote on 20.02.2021 at 19:39:

So I am trying:

# SELECT
        JSONB_AGG(TO_CHAR(finished, 'YYYY-MM-DD')) AS day,
        JSONB_AGG(SUM(CASE WHEN reason='regular' or reason='resigned' THEN 1 ELSE 0 END)::int) AS completed,
        JSONB_AGG(SUM(CASE WHEN reason='expired' THEN 1 ELSE 0 END)::int) AS expired
FROM words_games
WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
GROUP BY day
ORDER BY day;
ERROR:  aggregate function calls cannot be nested
LINE 3:                 JSONB_AGG(SUM(CASE WHEN reason='regular' or ...


You need a second level of grouping:

select day,
   jsonb_agg(completed) as completed,
   jsonb_agg(expired) as expired
from (
  SELECT TO_CHAR(finished, 'YYYY-MM-DD') AS day,
         SUM(CASE WHEN reason='regular' or reason='resigned' THEN 1 ELSE 0 END)::int AS completed,
         SUM(CASE WHEN reason='expired' THEN 1 ELSE 0 END)::int AS expired
  FROM words_games
  WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
  GROUP BY day
) t
GROUP BY day
ORDER BY day;
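
If the end goal is a single JSONB map holding three parallel arrays rather
than one row per day, one more aggregation level can wrap the subquery. A
sketch, assuming the same words_games columns as above (the ORDER BY inside
each aggregate keeps the arrays aligned by day):

SELECT JSONB_BUILD_OBJECT(
         'days',      JSONB_AGG(day ORDER BY day),
         'completed', JSONB_AGG(completed ORDER BY day),
         'expired',   JSONB_AGG(expired ORDER BY day)
       ) AS stats
FROM (
  SELECT TO_CHAR(finished, 'YYYY-MM-DD') AS day,
         SUM(CASE WHEN reason IN ('regular', 'resigned') THEN 1 ELSE 0 END)::int AS completed,
         SUM(CASE WHEN reason = 'expired' THEN 1 ELSE 0 END)::int AS expired
  FROM words_games
  WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
  GROUP BY day
) t;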


Btw:

SUM(CASE WHEN reason='regular' or reason='resigned' THEN 1 ELSE 0 END)::int AS completed

can also be written as

count(*) filter (where reason in ('regular', 'resigned')) as completed
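
Putting that together, the inner per-day query could read as follows (a
sketch with the same assumed columns; count(*) returns bigint, so cast to
::int if the narrower type matters):

SELECT TO_CHAR(finished, 'YYYY-MM-DD') AS day,
       count(*) FILTER (WHERE reason IN ('regular', 'resigned')) AS completed,
       count(*) FILTER (WHERE reason = 'expired') AS expired
FROM words_games
WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
GROUP BY day;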





Re: JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread David G. Johnston
On Sat, Feb 20, 2021 at 12:34 PM Alexander Farber <
alexander.far...@gmail.com> wrote:

> Ah, thank you...
>
> JSON support in PostgreSQL is cool and seems to be extended with each
> release.
>
> But standard tasks like returning a JSON map of lists or a JSON list of
> lists seem difficult to accomplish.
>
>
With experience it just becomes verbose - at least for non-trivial cases.

David J.


Re: JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread Alexander Farber
Ah, thank you...

JSON support in PostgreSQL is cool and seems to be extended with each
release.

But standard tasks like returning a JSON map of lists or a JSON list of
lists seem difficult to accomplish.

Greetings
Alex


Re: JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread David G. Johnston
On Sat, Feb 20, 2021 at 11:46 AM Alexander Farber <
alexander.far...@gmail.com> wrote:

> Then I have to split the query into 3 similar ones (with the same condition)?
>
> I try:
>
> SELECT
> JSONB_AGG(TO_CHAR(finished, 'YYYY-MM-DD')) AS day
> FROM words_games
> WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
> GROUP BY day
> ORDER BY day;
> ERROR:  aggregate functions are not allowed in GROUP BY
> LINE 2: JSONB_AGG(TO_CHAR(finished, 'YYYY-MM-DD')) A...
> ^
>

That's a whole different misunderstanding of aggregates that you are seeing.

I misspoke in the prior response though.  It's not that you only get one
aggregate function column per select - you only get to use a single
aggregate in each expression in a select/group-by.  array_agg(sum(...)) is
two aggregates in a single expression.
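
The workaround is to finish the inner aggregate at its own query level and
aggregate again over its results. A minimal sketch of the pattern, using a
hypothetical table t with columns g and x:

SELECT jsonb_agg(total) AS totals      -- outer aggregate over the grouped rows
FROM (
  SELECT g, sum(x) AS total            -- inner aggregate, one row per g
  FROM t
  GROUP BY g
) sub;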

David J.


Re: JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread Alexander Farber
Then I have to split the query into 3 similar ones (with the same condition)?

I try:

SELECT
JSONB_AGG(TO_CHAR(finished, 'YYYY-MM-DD')) AS day
FROM words_games
WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
GROUP BY day
ORDER BY day;
ERROR:  aggregate functions are not allowed in GROUP BY
LINE 2: JSONB_AGG(TO_CHAR(finished, 'YYYY-MM-DD')) A...
^


Re: JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread David G. Johnston
On Sat, Feb 20, 2021 at 11:39 AM Alexander Farber <
alexander.far...@gmail.com> wrote:

>
> Or is the syntax error about being able to use JSONB_AGG only once per
> SELECT query?
>
>
That.

David J.


JSONB_AGG: aggregate function calls cannot be nested

2021-02-20 Thread Alexander Farber
Good evening,

I have the following query in 13.2:

# SELECT
        TO_CHAR(finished, 'YYYY-MM-DD') AS day,
        SUM(CASE WHEN reason='regular' or reason='resigned' THEN 1 ELSE 0 END)::int AS completed,
        SUM(CASE WHEN reason='expired' THEN 1 ELSE 0 END)::int AS expired
FROM words_games
WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
GROUP BY day
ORDER BY day;
    day     | completed | expired
------------+-----------+---------
 2021-02-06 |       167 |      71
 2021-02-07 |       821 |     189
 2021-02-08 |       816 |     323
 2021-02-09 |       770 |     263
 2021-02-10 |       864 |     230
 2021-02-11 |       792 |     184
 2021-02-12 |       838 |     231
 2021-02-13 |       853 |     293
 2021-02-14 |       843 |     231
 2021-02-15 |       767 |     203
 2021-02-16 |       744 |     237
 2021-02-17 |       837 |     206
 2021-02-18 |       751 |     196
 2021-02-19 |       745 |     257
 2021-02-20 |       654 |     135
(15 rows)

It works well, but I would like to transform it into a JSONB map with 3
arrays.

So I am trying:

# SELECT
        JSONB_AGG(TO_CHAR(finished, 'YYYY-MM-DD')) AS day,
        JSONB_AGG(SUM(CASE WHEN reason='regular' or reason='resigned' THEN 1 ELSE 0 END)::int) AS completed,
        JSONB_AGG(SUM(CASE WHEN reason='expired' THEN 1 ELSE 0 END)::int) AS expired
FROM words_games
WHERE finished > CURRENT_TIMESTAMP - INTERVAL '2 week'
GROUP BY day
ORDER BY day;
ERROR:  aggregate function calls cannot be nested
LINE 3:         JSONB_AGG(SUM(CASE WHEN reason='regular' or ...
                          ^

Shouldn't I use JSONB_AGG here, to build the 3 JSON arrays?

Or is the syntax error about being able to use JSONB_AGG only once per
SELECT query?

Greetings
Alex


Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Alexander Farber
With shared_buffers=16GB, pg_top shows:


last pid:  2613;  load avg:  0.49,  0.45,  0.37;  up 0+00:19:21  16:41:16
16 processes: 16 sleeping
CPU states:  9.2% user,  0.0% nice,  0.2% system, 90.4% idle,  0.1% iowait
Memory: 21G used, 42G free, 29M buffers, 18G cached
DB activity:  39 tps,  0 rollbs/s,  0 buffer r/s, 100 hit%,  3625 row r/s,  11 row w/s
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 0.0 GB total, 0.0 GB free (100% used)
Swap: 32G free

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 2114 postgres  20    0   17G 6378M sleep   2:11  0.77% 44.80% postgres: words words_ru [local] idle
 2012 postgres  20    0   17G 6005M sleep   0:55  0.45% 26.30% postgres: words words_ru [local] idle
 2107 postgres  20    0   17G 6175M sleep   1:16  0.02%  0.97% postgres: words words_ru [local] idle
 1184 postgres  20    0   17G   22M sleep   0:00  0.02%  0.97% postgres: walwriter
 2030 postgres  20    0   17G 5948M sleep   1:18  0.00%  0.00% postgres: words words_ru [local] idle
 2013 postgres  20    0   17G 5724M sleep   0:57  0.00%  0.00% postgres: words words_ru [local] idle
 2103 postgres  20    0   17G 5823M sleep   0:52  0.00%  0.00% postgres: words words_ru [local] idle
 2031 postgres  20    0   17G 5619M sleep   0:41  0.00%  0.00% postgres: words words_ru [local] idle
 2029 postgres  20    0   17G 1128M sleep   0:00  0.00%  0.00% postgres: words words_ru [local] idle
 2104 postgres  20    0   17G  948M sleep   0:00  0.00%  0.00% postgres: words words_ru [local] idle
 2106 postgres  20    0   17G 1257M sleep   0:00  0.00%  0.00% postgres: words words_ru [local] idle
 1182 postgres  20    0   17G  231M sleep   0:00  0.00%  0.00% postgres: checkpointer
 1183 postgres  20    0   17G  135M sleep   0:00  0.00%  0.00% postgres: background writer
 1185 postgres  20    0   17G 8696K sleep   0:00  0.00%  0.00% postgres: autovacuum launcher
 2614 postgres  20    0   17G   22M sleep   0:00  0.00%  0.00% postgres: words words_ru [local] idle
 1187 postgres  20    0   17G 6764K sleep   0:00  0.00%  0.00% postgres: logical replication launcher


On Sat, Feb 20, 2021 at 4:15 PM Alexander Farber wrote:

> Thank you all, I will try at first
>
> shared_buffers = 16GB and
>
> index on words_scores(uid, length(word) desc)


Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Alexander Farber
Thank you all, I will try at first

shared_buffers = 16GB and

index on words_scores(uid, length(word) desc)


Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Peter J. Holzer
On 2021-02-20 06:43:28 -0700, Michael Lewis wrote:
> Using auto_explain may be needed to catch the plan when it is slow, if the
> other suggestions don't resolve things for you.

Yes, auto_explain is very helpful. Unfortunately, by default it only
does an explain, not the equivalent of an explain analyze. So you know
what the plan was and what the total time was, but you still don't know
where it spent that time.

You can set auto_explain.log_analyze = on, but the documentation warns
that "this can have an extremely negative impact on performance". Does
anybody have experience with trying this in a real-world workload? (I
was never brave enough)
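
For reference, a sketch of trying it per-session rather than server-wide,
which limits the blast radius (the parameter values here are only
illustrative; server-wide use needs shared_preload_libraries =
'auto_explain' plus a restart):

-- per-session, as superuser:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000;  -- log plans of statements over 1s
SET auto_explain.log_analyze = on;         -- include actual rows and times
SET auto_explain.log_timing = off;         -- skip per-node timing, the most
                                           -- expensive part of ANALYZE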

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Michael Lewis
Using auto_explain may be needed to catch the plan when it is slow, if the
other suggestions don't resolve things for you.


Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Peter J. Holzer
On 2021-02-19 21:25:36 +0100, Alexander Farber wrote:
> In the PostgreSQL log I have noticed that the durations for 2 particular
> queries have increased, especially in the evenings:
[...]
> One of the queries is actually a simple SELECT ... LIMIT 15 to find the 15
> longest words played by a user (they are displayed on the player profile
> page).
> 
> I have run the EXPLAIN ANALYZE here: https://explain.depesz.com/s/IcqN

Well, I wouldn't call a query with 8 joins "simple". 

I notice two things:

1) It does a lot more than the description ("Select 15 longest words
   played by the user id 5") suggests. It not only selects those words, but
   also the games in which they were played and some information about both
   players.

2) The query seems to spend most of the time in
https://explain.depesz.com/s/IcqN#l11, retrieving all the words ever
played by that user via a parallel bitmap heap scan, only to sort
them and discard all but the 15 longest. I think using an index
should make that much more efficient.

Therefore, three suggestions:

1) Add an index on words_scores(uid, length(word) desc) (DDL is sketched
   below, after these suggestions).

2) If postgresql still isn't using that, isolate the core query by
   putting it into a CTE:
with s as (select * from words_scores
   where uid = 5 order by length(word) desc limit 15)
select ...
from s
join words_moves m ...

3) If that shifts the bottleneck towards that extra per-user info
(location, photo, ...) try to cache that in the application. That
probably doesn't change very often and doesn't have to be retrieved
from the database every time.
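
A sketch of suggestion 1 and the core query it is meant to serve, assuming
the words_scores columns from the thread (the index name is made up; the
index expression must match the ORDER BY exactly):

CREATE INDEX CONCURRENTLY words_scores_uid_wordlen_idx
    ON words_scores (uid, length(word) DESC);

-- the core query this index serves:
SELECT * FROM words_scores
WHERE uid = 5
ORDER BY length(word) DESC
LIMIT 15;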

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Peter J. Holzer
On 2021-02-20 10:24:36 +0100, Alexander Farber wrote:
> The query when I try it does run in 15ms, but the evening logs show the
> query (I think only a small percentage of its executions) running 1-3s.
> 
> At the same time my CentOS 8 server with 64 GB RAM is never loaded; the
> load average shown by top is 0.35-0.45.
> 
> What could be the reason, does PostgreSQL 13.2 have some scheduled task
> maybe?

One very common reason for wildly different execution times is where the
data is: Accessing data in RAM is a lot faster than accessing it on disk
(even for SSDs, but especially for rotating disks). 

So it might be that when you executed the EXPLAIN the data just happened
to be in RAM (that's almost certainly the case if you executed the same
query without EXPLAIN before that) while in those cases where it was
slow, the data (or a large part of it) had to be loaded from disk first.

Data might not be in RAM because it wasn't needed for some time, because
some other query or process which needed lots of RAM displaced it, or
because the server was just rebooted.
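
One way to check is EXPLAIN's BUFFERS option on the isolated core query
from the thread: "read" counts are pages that had to come from the OS
cache or disk rather than from shared_buffers. A sketch:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM words_scores
WHERE uid = 5
ORDER BY length(word) DESC
LIMIT 15;
-- In the output, "Buffers: shared hit=... read=..." under each node shows
-- how many pages came from shared_buffers (hit) versus outside it (read);
-- large read counts on the slow runs point at cold data.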


> I have only changed few lines in the postgresql.conf:

I don't see shared_buffers here. That's quite small by default so you
might want to increase it. The usual recommendation is to start with 25%
of the memory (that would be 16 GB in your case) and then see if it gets
better if you decrease or increase it.
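
A sketch of making that change without editing postgresql.conf by hand
(shared_buffers only takes effect after a restart; the service name varies
by distribution):

ALTER SYSTEM SET shared_buffers = '16GB';
-- then restart the server, e.g.: systemctl restart postgresql-13
SHOW shared_buffers;   -- verify after the restart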

hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"




Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached

2021-02-20 Thread Alexander Farber
Yes, Michael, I have noticed that too, but I should have written more in my
original mail.

The query when I try it does run in 15ms, but the evening logs show the
query (I think only a small percentage of its executions) running 1-3s.

At the same time my CentOS 8 server with 64 GB RAM is never loaded; the
load average shown by top is 0.35-0.45.

What could be the reason, does PostgreSQL 13.2 have some scheduled task
maybe?

I have only changed a few lines in the postgresql.conf:

# diff postgresql.conf.OLD postgresql.conf
64c64
< max_connections = 100 # (change requires restart)
---
> max_connections = 120 # (change requires restart)
130c130
< #work_mem = 4MB   # min 64kB
---
> work_mem = 8MB# min 64kB
132c132
< #maintenance_work_mem = 64MB  # min 1MB
---
> maintenance_work_mem = 128MB  # min 1MB
393c393
< #effective_cache_size = 4GB
---
> effective_cache_size = 8GB
410,411c410,411
< #from_collapse_limit = 8
< #join_collapse_limit = 8  # 1 disables collapsing of explicit
---
> from_collapse_limit = 24
> join_collapse_limit = 24  # 1 disables collapsing of explicit
469c469
< #log_min_messages = warning   # values in order of decreasing detail:
---
> log_min_messages = notice # values in order of decreasing detail:
497c497
< #log_min_duration_statement = -1  # -1 is disabled, 0 logs all statements
---
> log_min_duration_statement = 1000 # -1 is disabled, 0 logs all statements

And I have pgbouncer in front of the PostgreSQL:

 diff pgbouncer.ini.OLD pgbouncer.ini
12a13,15
> words_de = host=/tmp user=xxx password=xxx dbname=words_de
> words_en = host=/tmp user=xxx  password=xxx dbname=words_en
> words_ru = host=/tmp user=xxx password=xxx dbname=words_ru
115a119
> ;; auth_type = md5
148c152
< ;server_reset_query = DISCARD ALL
---
> server_reset_query = DISCARD ALL
156c160
< ;ignore_startup_parameters = extra_float_digits
---
> ignore_startup_parameters = extra_float_digits
180c184
< ;max_client_conn = 100
---
> max_client_conn = 1000
185c189
< ;default_pool_size = 20
---
> default_pool_size = 100

Or maybe everything is ok...? The daily gzipped dump is growing big:

939343358 Dec 31 01:33 words_ru-Dec.sql.gz
1221682336 Jan 31 01:33 words_ru-Jan.sql.gz
1423324283 Feb 20 01:34 words_ru-Feb.sql.gz

and being an inexperienced pg admin I am a bit worried.

Thank you
Alex
