Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
On 2021-02-21 10:14:04 -0700, Michael Lewis wrote: > No issues for us. We have used a low sample rate of 1% or so and gotten some > very useful data. Oh, somehow I never noticed the auto_explain.sample_rate parameter in the docs. Good to know. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
No issues for us. We have used a low sample rate of 1% or so and gotten some very useful data. Particularly with logging nested statements so we can profile stored procs more easily than adding a large number of raise notice statements.
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
With shared_buffers=16GB, pg_top shows: last pid: 2613; load avg: 0.49, 0.45, 0.37; up 0+00:19:21 16:41:16 16 processes: 16 sleeping CPU states: 9.2% user, 0.0% nice, 0.2% system, 90.4% idle, 0.1% iowait Memory: 21G used, 42G free, 29M buffers, 18G cached DB activity: 39 tps, 0 rollbs/s, 0 buffer r/s, 100 hit%, 3625 row r/s, 11 row w/s DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s DB disk: 0.0 GB total, 0.0 GB free (100% used) Swap: 32G free PID USERNAME PRI NICE SIZE RES STATE TIME WCPUCPU COMMAND 2114 postgres 200 17G 6378M sleep 2:11 0.77% 44.80% postgres: words words_ru [local] idle 2012 postgres 200 17G 6005M sleep 0:55 0.45% 26.30% postgres: words words_ru [local] idle 2107 postgres 200 17G 6175M sleep 1:16 0.02% 0.97% postgres: words words_ru [local] idle 1184 postgres 200 17G 22M sleep 0:00 0.02% 0.97% postgres: walwriter 2030 postgres 200 17G 5948M sleep 1:18 0.00% 0.00% postgres: words words_ru [local] idle 2013 postgres 200 17G 5724M sleep 0:57 0.00% 0.00% postgres: words words_ru [local] idle 2103 postgres 200 17G 5823M sleep 0:52 0.00% 0.00% postgres: words words_ru [local] idle 2031 postgres 200 17G 5619M sleep 0:41 0.00% 0.00% postgres: words words_ru [local] idle 2029 postgres 200 17G 1128M sleep 0:00 0.00% 0.00% postgres: words words_ru [local] idle 2104 postgres 200 17G 948M sleep 0:00 0.00% 0.00% postgres: words words_ru [local] idle 2106 postgres 200 17G 1257M sleep 0:00 0.00% 0.00% postgres: words words_ru [local] idle 1182 postgres 200 17G 231M sleep 0:00 0.00% 0.00% postgres: checkpointer 1183 postgres 200 17G 135M sleep 0:00 0.00% 0.00% postgres: background writer 1185 postgres 200 17G 8696K sleep 0:00 0.00% 0.00% postgres: autovacuum launcher 2614 postgres 200 17G 22M sleep 0:00 0.00% 0.00% postgres: words words_ru [local] idle 1187 postgres 200 17G 6764K sleep 0:00 0.00% 0.00% postgres: logical replication launcher On Sat, Feb 20, 2021 at 4:15 PM Alexander Farber wrote: > Thank you all, I will try at first > > shared_buffers = 16GBand > > index on words_scores(uid, length(word) desc) > > >
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
Thank you all, I will try at first shared_buffers = 16GBand index on words_scores(uid, length(word) desc)
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
On 2021-02-20 06:43:28 -0700, Michael Lewis wrote: > Using auto_explain may be needed to catch the plan when it is slow, if the > other suggestions don't resolve things for you. Yes, auto_explain is very helpful. Unfortunately, by default it only does an explain, not the equivalent of an explain analyze. So you know what the plan was and what the total time was, but you still don't know where it spent that time. You can set auto_explain.log_analyze = on, but the documentation warns that "this can have an extremely negative impact on performance". Does anybody have experience with trying this in a real-world workload? (I was never brave enough) hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
Using auto_explain may be needed to catch the plan when it is slow, if the other suggestions don't resolve things for you.
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
On 2021-02-19 21:25:36 +0100, Alexander Farber wrote: > In the PostgreSQL log I have noticed that the duration for 2 particular > queries > have increased, especially in the evenings: [...] > One of the queries is actually a simple SELECT ... LIMIT 15 to find the 15 > longest words played by a user (they are displayed at the player profile > page). > > I have run the EXPLAIN ANALYZE here: https://explain.depesz.com/s/IcqN Well, I wouldn't call a query with 8 joins "simple". I notice two things: 1) It does a lot more than the description ("Select 15 longest words played by the user id 5") suggests. It not only selects those words, but also the games in which they were played and some information about both players. 2) The query seems to spend most of the time in https://explain.depesz.com/s/IcqN#l11, retrieving all the words ever played by that user via a parallel bitmap heap scan, only to sort them and discard all but the 15 longest. I think using an index should make that much more efficient. Therefore, three suggestions: 1) Add an index on words_scores(uid, length(word) desc). 2) If postgresql still isn't using that, isolate the core query by putting it into a CTE: with s as (select * from words_scores where uid = 5 order by length(word)) select ... from s join words_moves m ... 3) If that shifts the bottleneck towards that extra per-user info (location, photo, ...) try to cache that in the application. That probably doesn't change very often and doesn't have to be retrieved from the database every time. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
On 2021-02-20 10:24:36 +0100, Alexander Farber wrote: > The query when I try it does run in 15ms, but evening logs show the query (I > think only small percentage of it) running 1-3s. > > At the same time my CentOS 8 server with 64 GB RAM is never loaded, the load > average show by top is 0.35-0.45. > > What could be the reason, does PostgreSQL 13.2 has some scheduled task maybe? One very common reason for wildly different execution times is where the data is: Accessing data in RAM is a lot faster than accessing it on disk (even for SSDs, but especially for rotating disks). So it might be that when you executed the EXPLAIN the data just happened to be in RAM (that's almost certainly the case if you executed the same query without EXPLAIN before that) while in those cases where it was slow, the data (or a large part of it) had to be loaded from disk first. Data might not be in RAM because it wasn't needed for some time, because some other query or process which needed lots of RAM displaced it, or because the server was just rebootet. > I have only changed few lines in the postgresql.conf: I don't see shared_buffers here. That's quite small by default so you might want to increase it. The usual recommendation is to start with 25% of the memory (that would be 16 GB in your case) and then see if it gets better if decrease or increase it. hp -- _ | Peter J. Holzer| Story must make more sense than reality. |_|_) || | | | h...@hjp.at |-- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" signature.asc Description: PGP signature
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
Yes, Michael, that I have noticed too, but should have written more in my original mail. The query when I try it does run in 15ms, but evening logs show the query (I think only small percentage of it) running 1-3s. At the same time my CentOS 8 server with 64 GB RAM is never loaded, the load average show by top is 0.35-0.45. What could be the reason, does PostgreSQL 13.2 has some scheduled task maybe? I have only changed few lines in the postgresql.conf: # diff postgresql.conf.OLD postgresql.conf 64c64 < max_connections = 100 # (change requires restart) --- > max_connections = 120 # (change requires restart) 130c130 < #work_mem = 4MB # min 64kB --- > work_mem = 8MB# min 64kB 132c132 < #maintenance_work_mem = 64MB # min 1MB --- > maintenance_work_mem = 128MB # min 1MB 393c393 < #effective_cache_size = 4GB --- > effective_cache_size = 8GB 410,411c410,411 < #from_collapse_limit = 8 < #join_collapse_limit = 8 # 1 disables collapsing of explicit --- > from_collapse_limit = 24 > join_collapse_limit = 24 # 1 disables collapsing of explicit 469c469 < #log_min_messages = warning # values in order of decreasing detail: --- > log_min_messages = notice # values in order of decreasing detail: 497c497 < #log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements --- > log_min_duration_statement = 1000 # -1 is disabled, 0 logs all statements And I have pgbouncer in front of the PostgreSQL: diff pgbouncer.ini.OLD pgbouncer.ini 12a13,15 > words_de = host=/tmp user=xxx password=xxx dbname=words_de > words_en = host=/tmp user=xxx password=xxx dbname=words_en > words_ru = host=/tmp user=xxx password=xxx dbname=words_ru 115a119 > ;; auth_type = md5 148c152 < ;server_reset_query = DISCARD ALL --- > server_reset_query = DISCARD ALL 156c160 < ;ignore_startup_parameters = extra_float_digits --- > ignore_startup_parameters = extra_float_digits 180c184 < ;max_client_conn = 100 --- > max_client_conn = 1000 185c189 < ;default_pool_size = 20 --- > default_pool_size = 100 Or is maybe everything ok...? The daily gzipped dump grows big: 939343358 Dec 31 01:33 words_ru-Dec.sql.gz 1221682336 Jan 31 01:33 words_ru-Jan.sql.gz 1423324283 Feb 20 01:34 words_ru-Feb.sql.gz and being an inexperienced pg admin I am a bit worried. Thank you Alex >
Re: Please help to improve SELECT JOIN performance, EXPLAIN ANALYZE attached
Your explain analyze is showing an example that runs in less than 15ms, and your logs indicate queries taking 2-3 seconds. I am missing part of the picture here. Can you clarify?