[GitHub] [guacamole-client] Hepri commented on pull request #711: GUACAMOLE-1488: Slow selection of users on scale on Postgres

GitBox Mon, 04 Apr 2022 04:00:13 -0700


Hepri commented on PR #711:
URL: https://github.com/apache/guacamole-client/pull/711#issuecomment-1087411325


   Fixed commit message.
   
   Currently requests accepts list of users, which usually means that number of 
passed user names is considered as small. 
   As for N unique subqueries, it's not true actually - postgres using `Index 
Only Scan` at final step of step and it works as usual join.
   I did some tests on real database, 963k users, 17.5m user_history records.
   
   Here is the plan for selecting all users with left join + group by + max
   
   ```
   Finalize GroupAggregate  (cost=14831818.70..16921374.20 rows=17301844 
width=1373) (actual time=42046.398..45327.396 rows=963527 loops=1)
   "  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
   "  Buffers: shared hit=7827722 read=133911 dirtied=5910, temp read=1025423 
written=1026599"
     I/O Timings: read=32732.059
     ->  Gather Merge  (cost=14831818.70..16640219.23 rows=14418204 width=1373) 
(actual time=42046.380..44885.919 rows=963527 loops=1)
           Workers Planned: 2
           Workers Launched: 2
   "        Buffers: shared hit=7827722 read=133911 dirtied=5910, temp 
read=1025423 written=1026599"
           I/O Timings: read=32732.059
           ->  Partial GroupAggregate  (cost=14830818.68..14975000.72 
rows=7209102 width=1373) (actual time=39179.294..40950.738 rows=321176 loops=3)
   "              Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
   "              Buffers: shared hit=7827722 read=133911 dirtied=5910, temp 
read=1025423 written=1026599"
                 I/O Timings: read=32732.059
                 ->  Sort  (cost=14830818.68..14848841.43 rows=7209102 
width=1373) (actual time=38626.534..39682.340 rows=5905855 loops=3)
   "                    Sort Key: guacamole_user.user_id, 
guacamole_entity.entity_id"
                       Sort Method: external merge  Disk: 1145272kB
                       Worker 0:  Sort Method: external merge  Disk: 639824kB
                       Worker 1:  Sort Method: external merge  Disk: 642448kB
   "                    Buffers: shared hit=7827722 read=133911 dirtied=5910, 
temp read=1025423 written=1026599"
                       I/O Timings: read=32732.059
                       ->  Merge Left Join  (cost=209055.40..1073389.42 
rows=7209102 width=1373) (actual time=2177.709..32539.696 rows=5905855 loops=3)
                             Merge Cond: (guacamole_user.user_id = 
guacamole_user_history.user_id)
                             Buffers: shared hit=7827714 read=133911 
dirtied=5910
                             I/O Timings: read=32732.059
                             ->  Nested Loop  (cost=0.85..245764.48 rows=398933 
width=1365) (actual time=0.521..1734.830 rows=321176 loops=3)
                                   Buffers: shared hit=4422640 read=19582 
dirtied=1
                                   I/O Timings: read=2302.272
                                   ->  Parallel Index Scan using 
guacamole_user_pkey on guacamole_user  (cost=0.42..48764.29 rows=398933 
width=1341) (actual time=0.233..608.004 rows=321176 loops=3)
                                         Buffers: shared hit=568501 read=15572 
dirtied=1
                                         I/O Timings: read=1280.976
                                   ->  Index Scan using guacamole_entity_pkey 
on guacamole_entity  (cost=0.42..0.49 rows=1 width=28) (actual 
time=0.003..0.003 rows=1 loops=963527)
                                         Index Cond: (entity_id = 
guacamole_user.entity_id)
                                         Filter: (type = 
'USER'::guacamole_entity_type)
                                         Buffers: shared hit=3854139 read=4010
                                         I/O Timings: read=1021.296
                             ->  Index Only Scan using 
guacamole_user_history_user_id_start_date on guacamole_user_history  
(cost=0.56..720887.62 rows=17301844 width=12) (actual time=0.023..28557.541 
rows=17506069 loops=3)
                                   Heap Fetches: 3571677
                                   Buffers: shared hit=3405074 read=114329 
dirtied=5909
                                   I/O Timings: read=30429.786
   Planning Time: 0.319 ms
   Execution Time: 45976.878 ms
   ```
   
   And here is the plan for nested query
   
   
   ```
   Group  (cost=1911971.99..2510148.36 rows=957439 width=1373) (actual 
time=2859.696..5722.555 rows=963531 loops=1)
   "  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
   "  Buffers: shared hit=4973236 read=4785 dirtied=18, temp read=33234 
written=33302"
     I/O Timings: read=537.094
     ->  Sort  (cost=1911971.99..1914365.59 rows=957439 width=1365) (actual 
time=2859.650..3068.616 rows=963531 loops=1)
   "        Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
           Sort Method: external merge  Disk: 129712kB
   "        Buffers: shared hit=918611 read=4472 dirtied=18, temp read=33234 
written=33302"
           I/O Timings: read=532.826
           ->  Merge Join  (cost=0.99..108611.07 rows=957439 width=1365) 
(actual time=0.013..1899.534 rows=963531 loops=1)
                 Merge Cond: (guacamole_user.entity_id = 
guacamole_entity.entity_id)
                 Buffers: shared hit=918611 read=4472 dirtied=18
                 I/O Timings: read=532.826
                 ->  Index Scan using guacamole_user_single_entity on 
guacamole_user  (cost=0.42..53808.28 rows=957439 width=1341) (actual 
time=0.005..943.326 rows=963531 loops=1)
                       Buffers: shared hit=471018 read=4472 dirtied=1
                       I/O Timings: read=532.826
                 ->  Index Scan using guacamole_entity_pkey on guacamole_entity 
 (cost=0.42..40865.32 rows=959338 width=28) (actual time=0.005..352.519 
rows=963531 loops=1)
                       Filter: (type = 'USER'::guacamole_entity_type)
                       Buffers: shared hit=447593 dirtied=17
     SubPlan 1
       ->  Limit  (cost=0.56..0.62 rows=1 width=8) (actual time=0.002..0.002 
rows=1 loops=963531)
             Buffers: shared hit=4054625 read=313
             I/O Timings: read=4.268
             ->  Index Only Scan Backward using 
guacamole_user_history_user_id_start_date on guacamole_user_history  
(cost=0.56..35.83 rows=644 width=8) (actual time=0.002..0.002 rows=1 
loops=963531)
                   Index Cond: (user_id = guacamole_user.user_id)
                   Heap Fetches: 30627
                   Buffers: shared hit=4054625 read=313
                   I/O Timings: read=4.268
   Planning Time: 0.288 ms
   Execution Time: 5796.778 ms
   ```
   
   And there is the example on selecting 10 users:
   Left join
   
   ```
   HashAggregate  (cost=337.53..339.34 rows=181 width=1373) (actual 
time=0.263..0.267 rows=10 loops=1)
   "  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
     Buffers: shared hit=126
     ->  Nested Loop Left Join  (cost=1.41..336.17 rows=181 width=1373) (actual 
time=0.033..0.210 rows=258 loops=1)
           Buffers: shared hit=126
           ->  Nested Loop  (cost=0.85..148.63 rows=10 width=1365) (actual 
time=0.024..0.114 rows=10 loops=1)
                 Buffers: shared hit=80
                 ->  Index Scan using guacamole_entity_name_scope on 
guacamole_entity  (cost=0.42..64.20 rows=10 width=28) (actual time=0.017..0.081 
rows=10 loops=1)
   "                    Index Cond: ((type = 'USER'::guacamole_entity_type) AND 
((name)::text = ANY 
('{user1,user2,user3,user4,user5,user6,user7,user8,user9,user10}'::text[])))"
                       Buffers: shared hit=39
                 ->  Index Scan using guacamole_user_single_entity on 
guacamole_user  (cost=0.42..8.44 rows=1 width=1341) (actual time=0.002..0.002 
rows=1 loops=10)
                       Index Cond: (entity_id = guacamole_entity.entity_id)
                       Buffers: shared hit=41
           ->  Index Only Scan using guacamole_user_history_user_id_start_date 
on guacamole_user_history  (cost=0.56..12.31 rows=644 width=12) (actual 
time=0.003..0.006 rows=26 loops=10)
                 Index Cond: (user_id = guacamole_user.user_id)
                 Heap Fetches: 0
                 Buffers: shared hit=46
   Planning Time: 0.329 ms
   Execution Time: 0.302 ms
   ```
   
   Nested join
   
   ```
   Group  (cost=148.79..155.04 rows=10 width=1373) (actual time=0.176..0.213 
rows=10 loops=1)
   "  Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
     Buffers: shared hit=124
     ->  Sort  (cost=148.79..148.82 rows=10 width=1365) (actual 
time=0.163..0.165 rows=10 loops=1)
   "        Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
           Sort Method: quicksort  Memory: 27kB
           Buffers: shared hit=80
           ->  Nested Loop  (cost=0.85..148.63 rows=10 width=1365) (actual 
time=0.034..0.137 rows=10 loops=1)
                 Buffers: shared hit=80
                 ->  Index Scan using guacamole_entity_name_scope on 
guacamole_entity  (cost=0.42..64.20 rows=10 width=28) (actual time=0.025..0.097 
rows=10 loops=1)
   "                    Index Cond: ((type = 'USER'::guacamole_entity_type) AND 
((name)::text = ANY 
('{user1,user2,user3,user4,user5,user6,user7,user8,user9,user10}'::text[])))"
                       Buffers: shared hit=39
                 ->  Index Scan using guacamole_user_single_entity on 
guacamole_user  (cost=0.42..8.44 rows=1 width=1341) (actual time=0.003..0.003 
rows=1 loops=10)
                       Index Cond: (entity_id = guacamole_entity.entity_id)
                       Buffers: shared hit=41
     SubPlan 1
       ->  Limit  (cost=0.56..0.62 rows=1 width=8) (actual time=0.004..0.004 
rows=1 loops=10)
             Buffers: shared hit=44
             ->  Index Only Scan Backward using 
guacamole_user_history_user_id_start_date on guacamole_user_history  
(cost=0.56..35.83 rows=644 width=8) (actual time=0.004..0.004 rows=1 loops=10)
                   Index Cond: (user_id = guacamole_user.user_id)
                   Heap Fetches: 0
                   Buffers: shared hit=44
   Planning Time: 0.244 ms
   Execution Time: 0.253 ms
   ```
   
   As you can see cost is bigger in case of left join in both of cases
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [guacamole-client] Hepri commented on pull request #711: GUACAMOLE-1488: Slow selection of users on scale on Postgres

Reply via email to