Hepri commented on PR #711:
URL: https://github.com/apache/guacamole-client/pull/711#issuecomment-1087411325
Fixed commit message.
Currently requests accepts list of users, which usually means that number of
passed user names is considered as small.
As for N unique subqueries, it's not true actually - postgres using `Index
Only Scan` at final step of step and it works as usual join.
I did some tests on real database, 963k users, 17.5m user_history records.
Here is the plan for selecting all users with left join + group by + max
```
Finalize GroupAggregate (cost=14831818.70..16921374.20 rows=17301844
width=1373) (actual time=42046.398..45327.396 rows=963527 loops=1)
" Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
" Buffers: shared hit=7827722 read=133911 dirtied=5910, temp read=1025423
written=1026599"
I/O Timings: read=32732.059
-> Gather Merge (cost=14831818.70..16640219.23 rows=14418204 width=1373)
(actual time=42046.380..44885.919 rows=963527 loops=1)
Workers Planned: 2
Workers Launched: 2
" Buffers: shared hit=7827722 read=133911 dirtied=5910, temp
read=1025423 written=1026599"
I/O Timings: read=32732.059
-> Partial GroupAggregate (cost=14830818.68..14975000.72
rows=7209102 width=1373) (actual time=39179.294..40950.738 rows=321176 loops=3)
" Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
" Buffers: shared hit=7827722 read=133911 dirtied=5910, temp
read=1025423 written=1026599"
I/O Timings: read=32732.059
-> Sort (cost=14830818.68..14848841.43 rows=7209102
width=1373) (actual time=38626.534..39682.340 rows=5905855 loops=3)
" Sort Key: guacamole_user.user_id,
guacamole_entity.entity_id"
Sort Method: external merge Disk: 1145272kB
Worker 0: Sort Method: external merge Disk: 639824kB
Worker 1: Sort Method: external merge Disk: 642448kB
" Buffers: shared hit=7827722 read=133911 dirtied=5910,
temp read=1025423 written=1026599"
I/O Timings: read=32732.059
-> Merge Left Join (cost=209055.40..1073389.42
rows=7209102 width=1373) (actual time=2177.709..32539.696 rows=5905855 loops=3)
Merge Cond: (guacamole_user.user_id =
guacamole_user_history.user_id)
Buffers: shared hit=7827714 read=133911
dirtied=5910
I/O Timings: read=32732.059
-> Nested Loop (cost=0.85..245764.48 rows=398933
width=1365) (actual time=0.521..1734.830 rows=321176 loops=3)
Buffers: shared hit=4422640 read=19582
dirtied=1
I/O Timings: read=2302.272
-> Parallel Index Scan using
guacamole_user_pkey on guacamole_user (cost=0.42..48764.29 rows=398933
width=1341) (actual time=0.233..608.004 rows=321176 loops=3)
Buffers: shared hit=568501 read=15572
dirtied=1
I/O Timings: read=1280.976
-> Index Scan using guacamole_entity_pkey
on guacamole_entity (cost=0.42..0.49 rows=1 width=28) (actual
time=0.003..0.003 rows=1 loops=963527)
Index Cond: (entity_id =
guacamole_user.entity_id)
Filter: (type =
'USER'::guacamole_entity_type)
Buffers: shared hit=3854139 read=4010
I/O Timings: read=1021.296
-> Index Only Scan using
guacamole_user_history_user_id_start_date on guacamole_user_history
(cost=0.56..720887.62 rows=17301844 width=12) (actual time=0.023..28557.541
rows=17506069 loops=3)
Heap Fetches: 3571677
Buffers: shared hit=3405074 read=114329
dirtied=5909
I/O Timings: read=30429.786
Planning Time: 0.319 ms
Execution Time: 45976.878 ms
```
And here is the plan for nested query
```
Group (cost=1911971.99..2510148.36 rows=957439 width=1373) (actual
time=2859.696..5722.555 rows=963531 loops=1)
" Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
" Buffers: shared hit=4973236 read=4785 dirtied=18, temp read=33234
written=33302"
I/O Timings: read=537.094
-> Sort (cost=1911971.99..1914365.59 rows=957439 width=1365) (actual
time=2859.650..3068.616 rows=963531 loops=1)
" Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
Sort Method: external merge Disk: 129712kB
" Buffers: shared hit=918611 read=4472 dirtied=18, temp read=33234
written=33302"
I/O Timings: read=532.826
-> Merge Join (cost=0.99..108611.07 rows=957439 width=1365)
(actual time=0.013..1899.534 rows=963531 loops=1)
Merge Cond: (guacamole_user.entity_id =
guacamole_entity.entity_id)
Buffers: shared hit=918611 read=4472 dirtied=18
I/O Timings: read=532.826
-> Index Scan using guacamole_user_single_entity on
guacamole_user (cost=0.42..53808.28 rows=957439 width=1341) (actual
time=0.005..943.326 rows=963531 loops=1)
Buffers: shared hit=471018 read=4472 dirtied=1
I/O Timings: read=532.826
-> Index Scan using guacamole_entity_pkey on guacamole_entity
(cost=0.42..40865.32 rows=959338 width=28) (actual time=0.005..352.519
rows=963531 loops=1)
Filter: (type = 'USER'::guacamole_entity_type)
Buffers: shared hit=447593 dirtied=17
SubPlan 1
-> Limit (cost=0.56..0.62 rows=1 width=8) (actual time=0.002..0.002
rows=1 loops=963531)
Buffers: shared hit=4054625 read=313
I/O Timings: read=4.268
-> Index Only Scan Backward using
guacamole_user_history_user_id_start_date on guacamole_user_history
(cost=0.56..35.83 rows=644 width=8) (actual time=0.002..0.002 rows=1
loops=963531)
Index Cond: (user_id = guacamole_user.user_id)
Heap Fetches: 30627
Buffers: shared hit=4054625 read=313
I/O Timings: read=4.268
Planning Time: 0.288 ms
Execution Time: 5796.778 ms
```
And there is the example on selecting 10 users:
Left join
```
HashAggregate (cost=337.53..339.34 rows=181 width=1373) (actual
time=0.263..0.267 rows=10 loops=1)
" Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
Buffers: shared hit=126
-> Nested Loop Left Join (cost=1.41..336.17 rows=181 width=1373) (actual
time=0.033..0.210 rows=258 loops=1)
Buffers: shared hit=126
-> Nested Loop (cost=0.85..148.63 rows=10 width=1365) (actual
time=0.024..0.114 rows=10 loops=1)
Buffers: shared hit=80
-> Index Scan using guacamole_entity_name_scope on
guacamole_entity (cost=0.42..64.20 rows=10 width=28) (actual time=0.017..0.081
rows=10 loops=1)
" Index Cond: ((type = 'USER'::guacamole_entity_type) AND
((name)::text = ANY
('{user1,user2,user3,user4,user5,user6,user7,user8,user9,user10}'::text[])))"
Buffers: shared hit=39
-> Index Scan using guacamole_user_single_entity on
guacamole_user (cost=0.42..8.44 rows=1 width=1341) (actual time=0.002..0.002
rows=1 loops=10)
Index Cond: (entity_id = guacamole_entity.entity_id)
Buffers: shared hit=41
-> Index Only Scan using guacamole_user_history_user_id_start_date
on guacamole_user_history (cost=0.56..12.31 rows=644 width=12) (actual
time=0.003..0.006 rows=26 loops=10)
Index Cond: (user_id = guacamole_user.user_id)
Heap Fetches: 0
Buffers: shared hit=46
Planning Time: 0.329 ms
Execution Time: 0.302 ms
```
Nested join
```
Group (cost=148.79..155.04 rows=10 width=1373) (actual time=0.176..0.213
rows=10 loops=1)
" Group Key: guacamole_user.user_id, guacamole_entity.entity_id"
Buffers: shared hit=124
-> Sort (cost=148.79..148.82 rows=10 width=1365) (actual
time=0.163..0.165 rows=10 loops=1)
" Sort Key: guacamole_user.user_id, guacamole_entity.entity_id"
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=80
-> Nested Loop (cost=0.85..148.63 rows=10 width=1365) (actual
time=0.034..0.137 rows=10 loops=1)
Buffers: shared hit=80
-> Index Scan using guacamole_entity_name_scope on
guacamole_entity (cost=0.42..64.20 rows=10 width=28) (actual time=0.025..0.097
rows=10 loops=1)
" Index Cond: ((type = 'USER'::guacamole_entity_type) AND
((name)::text = ANY
('{user1,user2,user3,user4,user5,user6,user7,user8,user9,user10}'::text[])))"
Buffers: shared hit=39
-> Index Scan using guacamole_user_single_entity on
guacamole_user (cost=0.42..8.44 rows=1 width=1341) (actual time=0.003..0.003
rows=1 loops=10)
Index Cond: (entity_id = guacamole_entity.entity_id)
Buffers: shared hit=41
SubPlan 1
-> Limit (cost=0.56..0.62 rows=1 width=8) (actual time=0.004..0.004
rows=1 loops=10)
Buffers: shared hit=44
-> Index Only Scan Backward using
guacamole_user_history_user_id_start_date on guacamole_user_history
(cost=0.56..35.83 rows=644 width=8) (actual time=0.004..0.004 rows=1 loops=10)
Index Cond: (user_id = guacamole_user.user_id)
Heap Fetches: 0
Buffers: shared hit=44
Planning Time: 0.244 ms
Execution Time: 0.253 ms
```
As you can see cost is bigger in case of left join in both of cases
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]