In our sharded cluster project we are trying to use logical replication to 
provide HA (maintaining redundant shard copies).
Asynchronous logical replication does not make much sense in the context of HA, 
which is why we are trying to use synchronous logical replication.
Unfortunately it shows very bad performance. With 50 shards and a redundancy 
level of 1 (just one copy), the cluster is 20 times slower than without logical 
replication.
With asynchronous replication it is "only" two times slower.

As far as I understand, the reason for such bad performance is that the 
synchronous replication mechanism was originally developed for streaming 
replication, where all replicas have the same content and LSNs. When it is used 
for logical replication, it behaves very inefficiently. A commit has to wait for 
confirmations from all receivers listed in "synchronous_standby_names". So we 
are waiting not only for our own single logical replication standby, but for 
all the other standbys as well. The number of synchronous standbys is equal to 
the number of shards divided by the number of nodes. To provide a uniform 
distribution, the number of shards should be much larger than the number of 
nodes; for example, for 10 nodes we usually create 100 shards. As a result we 
get awful performance, and a stall in any one replication channel blocks all 
backends.
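
To illustrate the effect I mean, here is a toy model in Python. It is only a 
sketch of my understanding, not of the actual PostgreSQL code: it assumes a 
commit finishes when the slowest listed receiver has confirmed, and it assumes 
(arbitrarily) that each receiver's confirmation delay is exponentially 
distributed with a mean of 1 time unit. The function names are made up for the 
example.

```python
import random


def standbys_per_node(shards: int, nodes: int) -> int:
    """Synchronous standbys per node: shards divided by nodes."""
    return shards // nodes


def commit_latency(confirm_delays):
    """If a commit waits for every listed receiver, it takes as long as the slowest one."""
    return max(confirm_delays)


def mean_commit_latency(n_standbys: int, trials: int = 10000, seed: int = 0) -> float:
    """Average commit latency over many simulated commits.

    Assumption: confirmation delays are independent and exponentially
    distributed with mean 1 (an arbitrary choice, not a measurement).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        delays = [rng.expovariate(1.0) for _ in range(n_standbys)]
        total += commit_latency(delays)
    return total / trials


if __name__ == "__main__":
    n = standbys_per_node(shards=100, nodes=10)
    print("standbys per node:", n)
    print("mean latency, waiting for own standby only:", mean_commit_latency(1))
    print("mean latency, waiting for all standbys:", mean_commit_latency(n))
```

In this model the average wait grows with the number of listed standbys, and a 
single receiver that never confirms makes the maximum unbounded, which matches 
the "one blocked channel blocks all backends" behaviour described above.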

So my question is whether my understanding is correct and synchronous logical 
replication cannot be used efficiently in this manner.
If so, the next question is how difficult it would be to make the synchronous 
replication mechanism more efficient for logical replication, and are there any 
plans to work in this direction?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers