Re: Fwd: [GENERAL] Streaming replication bandwidth per table

2017-07-05 Thread Maeldron T.
Thank you.

Maybe it would help, but I recently had another issue with the tables
holding large arrays. I will likely redesign that part of the application,
and I’ll see whether it helps as a side effect.



On Thu, Jun 22, 2017 at 5:55 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 6/21/17 22:04, Maeldron T. wrote:
> > * Logical replication is in 10.0 Beta 1. I might be old-school, but I
> > would only install 10.1 or maybe 10.0.2 into production
>
> There are also other logical replication options such as pglogical and
> londiste.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Fwd: [GENERAL] Streaming replication bandwidth per table

2017-06-21 Thread Peter Eisentraut
On 6/21/17 22:04, Maeldron T. wrote:
> * Logical replication is in 10.0 Beta 1. I might be old-school, but I
> would only install 10.1 or maybe 10.0.2 into production

There are also other logical replication options such as pglogical and
londiste.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Fwd: [GENERAL] Streaming replication bandwidth per table

2017-06-21 Thread Maeldron T.
On Tue, Jun 20, 2017 at 3:06 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

>
> Not easily.  You could play around with pg_xlogdump to see what's going
> on in the WAL.  But even if you figure it out, there is not much you can
> do about it.
>

I could do a lot. For example, if I could confirm what I suspect, that the
updates on the table with arrays generate much of the traffic, then I could
redesign the table.


>
> Try perhaps logical replication.  That would save you the bandwidth for
> updating all the indexes at least.  It might work for you.
>

Problems:

* I can’t really "try" things on production. That was the starting point of
my email

* Logical replication is in 10.0 Beta 1. I might be old-school, but I would
only install 10.1 or maybe 10.0.2 into production

* If I am right about the arrays, logical replication would not help. The
full arrays would still be sent on each update, because each array is a
single column value, and the indices on the array tables are much smaller

Yes, there would be some benefit from not transferring the index updates and
the vacuum traffic, but I’m not into random attempts on a live system.

What I could do is:

1. Dump production and restore it on a test machine

2. Log the queries on production (as plain text) after the dump has started,
maybe for a day

3. "grep" out the queries for the suspicious tables one by one

4. Replay the queries on the restored dump (on a test machine)

5. See how much WAL is written for each table, executing only the related
updates, table by table (sketch below)
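
A minimal sketch of step 5, assuming psql against the pre-10 test restore;
the table and file names are invented, and the number also includes WAL
from background activity such as autovacuum:

  -- Record the WAL position before replaying one table's queries.
  SELECT pg_current_xlog_location() AS wal_before \gset

  -- Replay the grepped queries for the suspect table.
  \i /tmp/suspect_table_queries.sql

  -- Pretty-print the amount of WAL generated since the starting position.
  SELECT pg_size_pretty(
           pg_xlog_location_diff(pg_current_xlog_location(), :'wal_before')
         ) AS wal_written;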

This, by the way, would be easier to do with logical replication. I could
use single-table subscriptions, but then we are back to an earlier problem:
it’s in beta.
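
For the record, a single-table subscription in the 10 beta would look
roughly like this (connection string and names are invented):

  -- On the primary (PostgreSQL 10): publish only the suspect table.
  CREATE PUBLICATION suspect_pub FOR TABLE suspect_table;

  -- On the test machine: subscribe and watch the transferred volume.
  CREATE SUBSCRIPTION suspect_sub
      CONNECTION 'host=primary.example.com dbname=app user=replicator'
      PUBLICATION suspect_pub;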

It will be painful, but at least now I know I have to do it, since I can’t
easily read the WAL.

Thank you.

M


Re: [GENERAL] Streaming replication bandwidth per table

2017-06-19 Thread Andreas Kretschmer
On June 20, 2017 at 03:06:05 CEST, Peter Eisentraut wrote:
>On 6/19/17 20:50, Maeldron T. wrote:
>> 
>
>Not easily.  You could play around with pg_xlogdump to see what's going
>on in the WAL.  But even if you figure it out, there is not much you
>can
>do about it.
>
>Try perhaps logical replication.  That would save you the bandwidth for
>updating all the indexes at least.  It might work for you.

Not only the traffic for indexes, but for vacuum too
(and that can be a lot).

Regards, Andreas


-- 
2ndQuadrant - The PostgreSQL Support Company




Re: [GENERAL] Streaming replication bandwidth per table

2017-06-19 Thread Peter Eisentraut
On 6/19/17 20:50, Maeldron T. wrote:
> Streaming replication generates too much traffic to set it up between
> regions for financial reasons: it would cost more than every other hosting
> expense put together (including all the other traffic, even though it’s
> web traffic plus a huge amount of email).
> 
> Is there a way to see in the log how much bandwidth is used per table? 

Not easily.  You could play around with pg_xlogdump to see what's going
on in the WAL.  But even if you figure it out, there is not much you can
do about it.
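
For example, something along these lines might narrow it down (paths,
segment names, and OIDs are placeholders); the SQL call gives you the
identifier to look for in the dump:

  -- Find the relfilenode that identifies the table in pg_xlogdump output.
  SELECT pg_relation_filenode('suspect_table');  -- say it returns 16397

  -- Then, in a shell on the server (9.5+):
  --   pg_xlogdump --stats=record $PGDATA/pg_xlog/000000010000000100000080
  --   pg_xlogdump 000000010000000100000080 0000000100000001000000A0 \
  --       | grep 'rel 1663/16384/16397'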

Try perhaps logical replication.  That would save you the bandwidth for
updating all the indexes at least.  It might work for you.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[GENERAL] Streaming replication bandwidth per table

2017-06-19 Thread Maeldron T.
Hello,

tl;dr

Streaming replication generates too much traffic to set it up between
regions for financial reasons: it would cost more than every other hosting
expense put together (including all the other traffic, even though it’s web
traffic plus a huge amount of email).

Is there a way to see in the log how much bandwidth is used per table?

Details:

My only idea is to improve the design.

I believe the source of the issue is tables involved in many to many
relations with frequent updates.

The structure varies. Sometimes it’s:

table_a_id_1 | table_a_id_2

other times:

table_b_id_1 | table_b_id_2 | float value

and:

integer value | table_c_id
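
As DDL, the classical pair variants are roughly (all names invented):

  -- Plain pair: many-to-many between rows of table_a.
  CREATE TABLE a_pairs (
      table_a_id_1 integer NOT NULL,
      table_a_id_2 integer NOT NULL,
      PRIMARY KEY (table_a_id_1, table_a_id_2)
  );

  -- Pair with a float value attached to the relation.
  CREATE TABLE b_pairs (
      table_b_id_1 integer NOT NULL,
      table_b_id_2 integer NOT NULL,
      value        real    NOT NULL,
      PRIMARY KEY (table_b_id_1, table_b_id_2)
  );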


It’s simple, but performance is a key factor.

Depending on the relation (whether it has an additional value or not), the
expected usage, and my mood, I implemented them using either classical
pairs or arrays with gin indices:

table_a_id_1 | [table_a_id_2, table_a_id_7, table_a_id_9,...]

and:

integer value | [table_a_id_1, table_a_id_5, table_a_id_6, ...]
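
A sketch of the array variant (names invented); note that changing any
element rewrites the whole column value, which is what I suspect inflates
the WAL:

  -- One row per left-hand id; the related ids live in an integer array.
  CREATE TABLE a_links (
      table_a_id_1 integer   PRIMARY KEY,
      related_ids  integer[] NOT NULL DEFAULT '{}'
  );
  CREATE INDEX a_links_gin ON a_links USING gin (related_ids);

  -- Appending a single element still writes the entire new array.
  UPDATE a_links
     SET related_ids = related_ids || 12345
   WHERE table_a_id_1 = 7;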

There are millions of records in the tables with arrays. The "normal" pairs
have tens of millions of rows and above. One table could in theory hold
about 40 billion values (it doesn’t yet, but it’s growing).


I can guess which tables are problematic and what to change, but:

* It’s difficult to simulate real-life usage
* The usage patterns are different from table to table
* If I’m wrong, I waste time and resources (and downtime) and make it even
worse

I know the updates on the arrays cost much more (it’s also a performance
issue), but the tables take orders of magnitude less space this way. I have
even considered jsonb for the relations where each pair also has a float
value.
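
The jsonb idea, as a sketch (names invented), would pack the float values
into a map per left-hand id:

  CREATE TABLE b_links (
      table_b_id_1 integer PRIMARY KEY,
      weights      jsonb   NOT NULL DEFAULT '{}'  -- e.g. {"42": 0.87, "57": 0.31}
  );
  CREATE INDEX b_links_gin ON b_links USING gin (weights);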

What to change in the design depends on the real-life use. How can I
measure the bandwidth usage per table in streaming replication? I don’t see
a way to emulate it with realistic results.

M

PS: except for the updates on the arrays, the performance itself is
satisfactory for each table. It’s only the bandwidth usage that would hurt.