Re: Fwd: [GENERAL] Streaming replication bandwidth per table
Thank you. Maybe it would help, but recently I had another issue with the tables having large arrays. I will likely redesign that part of the application, and I’ll see if it helps as a side effect.

On Thu, Jun 22, 2017 at 5:55 AM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 6/21/17 22:04, Maeldron T. wrote:
> > * Logical replication is in 10.0 Beta 1. I might be oldschool but I
> > would install 10.1 or maybe 10.0.2 into production
>
> There are also other logical replication options such as pglogical and
> londiste.
>
> --
> Peter Eisentraut   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Fwd: [GENERAL] Streaming replication bandwidth per table
On Tue, Jun 20, 2017 at 3:06 AM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
>
> Not easily. You could play around with pg_xlogdump to see what's going
> on in the WAL. But even if you figure it out, there is not much you can
> do about it.

I could do a lot. For example, if I could confirm what I expect, that the updates on the table with arrays generate much traffic, then I could redesign the table.

> Try perhaps logical replication. That would save you the bandwidth for
> updating all the indexes at least. It might work for you.

Problems:

* I can’t really "try" things on production. That was the starting point of my email.
* Logical replication is in 10.0 Beta 1. I might be oldschool, but I would install 10.1 or maybe 10.0.2 into production.
* If I am right about the arrays, logical replication would not help. The full arrays would still be sent on each update, because it’s a single column, and the indices on the array tables are much smaller.

Yes, there would be some benefit from not transferring the indices and the vacuum, but I’m not into random attempts on a live system. What I could do is:

1. Dump production and restore it on a test machine
2. Log the queries on production (as plain text) after the dump was started, maybe for a day
3. "grep" out the queries for the suspicious tables one by one
4. Replay the queries on the restored dump (on a test machine)
5. See how much WAL is written for each table (executing only the related updates from table to table)

This, by the way, would be easier to do with logical replication. I could use single-table subscriptions, but then we are back to an earlier problem: it’s in beta.

It will be painful, but at least now I know I have to do it, as I can’t easily read the WAL.

Thank you.

M
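For step 5, one way to quantify the WAL written by each replayed batch is to bracket it with pg_current_xlog_location() (the pre-10 name) and diff the two LSNs. A minimal sketch from psql, assuming a 9.x test server and that the per-table query files from step 3 are replayed one at a time:

-- note the WAL position before replaying one table’s queries
select pg_current_xlog_location() as before_lsn \gset

-- ... replay the grepped queries for the suspicious table here ...

-- bytes of WAL generated since the marker: roughly the replication traffic
-- that batch would cause, before any compression on the wire
select pg_xlog_location_diff(pg_current_xlog_location(), :'before_lsn') as wal_bytes;

This avoids decoding the WAL with pg_xlogdump at all, at the cost of having to replay each table’s traffic in isolation.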
[GENERAL] Streaming replication bandwidth per table
Hello,

tl;dr: Streaming replication generates too much traffic to set it up between different regions, for financial reasons. The streaming replication would cost more than every other hosting expense altogether (including all the traffic, even though it’s web plus a huge amount of email). Is there a way to see in the log how much bandwidth is used per table?

Details:

My only idea is to improve the design. I believe the source of the issue is tables involved in many-to-many relations with frequent updates. The structure varies. Sometimes it’s:

table_a_id_1 | table_a_id_2

other times:

table_b_id_1 | table_b_id_2 | float value

and:

integer value | table_c_id

It’s simple, but performance is a key factor. Depending on the relation (whether it had an additional value or not), the expected usage, and my mood, I implemented them using either classical pairs or arrays with gin indices:

table_a_id_1 | [table_a_id_2, table_a_id_7, table_a_id_9, ...]

and:

integer value | [table_a_id_1, table_a_id_5, table_a_id_6, ...]

There are millions of records in the tables with arrays. The "normal" pairs have tens of millions and above. One table could have about 40 billion values in theory (it doesn’t, but it’s growing).

I can guess which tables are problematic and what to change, but:

* It’s difficult to simulate real-life usage
* The usage patterns are different from table to table
* If I’m wrong, I waste time and resources (and downtime) to make it even worse

I know the updates on the arrays cost much more (it’s also a performance issue), but the table takes magnitudes less space this way. I even considered jsonb when there are also float values for each pair.

What to change in the design depends on the real-life use. How can I measure the bandwidth usage per table in streaming replication? I don’t see a way to emulate it with realistic results.

M

PS: except for the updates on the arrays, the performance itself is satisfying for each table. It’s only the bandwidth usage that would hurt.
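For concreteness, the two variants described above could look roughly like this (a sketch with invented table and column names, following the placeholders used in the post):

-- classical pair table: one row per relation, cheap to update a single pair,
-- but one heap row plus index entries of WAL per change
create table follows (
    follower_id integer not null,
    followed_id integer not null,
    primary key (follower_id, followed_id)
);

-- array variant: one row per left-hand id, far less on-disk space,
-- but every update rewrites the whole array (and its gin index entries)
create table follows_packed (
    follower_id  integer primary key,
    followed_ids integer[] not null default '{}'
);
create index on follows_packed using gin (followed_ids);

The WAL asymmetry the post worries about comes from the second form: an UPDATE writes the whole new row version, array included, even when only a single element changed.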
Re: [GENERAL] FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Hello Thomas,

> > (Maybe database clusters should have a header that wouldn’t allow
> > incompatible server versions to process the existing data. I wonder if it
> > would take more than 8 bytes per server. But I guess it was not known to be
> > incompatible. Even my two CIs didn’t show it.)
>
> I had some thoughts along those lines too[3]. I thought about
> checksumming libc and all relevant collation files (ie OS specific
> files) so you could notice when anything that could bite you changes
> (that is just some sketched out ideas, not production code). Some
> have suggested that PostgreSQL should stop using OS collations and
> adopt ICU[4] and then use its versioning metadata. Of course some
> people already use ICU on FreeBSD because the old strcoll
> implementation didn't play well with Unicode, but those users are
> still presumably exposed to silent corruption when ICU changes because
> AFAIK that didn't keep track of ICU versions.

I have been using PostgreSQL for 12 years. *Every* issue that I personally ever had was locale-related. I define "issue" as something that was not totally my fault and took more than an hour to fix. That’s why I was a bit sarcastic with my comments about the year 2016. In the ’80s I grew up on sci-fi movies about spaceships making wormholes to travel to unknown parts of the Universe, even capable of shifting dimensions, but today one of the biggest issues in IT is when one wants to use UTF-8.

I think I spent more time on fixing locale-related issues than on fixing all the rest together. (Not counting the bugs in my own code.)

Once I tried DragonflyBSD. It could not compile ICU, so I deleted it.

I used to run PostgreSQL with ICU on FreeBSD, but I switched from ports to binary packages. Mixing them isn’t fun. (Well, currently I have it mixed.) Anyway, ICU is turned on for PostgreSQL 9.6 even in the pkg version. Hurray.

M
[GENERAL] Re: FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
I forgot to add that when I turned off index scan, select lower() still didn’t find some existing rows.

On Sat, Dec 3, 2016 at 2:52 AM, Maeldron T. <maeld...@gmail.com> wrote:
> Hello,
>
> I’ll be as short as I can as I have broken my arm and I’m not supposed to
> type. This isn’t a joke.
>
> [...]
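For reference, "turning off index scan" can be reproduced per session like this (a sketch assuming the users table from the quoted message):

-- force the planner away from the suspect btree indexes for this session only
set enable_indexscan = off;
set enable_bitmapscan = off;

-- with a sequential scan, rows that a corrupted index "hides" would normally
-- reappear; in the case described above some still did not
select id, username from users where lower(username) = 'ági';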
> : UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', > "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = $1 > > The table: > > # \d users > Column | Type | > Modifiers > -+--+--- > - > id | integer | not null default > nextval('users_id_seq'::regclass) > username| character varying| not null > password| character varying| not null > email | character varying| not null > Indexes: > "users_pkey" PRIMARY KEY, btree (id) > "index_users_unique_on_lower_email" UNIQUE, btree (lower(email::text)) > "index_users_unique_on_lower_username" UNIQUE, btree > (lower(username::text)) > > > Testing: > > test2=# insert into users (password, email, username) values ('a', > 'aa', 'Ági'); > INSERT 0 1 > > test2=# select id, username from users where lower(username) = 'ági'; >id | username > +-- > 123 | Ági > (1 row) > > > test2=# select id, username from users where username = 'Ági'; >id | username > +-- > 123 | Ági > 456 | Ági > (2 rows) > > > test2=# select id, username from users where username = 'Mustafa'; > > id | username > ---+-- > 123 | Mustafa > (1 row) > > test2=# insert into users (password, email, username) values ('a', > 'aab', 'Mustafa'); > INSERT 0 1 > > test2=# select id, username from users where username = 'Mustafa'; > >id | username > +-- > > 123 | Mustafa > 456 | Mustafa > (2 rows) > > > > test2=# select id, username from users where username = 'bunny'; > id | username > --+-- > 123 | bunny > (1 row) > > > test2=# insert into users (password, email, username) values ('a', > 'aac', 'bunny'); > INSERT 0 1test2=# select id, username from users where username = 'bunny'; >id | username > +-- >123 | bunny > 456 | bunny > (2 rows) > > test2=# select id, username from users where username = 'edwin'; > id | username > ---+-- > 123 | edwin > (1 row) > > test2=# insert into users (password, email, username) values ('a', > 'aad', 'edwin'); > INSERT 0 1 > > test2=# select id, username from users where username = 'edwin'; >id | username > +
[GENERAL] FreeBSD 10 => 11: Dump and reload your PostgreSQL database unless you like it broken
Hello,

I’ll be as short as I can, as I have broken my arm and I’m not supposed to type. This isn’t a joke.

However, I learned the following the hard way and I want to save you from the same.

Upgrading FreeBSD from 10 to 11 might break your database. It probably won’t be corrupted, but it will be useless until you dump and import it, and you might need to do manual fixing.

My app has more than 1000 automated tests, and the upgrade itself was explicitly tested. The affected columns are tested many times. It was tested on two different machines (including the 10 => 11 upgrade) before it was done in production. But the issue happens only at random, on a large scale. I could not reproduce it by inserting a few rows. I could reproduce it with real data.

I didn’t debug much, as I did not sleep for two days until I fixed it (live systems, with my left hand).

I removed noise from the queries; the real tables have dozens of not null columns. The edited queries may have syntax errors, but they were copied from real-world output. So were the errors. I use 'xxx' and '123' to hide private info, but the strings are real in general. This matters, as the issue might be locale-related.

I started seeing these in the logs:

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_email"
DETAIL: Key (lower(email::text))=(andy.m...@xxx.com) already exists.
: UPDATE "users" SET "online_at" = '2016-12-01 00:23:33.245594', "visible_online_at" = '2016-12-01 00:23:33.245633' WHERE "users"."id" = $1

PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(joyce1234) already exists.
: UPDATE "users" SET "online_at" = '2016-11-30 22:11:37.167589', "visible_online_at" = '2016-11-30 22:11:37.167647' WHERE "users"."id" = $1

The table:

# \d users
  Column   |       Type        |                     Modifiers
-----------+-------------------+----------------------------------------------------
 id        | integer           | not null default nextval('users_id_seq'::regclass)
 username  | character varying | not null
 password  | character varying | not null
 email     | character varying | not null
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "index_users_unique_on_lower_email" UNIQUE, btree (lower(email::text))
    "index_users_unique_on_lower_username" UNIQUE, btree (lower(username::text))

Testing:

test2=# insert into users (password, email, username) values ('a', 'aa', 'Ági');
INSERT 0 1

test2=# select id, username from users where lower(username) = 'ági';
 id  | username
-----+----------
 123 | Ági
(1 row)

test2=# select id, username from users where username = 'Ági';
 id  | username
-----+----------
 123 | Ági
 456 | Ági
(2 rows)

test2=# select id, username from users where username = 'Mustafa';
 id  | username
-----+----------
 123 | Mustafa
(1 row)

test2=# insert into users (password, email, username) values ('a', 'aab', 'Mustafa');
INSERT 0 1

test2=# select id, username from users where username = 'Mustafa';
 id  | username
-----+----------
 123 | Mustafa
 456 | Mustafa
(2 rows)

test2=# select id, username from users where username = 'bunny';
 id  | username
-----+----------
 123 | bunny
(1 row)

test2=# insert into users (password, email, username) values ('a', 'aac', 'bunny');
INSERT 0 1

test2=# select id, username from users where username = 'bunny';
 id  | username
-----+----------
 123 | bunny
 456 | bunny
(2 rows)

test2=# select id, username from users where username = 'edwin';
 id  | username
-----+----------
 123 | edwin
(1 row)

test2=# insert into users (password, email, username) values ('a', 'aad', 'edwin');
INSERT 0 1

test2=# select id, username from users where username = 'edwin';
 id  | username
-----+----------
 123 | edwin
 456 | edwin
(2 rows)

test2=# insert into users (password, email, username) values ('a', 'aae', 'edwin');
ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(edwin) already exists.

test2=# insert into users (password, email, username) values ('a', 'aae', 'bunny');
ERROR: duplicate key value violates unique constraint "index_users_unique_on_lower_username"
DETAIL: Key (lower(username::text))=(bunny) already exists.

The issue didn’t happen only with strings that have characters like 'é' or 'ő'. English names were also affected.

The application does have validation on the strings, but it didn’t see the existing versions, for the same reason the insert didn’t see them. *Sometimes.*

Also, the colliding (or rather, non-colliding) strings usually had the same case of characters. (Not 'edwin' vs 'Edwin', but two 'edwin's.) Sometimes only the latter triggered the violation:
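A note for anyone hitting the same thing: before the dump/reload (or a REINDEX), it may help to list the rows that already violate the lower() uniqueness under the changed collation. A minimal sketch, assuming the users table above:

-- rows that now collide on lower(username); the same query with email
-- covers the other unique index
select lower(username) as key, array_agg(id) as ids
from users
group by lower(username)
having count(*) > 1;

REINDEX of the two unique expression indexes will keep failing as long as such duplicates exist, so they have to be resolved first.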
[GENERAL] Google Cloud Compute
Hello,

I’m considering moving my servers to Google. The main reason is the transparent encryption they offer. This means I should move either all or none. The former would include PostgreSQL, specifically: FreeBSD + ZFS + PostgreSQL.

Do you have any pros or cons based on experience? (Would fsync work as it is supposed to work?)

Yesterday I ran a few "benchmarks", in case loading dumps and running vacuum qualifies, and there was little difference between the persistent SSD and the local SSD storage. Both were set up as RAID 1 (ZFS mirror). Later I managed to read the documentation and learned that the persistent SSD is already redundant, hence I can’t justify the price and the limitations of the local SSD storage for my current needs.

However, the persistent storage is network-based. May I use it and sleep well? (If the night is silent otherwise.) Do the so-called migration events ever happen?

M
Re: [GENERAL] Uber migrated from Postgres to MySQL
On 26/07/16 19:39, Guyren Howe wrote:
> Honestly, I've never heard of anyone doing that. But it sounds like they had good reasons.
>
> https://eng.uber.com/mysql-migration/
>
> Thoughts?

1. Open the page
2. Press Cmd-F on Mac, Ctrl-F on Linux/Windows. (Find on page)
3. Type "transaction" in the search field, without the quotes
4. Scroll through the article and look for the highlights

There are 17 highlights. I count 2 in the MySQL part. Chances are good that the remaining 15 are in the PostgreSQL part. It tells a lot.

When I am told that MySQL supports transactions, I face the fact that the word "transaction" must have at least two meanings, and no matter what I do, I know only one of them.

Every time I had to work with MySQL I felt overwhelming sloppiness. I can’t (or don’t want to) deal with accepting 0 as NULL but only once, auto-typecasting pianos to cats (take it as a metaphor), or committing a "transaction" (in MySQL terms) on client disconnect, without a commit being sent. (Older version.)

One can say it can’t be that bad, as Facebook and now Uber are using it. The same logic tells us that junk food is the best for humans.

In the last few years I tried out more or less every hyped schemaless database. Not for their main feature, as my data, like the rest of the data in the Universe, can be put in a schema. I did it because I faced some of the issues mentioned in the article, and other issues that aren’t mentioned, even on a smaller scale. (Smaller scale means a smaller company, fewer resources, fewer people. In the end it hurts the same way.)

I still don’t see how I could live without transactions, and not only because a simple and intentional rollback saved me from much coding and complexity with one of the most important features of my application. But having a single update statement modify about 70% of the records before Cassandra crashed is not for me. It tried to repair about 1000 records, using 3 nodes, for 1 or 2 hours before I deleted the test cluster. Maybe I did it wrong. Or probably. It can’t be that bad.

People at Uber probably know more about the internals than I ever will. I also know that a few big companies had about one-day-long downtimes thanks to MongoDB and CouchDB. Since I know that people who are way more professional than me decide to use a database engine in production that doesn’t tell you whether it could store your data or not, I don’t care who is doing what and what is on his business card.

And yes, I hate upgrading PostgreSQL, especially on FreeBSD where pg_upgrade isn’t really an option. Sometimes the answer is a manual or full vacuum, no matter what the manual says. (And yes, the downtime does hurt.)

On the other hand, if there was a stable and officially integrated logical replication that supported multi-master setups, many of the issues would just be gone. Upgrades wouldn’t be painful anymore, timeline and pg_rewind bugs wouldn’t matter, and the DBA could remove the bloat from the masters one by one, doing that thing in the night when no one sees it. (Until the full-vacuum police would find him.)

M.
Re: [GENERAL] upsert with trigger (or rule)
On 19/07/16 23:45, Adrian Klaver wrote:
> To be more complete it would be nice to see the schema definition for the
> table messages. Also maybe some idea of what the code is supposed to do.
> If I understand it correctly:
>
> 1) Check if a message is a draft.
> 2) Check if there is a uniqueness conflict with an existing (sender_id, recipient_id) combination
> 3) If 1) and 2) are true then UPDATE the fields body and updated_at of the existing record with the NEW.body and NEW.updated_at data.

The table and the triggers altogether are huge. Anyway, you got it right: the point is that the recipient_id and sender_id must be unique only if the message is a draft (autosave feature). But I’m looking for a general solution, as there are other tables and other cases where "converting" the insert to an upsert would be awesome.

Let’s say that a part of the system will be accessed by another ORM to increase the performance. (Hello Rust.) Only a DB-level solution requires no code duplication. Accessing the DB with multiple ORMs isn’t uncommon on a large scale, so this might not be only my issue.

> > This alone doesn’t prove that it’s not possible. The value returned by
> > "returning id" might be set or read from the existing or inserted record
> > in some way.
>
> By returning NULL you said the original INSERT never happened and nothing
> is returned, so no id is returned. The embedded INSERT happens outside the
> scope of the trigger.

I see your point and you probably are right. In theory though, it’s possible that there is a solution to manually store the returned id. I mean, maybe a system table or so. PostgreSQL knows what the client asked to return; this must be stored somewhere. It sounds hackish though.

It doesn’t have to be a trigger at all. I’m 99% sure I could make it work by using a rule and a view or a parent or child table (to avoid the endless recursion). However, these together aren’t less complicated than doing it through the ORM.

The rule isn’t dynamic (regarding changes in the columns), which makes that solution problematic. The trigger is simple: it allows adding new columns to the table without rewriting the trigger. And it works, except that the ORM has no idea about the new record’s id. It’s possible to find the record by the unique columns; however, the whole point is not modifying the ORM at all.

M.
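For what it’s worth, the view-based variant mentioned above could, as far as I can tell, also solve the RETURNING problem, because an INSTEAD OF trigger may return a row with the id filled in. A rough sketch, not tested against the real schema: only the columns mentioned in this thread are used, and the ORM would have to point at the view instead of the table.

create view messages_v as select * from messages;

create function messages_v_insert_draft() returns trigger as $$
begin
    insert into messages (sender_id, recipient_id, body, is_draft, updated_at)
    values (new.sender_id, new.recipient_id, new.body, new.is_draft, new.updated_at)
    on conflict (sender_id, recipient_id) where is_draft
    do update set body = excluded.body, updated_at = excluded.updated_at
    returning id into new.id;   -- captures the id on both the insert and the update path
    return new;                 -- the returned row feeds the client's RETURNING clause
end;
$$ language plpgsql;

create trigger messages_v_before_insert
    instead of insert on messages_v
    for each row execute procedure messages_v_insert_draft();

Non-draft rows fall through as plain inserts, since they can never conflict with the partial unique index. The trade-off is the opposite of the trigger in the original post: the column list is spelled out, so schema changes mean touching the function again.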
Re: [GENERAL] upsert with trigger (or rule)
Hello Adrian,

On 19/07/16 21:27, Adrian Klaver wrote:
> On 07/19/2016 11:56 AM, Maeldron T. wrote:
>
> By returning NULL in your function you are skipping the original INSERT.

I know that, I wrote it.

> I am not sure exactly what you are trying to achieve,

In general, there are the following conditions:

The application uses an ORM. The ORM generates inserts. In this case it’s Ruby on Rails / ActiveRecord, but it could be anything else.

The business logic requires uniqueness in certain cases.

Upsert is finally there in PostgreSQL. It’s possible to tell the application to send an 'upsert' (on conflict do); however, that works by turning off the ORM and building the query. To keep the advantages of the ORM, one has to build a library to generate the queries. Actually I did that, and it works fine, but there are cases when overriding the insert with an upsert in the database is simpler, more elegant, more efficient, and more stable.

> but it would seem the thing to do is check for the uniqueness of
> (sender_id, recipient_id) in your function and then modify the original
> INSERT row as needed and then RETURN it as NEW.

I don’t see how that would help. Except if you mean deleting the old record, which is neither efficient nor safe against race conditions.

> > This works fine. As far as I see, adding new columns to the messages
> > table won’t require updating the procedure. Which is great.
> >
> > There is one issue though. The ORM sends 'insert into messages returning
> > id'. As the original insert is skipped, the id, which is a serial, is not
> > returned, so the ORM can’t see the new/updated record's id.
> >
> > Is there a way to make the 'returning id' part work?
>
> Not that I know of:
>
> https://www.postgresql.org/docs/9.5/static/plpgsql-trigger.html
>
> "A trigger function must return either NULL or a record/row value having
> exactly the structure of the table the trigger was fired for."

This alone doesn’t prove that it’s not possible. The value returned by "returning id" might be set or read from the existing or inserted record in some way.
[GENERAL] upsert with trigger (or rule)
Hello,

I’m trying to rewrite inserts to upserts on a table when a certain column has a certain value.

Reason: the inserts are coming from an ORM. It’s possible to send an upsert from the ORM; however, in this case I find it more elegant and future-proof to deal with it at the DB level.

First attempt:

create rule messages_insert_draft as
    on insert to messages
    where new.is_draft
    do instead
        insert into messages values (new.*)
        on conflict (sender_id, recipient_id) where is_draft
        do update set body = excluded.body, updated_at = excluded.updated_at;

This has two disadvantages:

1. It doesn’t work because of the endless recursion. Is there a way to deal with the recursion without adding another column to the table?
2. Every time the table’s structure changes, the rule has to be updated too.

With a trigger:

create function trigger_messages_insert_draft() returns trigger as $$
begin
    insert into messages values (new.*)
    on conflict (sender_id, recipient_id) where is_draft
    do update set body = excluded.body, updated_at = excluded.updated_at;
    return null;
end;
$$ language plpgsql;

create trigger messages_before_insert_draft
    before insert on messages
    for each row
    when ( new.is_draft and pg_trigger_depth() = 0 )
    execute procedure trigger_messages_insert_draft();

This works fine. As far as I see, adding new columns to the messages table won’t require updating the procedure. Which is great.

There is one issue though. The ORM sends 'insert into messages returning id'. As the original insert is skipped, the id, which is a serial, is not returned, so the ORM can’t see the new/updated record's id.

Is there a way to make the 'returning id' part work?

M.
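To make the symptom concrete, this is roughly what the ORM-generated statement runs into with the trigger above (hypothetical values; the real table has more columns):

insert into messages (sender_id, recipient_id, body, is_draft, updated_at)
values (1, 2, 'draft text', true, now())
returning id;
-- The BEFORE trigger performs the real insert (or the update) itself and
-- then returns NULL, so the statement the client issued affects zero rows:
-- the command tag is INSERT 0 0, and RETURNING produces no id at all.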
[GENERAL] BSD initdb without ICU support and switch later
Hello,

The ICU patch isn’t ready for PostgreSQL on FreeBSD. Is there any risk (more than 0) in executing the initdb without ICU support and recompiling PostgreSQL later, when the ICU patch is ready? I mean any risk without making a dump and import before the switch.

If this is okay for sure, what should I do later when ICU is available? Do I have to reindex everything with the ICU-patched database?

Thank you.

M.
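On the reindex question: switching collation providers changes sort order, so the cautious assumption is that every index involving text has to be rebuilt. A sketch of how one might enumerate the candidates (not an official procedure; mydb is a placeholder):

-- list indexes that contain text-typed columns or expressions; these are the
-- ones whose ordering may change when the collation library changes
select distinct i.indexrelid::regclass as index_name
from pg_index i
join pg_attribute a on a.attrelid = i.indexrelid
where a.atttypid in ('text'::regtype, 'varchar'::regtype);

-- the blunt alternative is simply to rebuild everything:
-- reindex database mydb;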
[GENERAL] Failback to old master
Hello,

I swear I have read a couple of old threads. Yet I am not sure whether it is safe to fail back to the old master in the case of async replication, without a base backup.

Considering:

I have the latest 9.3.
Server A: master
Server B: slave
B is actively connected to A.

I shut down A manually with -m fast (it’s the default FreeBSD init script setting)
I remove the recovery.conf from B
I restart B
I create a recovery.conf on A
I start A
I see nothing wrong in the logs
I go for a lunch
I shut down B
I remove the recovery.conf on A
I restart A
I restore the recovery.conf on B
I start B
I see nothing wrong in the logs and I see that replication is working

Can I say that my data is safe in this case?

If the answer is yes, is it safe to do this if there was a power outage on A instead of a manual shutdown? Considering that the log says nothing is wrong. (Of course, if it complains, I’d do a base backup from B.)

Thank you,

M.
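One check that may help decide whether the old master can be reused without a base backup is confirming, just before shutting A down, that B has received and replayed everything. A sketch using the 9.3 function names (run the first query on A, the second on B):

-- on A (the master): current WAL write position
select pg_current_xlog_location();

-- on B (the standby): what has been received and replayed so far;
-- if both match A's value, the standby is not missing any WAL at switchover
select pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();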