Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
Hi, I did: pkill -x postgres, so it should have sent SIGTERM. Regards Pupillo
Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
On Tue, Dec 6, 2016 at 2:44 AM, Tom DalPozzo wrote: > Hi, > about SSD light: > > I guessed it was WAL -> actual db files data traffic. It explains why the > light stops blinking after shutting down the server (I did it via kill > command). > Do you kill with -15 (the default) or -9? And which process, the postgres master itself or just some random child? > But if so, I expected the light to restart blinking after restarting > the server (in order to continue WAL->db activity). > The normal checkpoint is paced. So trickling out data slowly will keep the light on, but not actually stress the system. When you shut down the system, it does a fast checkpoint. This gets the data written out as quickly as possible (since you are shutting down, it doesn't worry about interfering with performance for other users, as there are none), so once it is done you don't see the light anymore. If you do a clean shutdown (kill -15 of the postgres master) this fast checkpoint happens upon shutdown. If you do an abort (kill -9) then the fast checkpoint happens upon start-up, once recovery is finished but before the database is opened for regular use. Cheers, Jeff
Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
Hi, about SSD light: I guessed it was WAL -> actual db files data traffic. It explains why the light stops blinking after shutting down the server (I did it via kill command). But if so, I expected the light to restart blinking after restarting the server (in order to continue WAL->db activity). Regards 2016-12-05 20:02 GMT+01:00 Jeff Janes: > On Fri, Dec 2, 2016 at 9:40 AM, Tom DalPozzo wrote: > >> Hi, >> I've two tables, t1 and t2, both with one bigint id indexed field and one >> 256 char data field; t1 has always got 1 row, while t2 is increasing as >> explained in the following. >> >> My libpq client continuously updates one row in t1 (every time targeting >> a different row) and inserts a new row in t2. All this in blocks of 1000 >> update-insert per commit, in order to get better performance. >> Wal_method is fsync, fsync is on, attached my conf file. >> I've a 3.8ghz laptop with evo SSD. >> >> Performance is measured every two executed blocks and related to these >> blocks. >> >> Over the first few minutes performance is around 10Krow/s then it slowly >> drops, over next few minutes to 4Krow/s, then it slowly returns high and so >> on, like a wave. >> I don't understand this behaviour. Is it normal? What does it depend on? >> > > Yes, that is normal. It is also very complicated. It depends on pretty > much everything. PostgreSQL, kernel, filesystem, IO controller, firmware, > hardware, other things going on on the computer simultaneously, etc. > > >> >> Also, when I stop the client I see the SSD light still heavily working. >> > > This is normal. It writes out critical data to a WAL log first, and then > leisurely writes out the changes to the actual data files later. In the > case of a crash, the WAL will be used to replay the data file changes which > may or may not have made it to disk. > >> It would last quite a while unless I stop the postgresql server, in this >> case it suddenly stops. >> > > Do you stop postgresql with fast or immediate shutdown?
> > >> If I restart the server it remains off. >> I'm wondering if it's normal. I'd like to be sure that my data are safe >> once committed. >> > > If your kernel/fs/SSD doesn't lie about syncing the data, then your data > is safe once committed. (It is possible there are bugs in PostgreSQL, of > course, but nothing you report indicates you have found one). > > If you really want to be sure that the full stack, from PostgreSQL down to > the hardware on the SSD, is crash safe, the only real way is to do some > "pull the plug" tests. > > Cheers, > > Jeff >
Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
On Fri, Dec 2, 2016 at 9:40 AM, Tom DalPozzo wrote: > Hi, > I've two tables, t1 and t2, both with one bigint id indexed field and one > 256 char data field; t1 has always got 1 row, while t2 is increasing as > explained in the following. > > My libpq client continuously updates one row in t1 (every time targeting > a different row) and inserts a new row in t2. All this in blocks of 1000 > update-insert per commit, in order to get better performance. > Wal_method is fsync, fsync is on, attached my conf file. > I've a 3.8ghz laptop with evo SSD. > > Performance is measured every two executed blocks and related to these > blocks. > > Over the first few minutes performance is around 10Krow/s then it slowly > drops, over next few minutes to 4Krow/s, then it slowly returns high and so > on, like a wave. > I don't understand this behaviour. Is it normal? What does it depend on? > Yes, that is normal. It is also very complicated. It depends on pretty much everything. PostgreSQL, kernel, filesystem, IO controller, firmware, hardware, other things going on on the computer simultaneously, etc. > > Also, when I stop the client I see the SSD light still heavily working. > This is normal. It writes out critical data to a WAL log first, and then leisurely writes out the changes to the actual data files later. In the case of a crash, the WAL will be used to replay the data file changes which may or may not have made it to disk. > It would last quite a while unless I stop the postgresql server, in this > case it suddenly stops. > Do you stop postgresql with fast or immediate shutdown? > If I restart the server it remains off. > I'm wondering if it's normal. I'd like to be sure that my data are safe > once committed. > If your kernel/fs/SSD doesn't lie about syncing the data, then your data is safe once committed. (It is possible there are bugs in PostgreSQL, of course, but nothing you report indicates you have found one).
If you really want to be sure that the full stack, from PostgreSQL down to the hardware on the SSD, is crash safe, the only real way is to do some "pull the plug" tests. Cheers, Jeff
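The WAL-first-then-data-files behaviour Jeff describes can be observed from inside the server as well. A sketch of a query against the cumulative statistics view (column names as of the PostgreSQL 9.x pg_stat_bgwriter view; later releases moved some of these counters elsewhere):

```sql
-- How much writing has been done by checkpoints vs. the background
-- writer vs. ordinary backends since statistics were last reset:
SELECT checkpoints_timed,    -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,      -- checkpoints requested (WAL volume, shutdown, etc.)
       buffers_checkpoint,   -- buffers written out by checkpoints
       buffers_clean,        -- buffers written by the background writer
       buffers_backend       -- buffers written directly by backends
FROM pg_stat_bgwriter;
```

Sampling this before and after stopping the client would show whether the continuing SSD activity is checkpoint writeout of dirty buffers.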
Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
I tried to tune some parameters without appreciable changes in this behaviour. I tried to play with: checkpoint_timeout, max_wal_size, shared_buffers, commit_delay, checkpoint_completion_target. No meaningful info found in the log file. Regards 2016-12-04 4:02 GMT+01:00 Tomas Vondra: > On Fri, 2016-12-02 at 13:45 -0800, Adrian Klaver wrote: > > > > On 12/02/2016 09:40 AM, Tom DalPozzo wrote: > > > > > > > > > Hi, > > > I've two tables, t1 and t2, both with one bigint id indexed field > > > and > > > one 256 char data field; t1 has always got 1 row, while t2 is > > > increasing as explained in the following. > > > > > > My libpq client continuously updates one row in t1 (every time > > > targeting a different row) and inserts a new row in t2. All this in > > > blocks of 1000 update-insert per commit, in order to get better > > > performance. > > > Wal_method is fsync, fsync is on, attached my conf file. > > > I've a 3.8ghz laptop with evo SSD. > > > > > > Performance is measured every two executed blocks and related to > > > these > > > blocks. > > > > > > Over the first few minutes performance is around 10Krow/s then it > > > slowly > > > drops, over next few minutes to 4Krow/s, then it slowly returns > > > high and > > > so on, like a wave. > > > I don't understand this behaviour. Is it normal? What does it > > > depend on? > > Have you looked at the Postgres log entries that cover these > > episodes? > > > > Is there anything of interest there? > > > In particular look at checkpoints. In the config file you've changed > checkpoint_timeout, but you haven't changed max_wal_size, so my guess > is the checkpoints happen every few minutes, and run for about 1/2 the > time (thanks to completion_target=0.5). That would be consistent with > the pattern of good/bad performance.
> > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > >
Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
On Fri, 2016-12-02 at 13:45 -0800, Adrian Klaver wrote: > > On 12/02/2016 09:40 AM, Tom DalPozzo wrote: > > > > > > Hi, > > I've two tables, t1 and t2, both with one bigint id indexed field > > and > > one 256 char data field; t1 has always got 1 row, while t2 is > > increasing as explained in the following. > > > > My libpq client continuously updates one row in t1 (every time > > targeting a different row) and inserts a new row in t2. All this in > > blocks of 1000 update-insert per commit, in order to get better > > performance. > > Wal_method is fsync, fsync is on, attached my conf file. > > I've a 3.8ghz laptop with evo SSD. > > > > Performance is measured every two executed blocks and related to > > these > > blocks. > > > > Over the first few minutes performance is around 10Krow/s then it > > slowly > > drops, over next few minutes to 4Krow/s, then it slowly returns > > high and > > so on, like a wave. > > I don't understand this behaviour. Is it normal? What does it > > depend on? > Have you looked at the Postgres log entries that cover these > episodes? > > Is there anything of interest there? > In particular look at checkpoints. In the config file you've changed checkpoint_timeout, but you haven't changed max_wal_size, so my guess is the checkpoints happen every few minutes, and run for about 1/2 the time (thanks to completion_target=0.5). That would be consistent with the pattern of good/bad performance. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
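Along the lines Tomas suggests, a postgresql.conf sketch for spreading checkpoints out and confirming the diagnosis in the log. The values here are illustrative assumptions, not recommendations; max_wal_size exists from 9.5 on (earlier releases use checkpoint_segments instead):

```
# Illustrative checkpoint tuning sketch -- adjust to your workload:
checkpoint_timeout = 15min            # checkpoint less often
max_wal_size = 4GB                    # let more WAL accumulate between checkpoints
checkpoint_completion_target = 0.9    # smooth writes over more of the interval
log_checkpoints = on                  # log when and why each checkpoint runs
```

With log_checkpoints on, the server log shows whether checkpoints are "time"- or "xlog"-triggered, which distinguishes Tomas's max_wal_size hypothesis from a checkpoint_timeout effect.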
Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
On 12/02/2016 09:40 AM, Tom DalPozzo wrote: Hi, I've two tables, t1 and t2, both with one bigint id indexed field and one 256 char data field; t1 has always got 1 row, while t2 is increasing as explained in the following. My libpq client continuously updates one row in t1 (every time targeting a different row) and inserts a new row in t2. All this in blocks of 1000 update-insert per commit, in order to get better performance. Wal_method is fsync, fsync is on, attached my conf file. I've a 3.8ghz laptop with evo SSD. Performance is measured every two executed blocks and related to these blocks. Over the first few minutes performance is around 10Krow/s then it slowly drops, over next few minutes to 4Krow/s, then it slowly returns high and so on, like a wave. I don't understand this behaviour. Is it normal? What does it depend on? Have you looked at the Postgres log entries that cover these episodes? Is there anything of interest there? Also, when I stop the client I see the SSD light still heavily working. It would last quite a while unless I stop the postgresql server, in this case it suddenly stops. If I restart the server it remains off. I'm wondering if it's normal. I'd like to be sure that my data are safe once committed. Regards Pupillo P.S.: I put this question in general questions as my concern is not if the performance is high or not. -- Adrian Klaver adrian.kla...@aklaver.com
[GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
Hi, I've two tables, t1 and t2, both with one bigint id indexed field and one 256 char data field; t1 has always got 1 row, while t2 is increasing as explained in the following. My libpq client continuously updates one row in t1 (every time targeting a different row) and inserts a new row in t2. All this in blocks of 1000 update-insert per commit, in order to get better performance. Wal_method is fsync, fsync is on, attached my conf file. I've a 3.8ghz laptop with evo SSD. Performance is measured every two executed blocks and related to these blocks. Over the first few minutes performance is around 10Krow/s then it slowly drops, over next few minutes to 4Krow/s, then it slowly returns high and so on, like a wave. I don't understand this behaviour. Is it normal? What does it depend on? Also, when I stop the client I see the SSD light still heavily working. It would last quite a while unless I stop the postgresql server, in this case it suddenly stops. If I restart the server it remains off. I'm wondering if it's normal. I'd like to be sure that my data are safe once committed. Regards Pupillo P.S.: I put this question in general questions as my concern is not if the performance is high or not. [Attachment: postgresql.conf]
Re: [GENERAL] Using UPDATE ... RETURNING in a custom SQL function, which RETURNS integer
On 12/02/2016 04:23 AM, Alexander Farber wrote: Thank you, Rob - On Fri, Dec 2, 2016 at 11:12 AM, Rob Sargent wrote: > On Dec 2, 2016, at 2:52 AM, Alexander Farber > wrote: > > CREATE OR REPLACE FUNCTION words_unban_user( > in_uid integer) > RETURNS integer AS > $func$ > UPDATE words_users SET > banned_until = null, > banned_reason = null > WHERE uid = in_uid > RETURNING uid; -- returns the user to be notified > > $func$ LANGUAGE sql; > > words=> SELECT uid FROM words_unban_user(1); > ERROR: column "uid" does not exist > LINE 1: SELECT uid FROM words_unban_user(1); > ^ > select words_unban_user(1) as uid; Your function returns an int, not a table. This has worked well. However if I rewrite the same function as "language plpgsql" - then suddenly both ways of calling work: CREATE OR REPLACE FUNCTION words_unban_user( in_uid integer, OUT out_uid integer) RETURNS integer AS $func$ BEGIN UPDATE words_users SET banned_until = null, banned_reason = null WHERE uid = in_uid RETURNING uid into out_uid; END $func$ LANGUAGE plpgsql; words=> select out_uid AS uid from words_unban_user(1); uid - 1 (1 row) words=> select words_unban_user(1) AS uid; uid - 1 (1 row) I am curious, why is it so... In the SQL function you are just using RETURNING to pop out the uid; it is not actually assigned to any output variable name but just returned as an integer. In the plpgsql case you actually assign uid to an output variable name. FYI, you can have OUT in SQL functions also: https://www.postgresql.org/docs/9.5/static/xfunc-sql.html#XFUNC-OUTPUT-PARAMETERS Regards Alex -- Adrian Klaver adrian.kla...@aklaver.com
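Following the output-parameters section of the docs Adrian links, the original sql-language function could also be given an OUT parameter, so the table-style call works without switching to plpgsql. An untested sketch against the same words_users table from this thread:

```sql
-- SQL-language variant with an OUT parameter: the single output column of
-- the last statement in the body populates out_uid.
CREATE OR REPLACE FUNCTION words_unban_user(
        in_uid  integer,
        OUT out_uid integer)
RETURNS integer AS
$func$
    UPDATE words_users SET
        banned_until  = null,
        banned_reason = null
    WHERE uid = in_uid
    RETURNING uid;          -- becomes out_uid
$func$ LANGUAGE sql;

-- Both call styles should then work:
--   SELECT out_uid AS uid FROM words_unban_user(1);
--   SELECT words_unban_user(1) AS uid;
```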
Re: [GENERAL] Using UPDATE ... RETURNING in a custom SQL function, which RETURNS integer
Thank you, Rob - On Fri, Dec 2, 2016 at 11:12 AM, Rob Sargent wrote: > > > On Dec 2, 2016, at 2:52 AM, Alexander Farber > wrote: > > > > CREATE OR REPLACE FUNCTION words_unban_user( > > in_uid integer) > > RETURNS integer AS > > $func$ > > UPDATE words_users SET > > banned_until = null, > > banned_reason = null > > WHERE uid = in_uid > > RETURNING uid; -- returns the user to be notified > > > > $func$ LANGUAGE sql; > > > > words=> SELECT uid FROM words_unban_user(1); > > ERROR: column "uid" does not exist > > LINE 1: SELECT uid FROM words_unban_user(1); > > ^ > > > select words_unban_user(1) as uid; > Your function returns an int, not a table. This has worked well. However if I rewrite the same function as "language plpgsql" - then suddenly both ways of calling work: CREATE OR REPLACE FUNCTION words_unban_user( in_uid integer, OUT out_uid integer) RETURNS integer AS $func$ BEGIN UPDATE words_users SET banned_until = null, banned_reason = null WHERE uid = in_uid RETURNING uid into out_uid; END $func$ LANGUAGE plpgsql; words=> select out_uid AS uid from words_unban_user(1); uid - 1 (1 row) words=> select words_unban_user(1) AS uid; uid - 1 (1 row) I am curious, why is it so... Regards Alex
Re: [GENERAL] Using UPDATE ... RETURNING in a custom SQL function, which RETURNS integer
> And finally here is the failing usage of the function : > > words=> SELECT uid FROM words_unban_user(1); > ERROR: column "uid" does not exist > LINE 1: SELECT uid FROM words_unban_user(1); > ^ > > Shouldn't you be doing: SELECT words_unban_user(1);
[GENERAL] Using UPDATE ... RETURNING in a custom SQL function, which RETURNS integer
Hello, why does this fail in PostgreSQL 9.5 please? Here is my custom SQL function : CREATE OR REPLACE FUNCTION words_unban_user( in_uid integer) RETURNS integer AS $func$ UPDATE words_users SET banned_until = null, banned_reason = null WHERE uid = in_uid RETURNING uid; -- returns the user to be notified $func$ LANGUAGE sql; Here is my table (words=> TABLE words_users; returns one row, shown here one column per line):
uid           | 1
created       | 2016-12-02 10:33:59.0761+01
visited       | 2016-12-02 10:36:36.3521+01
ip            | 127.0.0.1
vip_until     |
grand_until   |
banned_until  | 2016-12-09 10:34:09.9151+01
banned_reason | ban user 1
win           | 0
loss          | 0
draw          | 0
elo           | 1500
medals        | 0
green         | 0
red           | 0
coins         | 0
(1 row) And finally here is the failing usage of the function : words=> SELECT uid FROM words_unban_user(1); ERROR: column "uid" does not exist LINE 1: SELECT uid FROM words_unban_user(1); ^ The background is that this is a websockets-based game and each of the custom functions should return a list of user ids to be notified about changes (like player was banned, opponent has resigned, ...) In the custom plpgsql functions I use OUT parameters or return a table with RETURN NEXT and it works fine. But in the above sql function this does not work... Regards Alex
Re: [GENERAL] FOR UPDATE
> Why do you want to lock these results? Because we are migrating a very old pipeline where we save data in text files. So we have decided to keep some logic inside Perl code. That means to update a record:
1. we have to read/lock it
2. return the result to the application
3. the application does its business and returns the new value to the database
4. we update the record.
> What do you mean "empty"? If you pass the empty string, the query will > return rows that have the empty string in those columns. Note that the > NULL value is not the same as the empty string. If the function is > defined as STRICT (also spelled RETURNS NULL ON NULL INPUT), then the > function is not even called if you pass NULL arguments, and it simply > returns NULL without locking anything. By empty I mean: '' > Not necessarily ... depends on what you want to happen. It is a migration case so I was thinking to take the safer way. So I think the behaviour is clear to me now. Thank you. On Mon, Nov 28, 2016 at 4:37 PM, Alvaro Herrera wrote: > Saïd Assemlal wrote: > > Hello, > > > > PG: 9.4 > > CentOS 6 > > > > I am writing functions to lock results. > > > > Let's take an example: > > > > CREATE OR REPLACE FUNCTION usp_locking_my_result($1 VARCHAR(50), $2 > > VARCHAR(50)) > > RETURNS TEXT AS $$ > > SELECT value > > FROM my_table > > WHERE field1 = $1 AND field2 = $2 FOR UPDATE; > > $$ LANGUAGE SQL; > > Why do you want to lock these results? > > > What happens if one of those arguments are empty and database finds > > results? ( I think they will locked ) > > What do you mean "empty"? If you pass the empty string, the query will > return rows that have the empty string in those columns. Note that the > NULL value is not the same as the empty string. If the function is > defined as STRICT (also spelled RETURNS NULL ON NULL INPUT), then the > function is not even called if you pass NULL arguments, and it simply > returns NULL without locking anything.
> > > Should I check the inputs and raise an exception if they are empty? > > Not necessarily ... depends on what you want to happen. > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [GENERAL] FOR UPDATE
Saïd Assemlal wrote: > Hello, > > PG: 9.4 > CentOS 6 > > I am writing functions to lock results. > > Let's take an example: > > CREATE OR REPLACE FUNCTION usp_locking_my_result($1 VARCHAR(50), $2 > VARCHAR(50)) > RETURNS TEXT AS $$ > SELECT value > FROM my_table > WHERE field1 = $1 AND field2 = $2 FOR UPDATE; > $$ LANGUAGE SQL; Why do you want to lock these results? > What happens if one of those arguments are empty and database finds > results? ( I think they will locked ) What do you mean "empty"? If you pass the empty string, the query will return rows that have the empty string in those columns. Note that the NULL value is not the same as the empty string. If the function is defined as STRICT (also spelled RETURNS NULL ON NULL INPUT), then the function is not even called if you pass NULL arguments, and it simply returns NULL without locking anything. > Should I check the inputs and raise an exception if they are empty? Not necessarily ... depends on what you want to happen. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
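Alvaro's STRICT suggestion can be sketched as follows. This is an untested variant of the function from the thread; note the original declares its parameters as $1/$2, which is not valid syntax for parameter names, so they are renamed p1/p2 here for illustration:

```sql
-- STRICT (RETURNS NULL ON NULL INPUT): with any NULL argument the body
-- is never executed, so no rows are read or locked.
CREATE OR REPLACE FUNCTION usp_locking_my_result(p1 VARCHAR(50), p2 VARCHAR(50))
RETURNS TEXT AS $$
    SELECT value
    FROM my_table
    WHERE field1 = p1 AND field2 = p2
    FOR UPDATE;
$$ LANGUAGE SQL STRICT;

-- SELECT usp_locking_my_result(NULL, 'x');  -- returns NULL, locks nothing
-- SELECT usp_locking_my_result('', 'x');    -- empty string is NOT NULL:
--                                           -- matches (and locks) rows
--                                           -- whose field1 is ''
```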
[GENERAL] FOR UPDATE
Hello, PG: 9.4 CentOS 6 I am writing functions to lock results. Let's take an example: CREATE OR REPLACE FUNCTION usp_locking_my_result($1 VARCHAR(50), $2 VARCHAR(50)) RETURNS TEXT AS $$ SELECT value FROM my_table WHERE field1 = $1 AND field2 = $2 FOR UPDATE; $$ LANGUAGE SQL; What happens if one of those arguments are empty and the database finds results? ( I think they will be locked ) Should I check the inputs and raise an exception if they are empty? Thanks, Saïd
Re: [GENERAL] Partial update on an postgres upsert violates constraint
Andreas Terrius writes: > Is there any way to check whether the row already exists before checking > constraints ? I still want it to fail if it turns out to be a new row > (which would violate the not null constraint), but updates the row if it > already exists. I'm not really sure why you expect this to be able to work. The data you are supplying is insufficient to satisfy the INSERT case, so why do you think that an insert-or-update should be possible? ISTM that you could try the UPDATE first and notice whether it updates any rows or not. I'm not sure what you're going to do if it doesn't, though, since you still haven't got enough data to do an INSERT. regards, tom lane
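Tom's try-the-UPDATE-first approach can be sketched in plpgsql against the jobs table from this thread. The function name and error text are illustrative, not from the original posts:

```sql
-- Update-only "partial upsert": COALESCE keeps existing values for keys
-- absent from the JSON, and FOUND tells us whether any row matched.
CREATE OR REPLACE FUNCTION update_job_if_exists(job JSONB) RETURNS VOID AS $$
BEGIN
    UPDATE jobs SET
        employee_name = COALESCE(job->>'employee_name', employee_name),
        address       = COALESCE(job->>'address', address),
        phone_number  = COALESCE(job->>'phone_number', phone_number)
    WHERE id = (job->>'id')::INTEGER;

    IF NOT FOUND THEN
        -- No existing row, and the partial JSON cannot satisfy the
        -- NOT NULL columns, so refuse rather than attempt an INSERT.
        RAISE EXCEPTION 'job % does not exist and cannot be inserted from partial data',
            job->>'id';
    END IF;
END;
$$ LANGUAGE plpgsql;
```

Because this is a single UPDATE statement rather than a separate existence check, it avoids the check-then-act race under concurrent load.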
Re: [GENERAL] Partial update on an postgres upsert violates constraint
On 11/21/2016 2:32 AM, Andreas Terrius wrote: Is there any way to check whether the row already exists before checking constraints ? I still want it to fail if it turns out to be a new row (which would violate the not null constraint), but updates the row if it already exists. Just do an update. If the row doesn't exist, it will fail; you then roll back the transaction or savepoint. Since if that is not possible, I would need to do a query to determine whether the row exists in the database which kinda eliminates the use of upsert. (in this case, partial upsert). In general, anything that relies on those sorts of checks will fail under concurrent loads. -- john r pierce, recycling bits in santa cruz
Re: [GENERAL] Partial update on an postgres upsert violates constraint
Is there any way to check whether the row already exists before checking constraints ? I still want it to fail if it turns out to be a new row (which would violate the not null constraint), but updates the row if it already exists. Since if that is not possible, I would need to do a query to determine whether the row exists in the database which kinda eliminates the use of upsert. (in this case, partial upsert). On Sun, Nov 20, 2016 at 3:57 AM, Tom Lane wrote: > Adrian Klaver writes: > > ... So looks like constraints are checked before you get to the ON > CONFLICT section. > > Right. ON CONFLICT is a means for dealing with duplicate-key errors in > the specified (or inferred) unique index. It is *not* an all-purpose > error catcher. In the case at hand, the given INSERT request fails due > to not-null constraints that are unrelated to what the ON CONFLICT clause > tests for. > > regards, tom lane >
Re: [GENERAL] Partial update on an postgres upsert violates constraint
On 11/21/2016 02:32 AM, Andreas Terrius wrote: Is there any way to check whether the row already exists before checking constraints ? I still want it to fail if it turns out to be a new row (which would violate the not null constraint), but updates the row if it already exists. Since if that is not possible, I would need to do a query to determine whether the row exists in the database which kinda eliminates the use of upsert. (in this case, partial upsert). Before UPSERT appeared in 9.5, folks came up with alternate methods of doing this. I would suggest searching on: postgres upsert cte You might be able to modify the examples to get what you want. On Sun, Nov 20, 2016 at 3:57 AM, Tom Lane wrote: Adrian Klaver writes: > ... So looks like constraints are checked before you get to the ON CONFLICT section. Right. ON CONFLICT is a means for dealing with duplicate-key errors in the specified (or inferred) unique index. It is *not* an all-purpose error catcher. In the case at hand, the given INSERT request fails due to not-null constraints that are unrelated to what the ON CONFLICT clause tests for. regards, tom lane -- Adrian Klaver adrian.kla...@aklaver.com
Re: [GENERAL] Partial update on an postgres upsert violates constraint
Adrian Klaver writes: > ... So looks like constraints are checked before you get to the ON CONFLICT > section. Right. ON CONFLICT is a means for dealing with duplicate-key errors in the specified (or inferred) unique index. It is *not* an all-purpose error catcher. In the case at hand, the given INSERT request fails due to not-null constraints that are unrelated to what the ON CONFLICT clause tests for. regards, tom lane
Re: [GENERAL] Partial update on an postgres upsert violates constraint
On 11/19/2016 11:33 AM, Kim Rose Carlsen wrote: >> AFAIK, EXCLUDED is only available in a trigger function: > >> >> https://www.postgresql.org/docs/9.5/static/trigger-definition.html >> >> You are using EXCLUDED in a regular function so it would not be found. >> >> Can you also show the failure for your alternate method? > > From the manual > https://www.postgresql.org/docs/9.5/static/sql-insert.html > > " > conflict_action > conflict_action specifies an alternative ON CONFLICT action. It can be > either DO NOTHING, or a DO UPDATE clause specifying the exact details of > the UPDATE action to be performed in case of a conflict. The SET and > WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row > using the table's name (or an alias), and to rows proposed for insertion > using the special excluded table. SELECT privilege is required on any > column in the target table where corresponding excluded columns are read. > " > Oops, my mistake. I should have spent more time on the examples. Changing the function to:

CREATE OR REPLACE FUNCTION public.upsert_job(job jsonb)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
AS $function$
BEGIN
  INSERT INTO jobs AS origin VALUES(
    (job->>'id')::INTEGER,
    COALESCE(job->>'employee_name'::TEXT, 'test_name'),
    COALESCE(job->>'address'::TEXT, 'test_address'),
    job->>'phone_number'::TEXT
  )
  ON CONFLICT (id) DO UPDATE SET
    employee_name = COALESCE(EXCLUDED.employee_name, origin.employee_name),
    address = COALESCE(EXCLUDED.address, origin.address),
    phone_number = COALESCE(EXCLUDED.phone_number, origin.phone_number);
END;
$function$;

makes it work. So looks like constraints are checked before you get to the ON CONFLICT section. -- Adrian Klaver adrian.kla...@aklaver.com
Re: [GENERAL] Partial update on an postgres upsert violates constraint
> AFAIK, EXCLUDED is only available in a trigger function: > > https://www.postgresql.org/docs/9.5/static/trigger-definition.html > > You are using EXCLUDED in a regular function so it would not be found. > > Can you also show the failure for your alternate method? From the manual https://www.postgresql.org/docs/9.5/static/sql-insert.html " conflict_action conflict_action specifies an alternative ON CONFLICT action. It can be either DO NOTHING, or a DO UPDATE clause specifying the exact details of the UPDATE action to be performed in case of a conflict. The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table. SELECT privilege is required on any column in the target table where corresponding excluded columns are read. "
Re: [GENERAL] Partial update on an postgres upsert violates constraint
On 11/17/2016 10:13 PM, Andreas Terrius wrote: Hi, Basically I wanted to do a partial update inside pg (9.5), but it seems that a partial update fails when not all constraints are fulfilled (such as the not null constraint). Below are the sql queries I used:

CREATE TABLE jobs (
  id integer PRIMARY KEY,
  employee_name TEXT NOT NULL,
  address TEXT NOT NULL,
  phone_number TEXT
);

CREATE OR REPLACE FUNCTION upsert_job(job JSONB) RETURNS VOID AS $$
BEGIN
  INSERT INTO jobs AS origin VALUES(
    (job->>'id')::INTEGER,
    job->>'employee_name'::TEXT,
    job->>'address'::TEXT,
    job->>'phone_number'::TEXT
  )
  ON CONFLICT (id) DO UPDATE SET
    employee_name = COALESCE(EXCLUDED.employee_name, origin.employee_name),
    address = COALESCE(EXCLUDED.address, origin.address),
    phone_number = COALESCE(EXCLUDED.phone_number, origin.phone_number);
END;
$$ LANGUAGE PLPGSQL SECURITY DEFINER;

--Full insert (OK)
SELECT upsert_job('{"id" : 1, "employee_name" : "AAA", "address" : "City, x street no.y", "phone_number" : "123456789"}'::jsonb);

--Partial update that fulfills constraint (OK)
SELECT upsert_job('{"id" : 1, "employee_name" : "BBB", "address" : "City, x street no.y"}'::jsonb);

--Partial update that doesn't fulfill constraint (FAILS)
SELECT upsert_job('{"id" : 1, "phone_number" : "12345"}'::jsonb);
--ERROR: null value in column "employee_name" violates not-null constraint
--DETAIL: Failing row contains (1, null, null, 12345).

I also tried explicitly stating the columns that I wanted to insert, and it also fails. How do I go around doing this ? AFAIK, EXCLUDED is only available in a trigger function: https://www.postgresql.org/docs/9.5/static/trigger-definition.html You are using EXCLUDED in a regular function so it would not be found. Can you also show the failure for your alternate method? Thank you -- Adrian Klaver adrian.kla...@aklaver.com
[GENERAL] Partial update on a postgres upsert violates constraint
Hi, Basically I wanted to do a partial update inside pg (9.5), but it seems that a partial update fails when not all of the constraints are fulfilled (such as the not null constraint). Below are the sql queries I used:

CREATE TABLE jobs (
    id integer PRIMARY KEY,
    employee_name TEXT NOT NULL,
    address TEXT NOT NULL,
    phone_number TEXT
);

CREATE OR REPLACE FUNCTION upsert_job(job JSONB) RETURNS VOID AS $$
BEGIN
    INSERT INTO jobs AS origin VALUES(
        (job->>'id')::INTEGER,
        job->>'employee_name'::TEXT,
        job->>'address'::TEXT,
        job->>'phone_number'::TEXT
    )
    ON CONFLICT (id) DO UPDATE SET
        employee_name = COALESCE(EXCLUDED.employee_name, origin.employee_name),
        address = COALESCE(EXCLUDED.address, origin.address),
        phone_number = COALESCE(EXCLUDED.phone_number, origin.phone_number);
END;
$$ LANGUAGE PLPGSQL SECURITY DEFINER;

--Full insert (OK)
SELECT upsert_job('{"id" : 1, "employee_name" : "AAA", "address" : "City, x street no.y", "phone_number" : "123456789"}'::jsonb);
--Partial update that fulfills constraint (OK)
SELECT upsert_job('{"id" : 1, "employee_name" : "BBB", "address" : "City, x street no.y"}'::jsonb);
--Partial update that doesn't fulfill constraint (FAILS)
SELECT upsert_job('{"id" : 1, "phone_number" : "12345"}'::jsonb);
--ERROR: null value in column "employee_name" violates not-null constraint
--DETAIL: Failing row contains (1, null, null, 12345).

I also tried explicitly stating the columns that I wanted to insert, and it also fails. How do I go around doing this? Thank you
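One workaround (my own sketch, not from the thread): NOT NULL constraints are checked against the proposed row before ON CONFLICT resolution runs, which is why the partial JSONB fails even though the existing row already has the missing columns. Trying the UPDATE first and only inserting when no row matched sidesteps that check:

```sql
-- Sketch: update-first upsert that merges only the keys present in the JSONB.
-- Assumes the jobs table from the post above.
CREATE OR REPLACE FUNCTION upsert_job(job JSONB) RETURNS VOID AS $$
BEGIN
    UPDATE jobs SET
        employee_name = COALESCE(job->>'employee_name', employee_name),
        address       = COALESCE(job->>'address', address),
        phone_number  = COALESCE(job->>'phone_number', phone_number)
    WHERE id = (job->>'id')::INTEGER;
    IF NOT FOUND THEN
        INSERT INTO jobs (id, employee_name, address, phone_number)
        VALUES ((job->>'id')::INTEGER, job->>'employee_name',
                job->>'address', job->>'phone_number');
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```

Unlike INSERT ... ON CONFLICT this is not race-free: two concurrent callers with a brand-new id can both reach the INSERT, so a unique-violation retry loop would be needed under concurrency.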
Re: [GENERAL] Stats update difference between VACUUM ANALYZE and ANALYZE in 9.2?
On Monday, September 26, 2016 9:44 AM, Tom Lane wrote: >> Paul Jones writes: >> For a freshly pg_restore'd 9.2 database, would VACUUM ANALYZE update >> statistics any better than just an ANALYZE? > > VACUUM would have caused the page-all-visible flags to get set for all > pages of unchanging tables. I don't recall whether ANALYZE has any side > effects on those flags at all, but it certainly would not have set them > for pages it didn't even visit, which would be most. > > Net result is that the pg_class.relallvisible fractions didn't get high > enough to persuade the planner that index-only scans would be effective. > I guess you could call that a statistic, but it's really about the > contents of the tables' free space maps. > > regards, tom lane This is good to know. I think we will be running VACUUM ANALYZE from now on after restore instead of just ANALYZE. I do note that sect. 49.11 claims that ANALYZE updates pg_class.relallvisible. I don't know if this is a documentation problem in light of what you explained. PJ
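For anyone checking this on their own restore, the all-visible fraction Tom mentions can be read straight out of pg_class (a quick sanity query; the table name is a placeholder):

```sql
-- Compare pages marked all-visible against total pages; a low ratio
-- makes the planner pessimistic about index-only scans.
SELECT relname,
       relpages,
       relallvisible,
       round(relallvisible::numeric / GREATEST(relpages, 1), 2) AS visible_frac
FROM pg_class
WHERE relname = 'my_child_table';  -- placeholder table name
```

After VACUUM ANALYZE the visible_frac of unchanging tables should be at or near 1.0.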
Re: [GENERAL] Stats update difference between VACUUM ANALYZE and ANALYZE in 9.2?
Paul Jones writes: > For a freshly pg_restore'd 9.2 database, would VACUUM ANALYZE update > statistics any better than just an ANALYZE? Not as far as the statistics kept in pg_stat go. > After a restore, we ran a bunch of ANALYZEs on each table individually > using GNU 'parallel' (for speed). Many of these tables are child tables > in a partition. Following the ANALYZEs, a join with the parent table > showed all of the child tables scanned sequentially. > After running VACUUM ANALYZE on the whole database, the same join used > index-only scans on the child tables. VACUUM would have caused the page-all-visible flags to get set for all pages of unchanging tables. I don't recall whether ANALYZE has any side effects on those flags at all, but it certainly would not have set them for pages it didn't even visit, which would be most. Net result is that the pg_class.relallvisible fractions didn't get high enough to persuade the planner that index-only scans would be effective. I guess you could call that a statistic, but it's really about the contents of the tables' free space maps. Another possible theory is that you skipped ANALYZE'ing the partition parent tables in your first pass, but I'm betting on the all-visible fractions as being the issue. regards, tom lane
[GENERAL] Stats update difference between VACUUM ANALYZE and ANALYZE in 9.2?
For a freshly pg_restore'd 9.2 database, would VACUUM ANALYZE update statistics any better than just an ANALYZE? After a restore, we ran a bunch of ANALYZEs on each table individually using GNU 'parallel' (for speed). Many of these tables are child tables in a partition. Following the ANALYZEs, a join with the parent table showed all of the child tables scanned sequentially. After running VACUUM ANALYZE on the whole database, the same join used index-only scans on the child tables. An examination of the fine manual implies there may be some difference (or a documentation conflict?) between running ANALYZE manually on individual tables and an unqualified ANALYZE on the whole database. 5.9.6: "If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each partition individually. A command like: ANALYZE measurement; will only process the master table." ANALYZE: "If the table being analyzed has one or more children, ANALYZE will gather statistics twice: once on the rows of the parent table only, and a second time on the rows of the parent table with all of its children. This second set of statistics is needed when planning queries that traverse the entire inheritance tree. The autovacuum daemon, however, will only consider inserts or updates on the parent table itself when deciding whether to trigger an automatic analyze for that table. If that table is rarely inserted into or updated, the inheritance statistics will not be up to date unless you run ANALYZE manually." Can anyone explain what's going on here? Thanks, Paul Jones
Re: [GENERAL] Combining UPDATE ... SET ... FROM (SELECT ...) with a JOIN
Thank you Brian and others, but - On Fri, Sep 9, 2016 at 5:22 PM, Brian Dunavant wrote:

> I'm making the assumption that you only have one ip/user in words_users.
>
> with lockrow as (
>     SELECT g.gid, u.ip
>     FROM words_games g JOIN words_users u
>       ON (g.player1 = u.uid)
>     WHERE g.finished IS NULL
>     AND g.player1 <> in_uid
>     AND g.played1 IS NOT NULL
>     AND g.player2 IS NULL
>     LIMIT 1
>     FOR UPDATE SKIP LOCKED
> ), do_the_update as (
>     UPDATE words_games g1
>     SET player2 = in_uid
>     FROM lockrow g2
>     WHERE g1.gid = g2.gid
>     RETURNING g1.gid, g1.player2
> )
> select m.gid, u.ip into out_gid, out_uip
> from do_the_update m
> join lockrow u using (gid);
>
> The general idea being lock the row in the first CTE, update it in the second, returning your values, and then query against those in the final select to get the ip. If it didn't update anything, you'll get no results.

unfortunately, the above query does not seem to ensure that players with the same ip can not join the same game, which is actually my question... But thanks for showing the CTE for UPDATE ... RETURNING - that is probably the way to go for me Regards Alex
[GENERAL] Combining UPDATE ... SET ... FROM (SELECT ...) with a JOIN
Good afternoon, I have a question please. In one table I store user ids and their IP addresses -

CREATE TABLE words_users (
    uid SERIAL PRIMARY KEY,
    ip inet NOT NULL
);

And in another table I keep 2-player games and timestamps of last moves (NULL if a player hasn't played yet):

CREATE TABLE words_games (
    gid SERIAL PRIMARY KEY,
    finished timestamptz,
    player1 integer REFERENCES words_users(uid) ON DELETE CASCADE NOT NULL,
    player2 integer REFERENCES words_users(uid) ON DELETE CASCADE,
    played1 timestamptz,
    played2 timestamptz
);

When a user wants to start a new game, I first check if there is maybe a new game already available - with just 1 player while the other "seat" is vacant:

UPDATE words_games g1
SET player2 = in_uid
FROM (
    SELECT gid
    FROM words_games
    WHERE finished IS NULL
    AND player1 <> in_uid
    AND played1 IS NOT NULL
    AND player2 IS NULL
    LIMIT 1
    FOR UPDATE SKIP LOCKED
) g2
WHERE g1.gid = g2.gid
RETURNING g1.gid INTO out_gid;

This code works great, but now I am trying to add an (obviously not solving all cheating/proxy/etc. problems) check that the IP addresses of both users must be different. Fetching "ip" in the internal SELECT statement is trivial with:

UPDATE words_games g1
SET player2 = in_uid
FROM (
    SELECT g.gid, u.ip
    FROM words_games g
    JOIN words_users u ON (g.player1 = u.uid)
    WHERE g.finished IS NULL
    AND g.played1 IS NOT NULL
    AND g.player1 <> in_uid
    AND g.player2 IS NULL
    LIMIT 1
    FOR UPDATE SKIP LOCKED
) g2
WHERE g1.gid = g2.gid
RETURNING g1.gid INTO out_gid;

But how to fetch the "ip" column in the surrounding UPDATE statement? Thank you Alex
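For what it's worth, PostgreSQL's UPDATE lets the RETURNING list reference columns of any table in the FROM clause, including a subquery alias, so the ip can be pulled straight through (a sketch against the schema above; out_uip is a hypothetical OUT parameter of the surrounding function, not something from the original post):

```sql
UPDATE words_games g1
SET player2 = in_uid
FROM (
    SELECT g.gid, u.ip
    FROM words_games g
    JOIN words_users u ON (g.player1 = u.uid)
    WHERE g.finished IS NULL
      AND g.played1 IS NOT NULL
      AND g.player1 <> in_uid
      AND g.player2 IS NULL
    LIMIT 1
    FOR UPDATE SKIP LOCKED
) g2
WHERE g1.gid = g2.gid
-- g2 is visible in RETURNING, so both values come back in one statement:
RETURNING g1.gid, g2.ip INTO out_gid, out_uip;
```

This does not by itself enforce that the two players' IPs differ; that check would still need a predicate comparing g2.ip against in_uid's address.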
Re: [GENERAL] plpgsql update row from record variable
On 04/02/2016 06:07 PM, Seamus Abshere wrote: hi, I want to write a function that updates arbitrary columns and here's my pseudocode: CREATE OR REPLACE FUNCTION update_pet(id int, raw_data jsonb) RETURNS VOID AS $$ DECLARE data record; BEGIN SELECT jsonb_populate_record(null::pets, raw_data) INTO data; UPDATE pets [... from data ...] WHERE id = id; -- probably impossible END; $$ LANGUAGE plpgsql; e.g. SELECT update_pets(1, '{ "name" : "Jerry", "age": 9 }'::jsonb); Back in 2004, Tom showed how to insert from a plpgsql record: http://www.postgresql.org/message-id/17840.1087670...@sss.pgh.pa.us Is there any way to "update *" from a record?

What version of Postgres? In 9.5 you have the following syntax available: http://www.postgresql.org/docs/9.5/interactive/sql-update.html

UPDATE accounts
SET (contact_first_name, contact_last_name) =
    (SELECT first_name, last_name FROM salesmen
     WHERE salesmen.id = accounts.sales_id);

or its almost (see docs for notes) pre-9.5 equivalent:

UPDATE accounts
SET contact_first_name = first_name,
    contact_last_name = last_name
FROM salesmen
WHERE salesmen.id = accounts.sales_id;

So on a 9.4 instance here:

test=# \d company
        Table "public.company"
 Column  |     Type      | Modifiers
---------+---------------+-----------
 id      | integer       | not null
 name    | text          | not null
 age     | integer       | not null
 address | character(50) |
 salary  | real          |
Indexes:
    "company_pkey" PRIMARY KEY, btree (id)

UPDATE company SET salary = jb.salary
FROM (
    SELECT id, salary
    FROM jsonb_populate_record(
        NULL::company,
        '{"id": 1, "age": 32, "name": "Paul", "salary": 1, "address": "California"}'::jsonb
    )
) AS jb
WHERE company.id = jb.id;

Thanks! Seamus PS. Whether I **should** do this is another matter, I just want to know if it's possible.
-- Seamus Abshere, SCEA https://github.com/seamusabshere http://linkedin.com/in/seamusabshere -- Adrian Klaver adrian.kla...@aklaver.com
[GENERAL] plpgsql update row from record variable
hi, I want to write a function that updates arbitrary columns and here's my pseudocode: CREATE OR REPLACE FUNCTION update_pet(id int, raw_data jsonb) RETURNS VOID AS $$ DECLARE data record; BEGIN SELECT jsonb_populate_record(null::pets, raw_data) INTO data; UPDATE pets [... from data ...] WHERE id = id; -- probably impossible END; $$ LANGUAGE plpgsql; e.g. SELECT update_pets(1, '{ "name" : "Jerry", "age": 9 }'::jsonb); Back in 2004, Tom showed how to insert from a plpgsql record: http://www.postgresql.org/message-id/17840.1087670...@sss.pgh.pa.us Is there any way to "update *" from a record? Thanks! Seamus PS. Whether I **should** do this is another matter, I just want to know if it's possible. -- Seamus Abshere, SCEA https://github.com/seamusabshere http://linkedin.com/in/seamusabshere
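On 9.5+ the multi-column UPDATE syntax gets close to an "update *" from a record. A sketch against the pets example (the parameter is renamed pet_id, because the pseudocode's `WHERE id = id` compares the column with itself and would match every row; the pets columns name and age are assumed from the example call):

```sql
CREATE OR REPLACE FUNCTION update_pet(pet_id int, raw_data jsonb)
RETURNS VOID AS $$
BEGIN
    -- jsonb_populate_record maps matching JSONB keys onto pets' columns;
    -- keys absent from raw_data come back as NULL.
    UPDATE pets
    SET (name, age) =
        (SELECT name, age
         FROM jsonb_populate_record(NULL::pets, raw_data))
    WHERE id = pet_id;
END;
$$ LANGUAGE plpgsql;
```

The column list still has to be named once; expanding "all columns" from a record variable isn't possible without dynamic SQL (building the statement from information_schema.columns and running it via EXECUTE). Note also that absent keys overwrite with NULL here, so COALESCE against the existing column is needed if missing keys should mean "leave unchanged".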
[GENERAL] Multiple Update queries
Hello, I am using a foreign data wrapper where I get a portion of my data pre-loaded, i.e. I get a set of rows beforehand. So now I want to run multiple update queries on this loaded data, write the changes to a file, load the next set, and continue with updates again. How should I try to achieve this requirement? At what point in postgres should I use the hook and change my execution? Thanks, Harsha
[GENERAL] With Update From ... vs. Update ... From (With)
Hello! Is there any non-functional difference between these two forms of Update? WITH name AS ( SELECT ) UPDATE tbl SET ... FROM name WHERE tbl.id = name.id and UPDATE tbl SET ... FROM ( WITH qry AS ( SELECT ) SELECT * FROM qry ) AS name WHERE tbl.id = name.id They both better give the same results but does the backend treat them differently? Does the answer to my question depend on the version of PostgreSQL? Thanks! David J.
Re: [GENERAL] With Update From ... vs. Update ... From (With)
On 4/13/15 8:12 PM, David G. Johnston wrote: Hello! Is there any non-functional difference between these two forms of Update? WITH name AS ( SELECT ) UPDATE tbl SET ... FROM name WHERE tbl.id = name.id and UPDATE tbl SET ... FROM ( WITH qry AS ( SELECT ) SELECT * FROM qry ) AS name WHERE tbl.id = name.id They both better give the same results but does the backend treat them differently? Does the answer to my question depend on the version of PostgreSQL? Well, they're both ugly... but see what EXPLAIN or EXPLAIN VERBOSE shows. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
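Following Jim's suggestion, the comparison is mechanical (a sketch; tbl, src, and the column names are placeholders, since the original question elided the CTE bodies):

```sql
-- Form 1: WITH attached to the UPDATE itself.
EXPLAIN (VERBOSE, COSTS OFF)
WITH name AS (SELECT id, val FROM src)
UPDATE tbl SET col = name.val
FROM name
WHERE tbl.id = name.id;

-- Form 2: WITH buried inside the FROM subquery.
EXPLAIN (VERBOSE, COSTS OFF)
UPDATE tbl SET col = name.val
FROM (WITH qry AS (SELECT id, val FROM src)
      SELECT * FROM qry) AS name
WHERE tbl.id = name.id;
```

If the two plans come out identical, the backend treats the forms the same for that query. Version does matter in general: before PostgreSQL 12 every CTE is materialized (an optimization fence) wherever it appears, so here both forms fence the subquery the same way.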
[GENERAL] Incremental update for matview - automatic trigger-in-c generator
Dear sir/madam. We built a program that can automatically generate triggers in C for synchronous incremental matview updates. It supports queries with limitations: - inner join - group by with sum, count, avg, min, max - the join has to be of the form: t1 1-n t2 1-n t3 1-n t4 The binary and source can be found at http://it.ued.vn/myprojects/pgTriggerGen/. It is the first version. We are now building another version with some improvements. Anyone can use it if they find it useful. All comments, recommendations, and improvements are welcome. Corresponding address: ntquocv...@ued.vn. Thank you for your attention.
[GENERAL] which Update quicker
Hello list, For a big table with more than 1,000,000 records, may I know which update is quicker please?

(1) update t1 set c1 = a.c1 from a where pk and t1.c1 <> a.c1;
    ..
    update t1 set c_N = a.c_N from a where pk and t1.c_N <> a.c_N;

(2) update t1 set c1 = a.c1, c2 = a.c2, ... c_N = a.c_N from a
    where pk AND (t1.c1 <> a.c1 OR t1.c2 <> a.c2 ... OR t1.c_N <> a.c_N)

Or other quicker way for the update action? Thank you Emi
Re: [GENERAL] which Update quicker
On Tue, Sep 23, 2014 at 8:35 PM, Emi Lu em...@encs.concordia.ca wrote: Hello list, For a big table with more than 1,000,000 records, may I know which update is quicker please? (1) update t1 set c1 = a.c1 from a where pk and t1.c1 <> a.c1; .. update t1 set c_N = a.c_N from a where pk and t1.c_N <> a.c_N; (2) update t1 set c1 = a.c1, c2 = a.c2, ... c_N = a.c_N from a where pk AND (t1.c1 <> a.c1 OR t1.c2 <> a.c2 ... OR t1.c_N <> a.c_N) Definitely the second, and it produces less bloat too. Or other quicker way for update action? You may express the comparison as (t1.c1, t1.c2, ... t1.cN) <> (t2.c1, t2.c2, ... t2.cN) It's not going to be faster but maybe it's more readable. -- Daniele
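One caveat worth adding to both forms (my note, not Daniele's): a plain `<>` comparison is NULL-unsafe - a row where t1.c1 is NULL never satisfies `t1.c1 <> a.c1`, so such rows are silently skipped. Row-wise `IS DISTINCT FROM` handles NULLs and keeps the single-statement shape:

```sql
-- Sketch: single-pass, NULL-safe "update only changed rows".
-- Assumes t1 and a share a key column pk (placeholder name).
UPDATE t1
SET c1 = a.c1, c2 = a.c2
FROM a
WHERE t1.pk = a.pk
  AND (t1.c1, t1.c2) IS DISTINCT FROM (a.c1, a.c2);
```

The predicate also avoids rewriting rows that are already identical, which is where the bloat savings of the single-statement form come from.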
Re: [GENERAL] which Update quicker
On 09/23/2014 12:35 PM, Emi Lu wrote: Hello list, For a big table with more than 1,000,000 records, may I know which update is quicker please? (1) update t1 set c1 = a.c1 from a where pk and t1.c1 <> a.c1; .. update t1 set c_N = a.c_N from a where pk and t1.c_N <> a.c_N; (2) update t1 set c1 = a.c1, c2 = a.c2, ... c_N = a.c_N from a where pk AND (t1.c1 <> a.c1 OR t1.c2 <> a.c2 ... OR t1.c_N <> a.c_N) We don't have any info about table structures, index availability and usage for query optimization, whether or not the updated columns are part of an index, amount of memory available, disk speed, portion of t1 that will be updated, PostgreSQL settings, etc. so it's really anyone's guess. A million rows is pretty modest so I was able to try a couple variants of update...from... on million-row tables on my aging desktop without coming close to the 60-second mark. *Usually* putting statements into a single transaction is better (as would happen automatically in case 2). Also, to the extent that a given tuple would have multiple columns updated you will have less bloat and I/O using the query that updates the tuple once rather than multiple times. But a lot will depend on the efficiency of looking up the appropriate data in a. Cheers, Steve
[GENERAL] Correct update statement
Hi Please help, we are using postgreSQL 9.2.4. I need to update over 9000 rows. See the query below: A table called contact has got an *addr_id* field that is null, which is incorrect. So now I need to update the contact table for each account (cus_acno is in the cus table) where contact.addr_id is null. For example, using the first of the results below: I need to set addr_id (in the contact table) to 187479 where cus_acno = 243492 and con_id = 119360. Example:

select distinct(cus_acno), contact.con_id, address.addr_id
from address
join person using (addr_id)
join cus using (per_id)
join link_contact using (cus_acno)
join contact using (con_id)
where contact.addr_id is null;

 cus_acno | con_id | addr_id
----------+--------+---------
   243492 | 119360 |  187479
   393701 | 119824 |  458532
   388538 | 118413 |  453178

Thanks CONFIDENTIALITY NOTICE The contents of and attachments to this e-mail are intended for the addressee only, and may contain the confidential information of Argility (Proprietary) Limited and/or its subsidiaries. Any review, use or dissemination thereof by anyone other than the intended addressee is prohibited.If you are not the intended addressee please notify the writer immediately and destroy the e-mail. Argility (Proprietary) Limited and its subsidiaries distance themselves from and accept no liability for unauthorised use of their e-mail facilities or e-mails sent other than strictly for business purposes.
Re: [GENERAL] Correct update statement
update contact set addr_id = b.addr_id
from (select distinct(cus_acno), contact.con_id, address.addr_id
      from address
      join person using (addr_id)
      join cus using (per_id)
      join link_contact using (cus_acno)
      join contact using (con_id)
      where contact.addr_id is null) b
where contact.con_id = b.con_id;

On 05/15/2014 09:19 AM, Khangelani Gama wrote: Hi Please help, we are using postgreSQL 9.2.4. I need to update over 9000 rows. See the query below: A table called contact has got addr_id field as null which is incorrect. So now I need to update contact table for each account (cus_acno is in cus table) where contact.addr_id is null. For example using the first of the results below: I need to set addr_id (in contact table) to 187479 where cus_acno = 243492 and con_id = 119360. Example: select distinct(cus_acno), contact.con_id, address.addr_id from address join person using (addr_id) join cus using (per_id) join link_contact using (cus_acno) join contact using (con_id) where contact.addr_id is null; cus_acno | con_id | addr_id --++- 243492 | 119360 | 187479 393701 | 119824 | 458532 388538 | 118413 | 453178 Thanks
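Since this touches 9000+ rows, it may be worth dry-running Sim's fix in a transaction and checking the result before committing (a generic sketch, using the table names from the thread):

```sql
BEGIN;

UPDATE contact SET addr_id = b.addr_id
FROM (SELECT DISTINCT cus_acno, contact.con_id, address.addr_id
      FROM address
      JOIN person using (addr_id)
      JOIN cus using (per_id)
      JOIN link_contact using (cus_acno)
      JOIN contact using (con_id)
      WHERE contact.addr_id IS NULL) b
WHERE contact.con_id = b.con_id;

-- Expect this count to drop to (near) zero before committing:
SELECT count(*) FROM contact WHERE addr_id IS NULL;

COMMIT;  -- or ROLLBACK if the count looks wrong
```

One thing to verify first: if the SELECT can return more than one addr_id for the same con_id, the UPDATE will pick one of them nondeterministically, so a duplicate check on con_id in the subquery output is cheap insurance.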
Re: [GENERAL] Correct update statement
Thanks very much

*From:* pgsql-general-ow...@postgresql.org [mailto: pgsql-general-ow...@postgresql.org] *On Behalf Of *Sim Zacks *Sent:* Thursday, May 15, 2014 10:42 AM *To:* pgsql-general@postgresql.org *Subject:* Re: [GENERAL] Correct update statement

update contacts set addr_id=b.addr_id from (select distinct(cus_acno), contact.con_id, address.addr_id from address join person using (addr_id) join cus using (per_id) join link_contact using (cus_acno) join contact using (con_id) where contact.addr_id is null) b where contacts.con_id=b.con_id
Re: [GENERAL] simple update query stuck
Ok, thanks. I'll keep that in mind. On Tue, Apr 1, 2014 at 7:45 PM, Andrew Sullivan a...@crankycanuck.ca wrote: On Tue, Apr 01, 2014 at 07:00:16PM -0400, Tom Lane wrote: one of the clients, in a way that isn't visible to the deadlock detector. One way for that to happen without any external interconnections is if the client is waiting for a NOTIFY that will never arrive because the would-be sender is blocked. I bet the case I was thinking of was the NOTIFY example. That never occurred to me as an explanation, but now that you mention it, it seems quite likely to me. More generally (and for the OP's problem), my experience is that lots of updates against the same rows in an unpredictable order is an excellent way to run into trouble, and long-running transactions are a major source of these problems. Without a more detailed report about what is going on in the present case, I don't think it's going to be possible to diagnose better than has been done. A -- Andrew Sullivan a...@crankycanuck.ca -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps
[GENERAL] simple update query stuck
Hello, I'm using postgresql 9.0.13, and I have a simple query that seems to be stuck. I did:

postgres=# select procpid, query_start, waiting, current_query from pg_stat_activity;

 procpid |          query_start          | waiting | current_query
---------+-------------------------------+---------+---------------
   32605 | 2014-04-01 12:39:46.957666-07 | t       | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT_ID=$9 AND ORGANIZATION_PARTY_ID=$10
   32685 | 2014-04-01 12:25:10.378481-07 | t       | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT_ID=$9 AND ORGANIZATION_PARTY_ID=$10

As you can see this is a pretty simple update query, but it's been running 24 minutes in one thread and nearly 10 minutes in the other. What's also strange is it's not triggering a transaction timeout either. I've already restarted the database recently, and there aren't too many threads executing -- just 38 threads total. Is it possible that the table is corrupted or needs repair? -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps
Re: [GENERAL] simple update query stuck
Do these queries update more than one row? I ran into a similar issue a year ago, where two multi-row updates would deadlock because they processed rows in a different order. I'd love to see UPDATE support ORDER BY to fix this, but it doesn't yet. (If I ever try contributing to Postgres, this is a feature I'd love to add. It seems like it might be an easy one for a newbie to tackle.) Paul On Tue, Apr 1, 2014 at 12:51 PM, Si Chen sic...@opensourcestrategies.com wrote: Hello, I'm using postgresql 9.0.13, and I have a simple query that seems to be stuck. I did a postgres=# select procpid, query_start, waiting, current_query from pg_stat_activity; procpid | query_start | waiting | current_query 32605 | 2014-04-01 12:39:46.957666-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT _ID=$9 AND ORGANIZATION_PARTY_ID=$10 32685 | 2014-04-01 12:25:10.378481-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT _ID=$9 AND ORGANIZATION_PARTY_ID=$10 As you can see this is a pretty simple update query, but it's been running 24 minutes in one thread and nearly 10 minutes in the other thread. What's also strange is it's not trigger a transaction timeout either. I've already re-started the database recently, and there's not too many threads executing -- just 38 threads total. Is it possible that the table is corrupted or needs repair? -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps -- _ Pulchritudo splendor veritatis. 
Re: [GENERAL] simple update query stuck
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Si Chen Sent: Tuesday, April 01, 2014 3:51 PM To: pgsql-general@postgresql.org Subject: [GENERAL] simple update query stuck Hello, I'm using postgresql 9.0.13, and I have a simple query that seems to be stuck. I did a postgres=# select procpid, query_start, waiting, current_query from pg_stat_activity; procpid | query_start | waiting | current_query 32605 | 2014-04-01 12:39:46.957666-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT_ID=$9 AND ORGANIZATION_PARTY_ID=$10 32685 | 2014-04-01 12:25:10.378481-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT_ID=$9 AND ORGANIZATION_PARTY_ID=$10 As you can see this is a pretty simple update query, but it's been running 24 minutes in one thread and nearly 10 minutes in the other thread. What's also strange is it's not trigger a transaction timeout either. I've already re-started the database recently, and there's not too many threads executing -- just 38 threads total. Is it possible that the table is corrupted or needs repair? -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps Both queries are waiting. Your table must be locked. Check pg_locks. Regards, Igor Neyman
Re: [GENERAL] simple update query stuck
Thanks for writing back, but I don't think so. There's no message of a deadlock in the log, and the first query started at 12:25, the next one 12:31, 12:39, 12:50, 12:54, so there's plenty of time in between. On Tue, Apr 1, 2014 at 1:01 PM, Hoover, Jeffrey jhoo...@jcvi.org wrote: Could they both be trying to update the same row, resulting in a deadlock? *From:* pgsql-general-ow...@postgresql.org [mailto: pgsql-general-ow...@postgresql.org] *On Behalf Of *Si Chen *Sent:* Tuesday, April 01, 2014 3:51 PM *To:* pgsql-general@postgresql.org *Subject:* [GENERAL] simple update query stuck Hello, I'm using postgresql 9.0.13, and I have a simple query that seems to be stuck. I did a postgres=# select procpid, query_start, waiting, current_query from pg_stat_activity; procpid | query_start | waiting | current_query 32605 | 2014-04-01 12:39:46.957666-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT _ID=$9 AND ORGANIZATION_PARTY_ID=$10 32685 | 2014-04-01 12:25:10.378481-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT _ID=$9 AND ORGANIZATION_PARTY_ID=$10 As you can see this is a pretty simple update query, but it's been running 24 minutes in one thread and nearly 10 minutes in the other thread. What's also strange is it's not trigger a transaction timeout either. I've already re-started the database recently, and there's not too many threads executing -- just 38 threads total. Is it possible that the table is corrupted or needs repair? -- Si Chen Open Source Strategies, Inc. 
sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps
Re: [GENERAL] simple update query stuck
You are right. That was the problem. I tried the query from http://wiki.postgresql.org/wiki/Lock_Monitoring and found a COMMIT transaction that was blocking it. I restarted postgresql again, and (it seems) everything went back to normal. Was there another way to unlock the table then? On Tue, Apr 1, 2014 at 1:10 PM, Igor Neyman iney...@perceptron.com wrote: From: pgsql-general-ow...@postgresql.org [mailto: pgsql-general-ow...@postgresql.org] On Behalf Of Si Chen Sent: Tuesday, April 01, 2014 3:51 PM To: pgsql-general@postgresql.org Subject: [GENERAL] simple update query stuck Hello, I'm using postgresql 9.0.13, and I have a simple query that seems to be stuck. I did a postgres=# select procpid, query_start, waiting, current_query from pg_stat_activity; procpid | query_start | waiting | current_query 32605 | 2014-04-01 12:39:46.957666-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT _ID=$9 AND ORGANIZATION_PARTY_ID=$10 32685 | 2014-04-01 12:25:10.378481-07 | t | UPDATE public.GL_ACCOUNT_ORGANIZATION SET ROLE_TYPE_ID=$1, FROM_DATE=$2, THRU_DATE=$3, POSTED_BALANCE=$4, LAST_UPDATED_STAMP=$5, LAST_UPDATED_TX_STAMP=$6, CREATED_STAMP=$7, CREATED_TX_STAMP=$8 WHERE GL_ACCOUNT _ID=$9 AND ORGANIZATION_PARTY_ID=$10 As you can see this is a pretty simple update query, but it's been running 24 minutes in one thread and nearly 10 minutes in the other thread. What's also strange is it's not trigger a transaction timeout either. I've already re-started the database recently, and there's not too many threads executing -- just 38 threads total. Is it possible that the table is corrupted or needs repair? -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps Both queries are waiting. 
Your table must be locked. Check pg_locks. Regards, Igor Neyman -- Si Chen Open Source Strategies, Inc. sic...@opensourcestrategies.com http://www.OpenSourceStrategies.com LinkedIn: http://www.linkedin.com/in/opentaps Twitter: http://twitter.com/opentaps
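For reference, the wiki query mentioned in this thread is essentially a self-join of pg_locks; a simplified sketch follows (it uses the pre-9.2 procpid/current_query column names that match this thread's 9.0 server, and matches fewer lock fields than the real wiki query, so treat it as an approximation):

```sql
-- Find ungranted lock requests and the sessions holding the conflicting lock.
SELECT blocked.pid        AS blocked_pid,
       ba.current_query   AS blocked_query,
       blocking.pid       AS blocking_pid,
       bga.current_query  AS blocking_query
  FROM pg_locks blocked
  JOIN pg_locks blocking
    ON blocking.locktype = blocked.locktype
   AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
   AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
   AND blocking.pid <> blocked.pid
  JOIN pg_stat_activity ba  ON ba.procpid  = blocked.pid
  JOIN pg_stat_activity bga ON bga.procpid = blocking.pid
 WHERE NOT blocked.granted
   AND blocking.granted;

-- Once the blocking backend is identified, it can be ended individually
-- instead of restarting the whole server (substitute the pid found above):
SELECT pg_cancel_backend(12345);     -- cancel its current query, or
SELECT pg_terminate_backend(12345);  -- end the session entirely
```

Note that on 9.0 both signalling functions require superuser privileges.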
Re: [GENERAL] simple update query stuck
On Tue, Apr 01, 2014 at 01:37:17PM -0700, Si Chen wrote: You are right. That was the problem. I tried the query from http://wiki.postgresql.org/wiki/Lock_Monitoring and found a COMMIT transaction that was blocking it. I restarted postgresql again, and (it seems) everything went back to normal. Was there another way to unlock the table then? Probably you could have killed one of the queries. But it sounds like what's happening is that you have multiple queries that are all trying to update the same rows in a different order. It may be that none of these is strictly deadlocked, in that no query is waiting on a lock that another query has, but rather is waiting on a lock that another query will release only when _it_ gets a lock that another query has and so on. (Maybe things have gotten better, but in my experience it's possible to set up a chain of locks such that it doesn't look like a deadlock to the detector, but the lock chain is such that no query will ever be able to release.) I suspect you need to get your locks in a consistent order or you'll continue to have this problem. A -- Andrew Sullivan a...@crankycanuck.ca -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] simple update query stuck
Andrew Sullivan-8 wrote On Tue, Apr 01, 2014 at 01:37:17PM -0700, Si Chen wrote: You are right. That was the problem. I tried the query from http://wiki.postgresql.org/wiki/Lock_Monitoring and found a COMMIT transaction that was blocking it. I restarted postgresql again, and (it seems) everything went back to normal. Was there another way to unlock the table then? http://www.postgresql.org/docs/9.2/interactive/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL-TABLE I am curious why your original scan of pg-stat-query did not show this offending session/transaction...did you omit it by chance? Probably you could have killed one of the queries. But it sounds like what's happening is that you have multiple queries that are all trying to update the same rows in a different order. It may be that none of these is strictly deadlocked, in that no query is waiting on a lock that another query has, but rather is waiting on a lock that another query will release only when _it_ gets a lock that another query has and so on. (Maybe things have gotten better, but in my experience it's possible to set up a chain of locks such that it doesn't look like a deadlock to the detector, but the lock chain is such that no query will ever be able to release.) I suspect you need to get your locks in a consistent order or you'll continue to have this problem. It sounds more like a long-running transaction (or something effectively similar) is blocking updates of the target table through normal locking. To be honest we seem to be lacking enough information, including the query and locks state of the server during the stall, to make an informed guess as to the real cause. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/simple-update-query-stuck-tp5798237p5798261.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. 
Re: [GENERAL] simple update query stuck
Andrew Sullivan a...@crankycanuck.ca writes: Probably you could have killed one of the queries. But it sounds like what's happening is that you have multiple queries that are all trying to update the same rows in a different order. It may be that none of these is strictly deadlocked, in that no query is waiting on a lock that another query has, but rather is waiting on a lock that another query will release only when _it_ gets a lock that another query has and so on. (Maybe things have gotten better, but in my experience it's possible to set up a chain of locks such that it doesn't look like a deadlock to the detector, but the lock chain is such that no query will ever be able to release.) AFAIK the deadlock detector will catch any case where the waits-for loop is entirely internal to the database; if you've got a reproducible case where it doesn't, I'd like to see it. However, it's certainly true that you can get deadlock situations when some of the blocking conditions exist outside the database --- that is, to the server it looks like some transaction(s) are blocked on another session that is idle-in-transaction, but the client attached to that session is somehow waiting for another one of the clients, in a way that isn't visible to the deadlock detector. One way for that to happen without any external interconnections is if the client is waiting for a NOTIFY that will never arrive because the would-be sender is blocked. regards, tom lane
Re: [GENERAL] simple update query stuck
On Tue, Apr 01, 2014 at 07:00:16PM -0400, Tom Lane wrote: one of the clients, in a way that isn't visible to the deadlock detector. One way for that to happen without any external interconnections is if the client is waiting for a NOTIFY that will never arrive because the would-be sender is blocked. I bet the case I was thinking of was the NOTIFY example. That never occurred to me as an explanation, but now that you mention it, it seems quite likely to me. More generally (and for the OP's problem), my experience is that lots of updates against the same rows in an unpredictable order is an excellent way to run into trouble, and long-running transactions are a major source of these problems. Without a more detailed report about what is going on in the present case, I don't think it's going to be possible to diagnose better than has been done. A -- Andrew Sullivan a...@crankycanuck.ca
Re: [GENERAL] Unexpected update behaviour
Victor Yegorov vyego...@gmail.com writes: Could you kindly explain me why the query as it is updates no records? It's a bug, that's why. See http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=324577f39bc8738ed0ec24c36c5cb2c2f81ec660 or for 9.2, http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5d545b7ed53a8b2058c1152bd1c9ae227b3280e3 regards, tom lane
[GENERAL] Unexpected update behaviour
Greetings. I've hit a strange issue. This is a simplified setup. First, run create.sql to get a couple of tables. Then, run the update query. Tested on 9.2.6 and 9.3.2. Now:
- if I remove the UNION ALL part of the inner query, UPDATE works;
- if I move the `raw` subquery into the CTE, UPDATE works (but takes hours on my full data);
- if I convert this UPDATE into a SELECT, I get the row to be updated.
Could you kindly explain to me why the query as it is updates no records? -- Victor Y. Yegorov create.sql Description: Binary data update.sql Description: Binary data
[GENERAL] [Q] Update from a table
Hello. What happens if I issue UPDATE SET FROM... but with incomplete/buggy WHERE condition and thus SEVERAL rows from the from_list match ONE row to update? Thank you in advance, Ladislav Lenart
Re: [GENERAL] [Q] Update from a table
Ladislav Lenart lenart...@volny.cz writes: What happens if I issue UPDATE SET FROM... but with incomplete/buggy WHERE condition and thus SEVERAL rows from the from_list match ONE row to update? Any given row will be updated at most once. However, the from_list row it gets updated against will be unpredictable, depending on the implementation of the join. regards, tom lane
Re: [GENERAL] [Q] Update from a table
On 5.12.2013 15:14, Tom Lane wrote: Ladislav Lenart lenart...@volny.cz writes: What happens if I issue UPDATE SET FROM... but with incomplete/buggy WHERE condition and thus SEVERAL rows from the from_list match ONE row to update? Any given row will be updated at most once. However, the from_list row it gets updated against will be unpredictable, depending on the implementation of the join. regards, tom lane Makes perfect sense. Thank you, Ladislav Lenart
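Tom's point is easy to reproduce; in this sketch (hypothetical tables, for illustration only), the final value of val depends entirely on the join implementation:

```sql
CREATE TABLE target (id int PRIMARY KEY, val text);
CREATE TABLE src    (id int, val text);

INSERT INTO target VALUES (1, 'old');
INSERT INTO src    VALUES (1, 'a'), (1, 'b');   -- two from_list matches for id 1

UPDATE target
   SET val = src.val
  FROM src
 WHERE src.id = target.id;

-- The target row is updated exactly once, but whether target.val is now
-- 'a' or 'b' is unpredictable; it depends on which src row the join
-- delivers first. Don't rely on either outcome.
```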
[GENERAL] INSERT/UPDATE statements sometimes choosing not to use PK index?
I have a relatively simple data load script, which upserts (UPDATE existing rows, INSERT new rows), which should be supported by the primary key index, the only index on this table: UPDATE destination SET ... FROM staging WHERE staging.pk = destination.pk; INSERT INTO destination SELECT ... FROM staging WHERE NOT EXISTS(SELECT * FROM destination WHERE pk = staging.pk); In most cases, EXPLAIN and runtime tell me the index is utilized. However, sometimes on back-to-back runs (5 minute intervals) the runtime explodes and EXPLAIN shows me that the PK index is not used, as both the staging table and destination table are sequentially scanned. The source data maintains the same pattern, and this load job is the only write activity in this particular database. A left anti-join in the above pseudo-query explains the same, and seems to make the problem occur more frequently. What could cause PostgreSQL (9.1) to stop using an index in a case like this, when it normally and expectedly uses it?
Re: [GENERAL] INSERT/UPDATE statements sometimes choosing not to use PK index?
Matt bsg...@gmail.com writes: In most cases, EXPLAIN and runtime tell me the index is utilized. However, sometime on back to back runs (5 minute intervals) the runtime explodes and EXPLAIN shows me that the PK index is not used, as both the staging table and destination table a sequentially scanned. You haven't explained what you do with the staging table, but maybe you need to ANALYZE it after you've loaded fresh data into it. The stats for the main table are presumably somewhat stable, but the stats for the staging table maybe not. The apparent correlation to consecutive runs lends some support to this theory, as that would reduce the time window for auto-ANALYZE to fix it for you. regards, tom lane
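Tom's suggestion amounts to inserting an explicit ANALYZE between the load and the upsert, so the planner never sees stale staging stats. A sketch of the revised cycle, reusing the pseudo-table names from the original post (the "..." column lists are elided there as well):

```sql
TRUNCATE staging;

COPY staging FROM STDIN;   -- bulk-load the fresh batch

ANALYZE staging;           -- refresh planner stats before the upsert is planned

UPDATE destination SET ... FROM staging WHERE staging.pk = destination.pk;

INSERT INTO destination
SELECT ... FROM staging
WHERE NOT EXISTS (SELECT 1 FROM destination WHERE pk = staging.pk);
```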
Re: [GENERAL] INSERT/UPDATE statements sometimes choosing not to use PK index?
Since this is an upsert, the staging table simply accepts a copy of pipe-delimited data via a COPY FROM STDIN, and a subsequent SQL script performs the UPDATE/INSERT. The staging table is then truncated (or delete without where) for the next run: Truncate staging, COPY into staging, update then insert destination from staging, repeat on next 5 minute clock cycle. Since the staging table is fully loaded into the destination table, a full scan was not a concern, but the lack of index usage on the destination. For example, this is normally supported by the PK index, and performs well: INSERT INTO destination (…) SELECT (…) FROM staging WHERE NOT EXISTS (SELECT * FROM destination WHERE destination.id = staging.id) The above runs in expected time when the explain plan shows an index scan on the destination primary key index, and a seq scan on the staging table. This will continue for many runs, until something causes the engine to stop using the destination PK index, and begin scanning both the destination and staging. What triggers this is the interesting question. The only system event I can (loosely) correlate with the problem is the start of a nightly pg_dump, but as a read-only process, this would not be changing any index stats, correct? An ANALYZE on staging is possible after each load cycle, but time does not permit on the destination table. I have been focusing on the destination because it has the index that is not being used. Will stats on the staging table affect index selection on the destination in a case like this? In the process of attempting to diagnose this, both tables involved have been vacuumed (full) and analyzed. I have also moved staging to an SSD volume, and created an equivalent index on staging - which is not used in any plan, nor do I expect it to be, as there is no filter criteria on staging, and the index maintenance on staging would seem to be unneeded overhead. But in this case, is there any advantage to an index on staging?
For reference, it is possible (not enough test cycles to verify) that left anti-join makes this situation worse, even though the explain plans appear identical: INSERT INTO destination (…) SELECT (…) FROM staging LEFT JOIN destination ON destination.id = staging.id WHERE destination.id IS NULL On 29 Oct 2013, at 9:45, Tom Lane wrote: Matt bsg...@gmail.com writes: In most cases, EXPLAIN and runtime tell me the index is utilized. However, sometime on back to back runs (5 minute intervals) the runtime explodes and EXPLAIN shows me that the PK index is not used, as both the staging table and destination table a sequentially scanned. You haven't explained what you do with the staging table, but maybe you need to ANALYZE it after you've loaded fresh data into it. The stats for the main table are presumably somewhat stable, but the stats for the staging table maybe not. The apparent correlation to consecutive runs lends some support to this theory, as that would reduce the time window for auto-ANALYZE to fix it for you. regards, tom lane
[GENERAL] Pretend update
With make I can do make -n and it just tells me what it would do but doesn't actually do anything. How could I do that with SQL? I want to write a really complicated (for me) SQL UPDATE statement. I'm sure I won't get it right the first time. Is there an easy way to not really make the changes? I've thought about starting a transaction and then rolling it back. That would undo the changes. But I won't be able to tell what the changes were. Thank you for your time, Perry
Re: [GENERAL] Pretend update
On 10/01/2013 10:16 AM, Perry Smith wrote: With make I can do make -n and it just tells me what it would do but doesn't actually do anything. How could I do that with SQL? I want to write a really complicated (for me) SQL UPDATE statement. I'm sure I won't get it right the first time. Is there an easy way to not really make the changes? I've thought about starting a transaction and then roll it back. That would undo the changes. But I won't be able to tell what the changes were. Assuming you are not doing this in a function, you can. Do UPDATE, then SELECT to see your changes or not and then ROLLBACK. Thank you for your time, Perry -- Adrian Klaver adrian.kla...@gmail.com
Re: [GENERAL] Pretend update
On Oct 1, 2013, at 12:23 PM, Adrian Klaver adrian.kla...@gmail.com wrote: On 10/01/2013 10:16 AM, Perry Smith wrote: With make I can do make -n and it just tells me what it would do but doesn't actually do anything. How could I do that with SQL? I want to write a really complicated (for me) SQL UPDATE statement. I'm sure I won't get it right the first time. Is there an easy way to not really make the changes? I've thought about starting a transaction and then roll it back. That would undo the changes. But I won't be able to tell what the changes were. Assuming you are not doing this in a function, you can. Do UPDATE, then SELECT to see your changes or not and then ROLLBACK. Ah... yes. I forgot you can see the changes within the same transaction. Dohhh... Thank you very much Perry
Re: [GENERAL] Pretend update
2013/10/1 Perry Smith pedz...@gmail.com On Oct 1, 2013, at 12:23 PM, Adrian Klaver adrian.kla...@gmail.com wrote: Assuming you are not doing this in a function, you can. Do UPDATE, then SELECT to see your changes or not and then ROLLBACK. Ah... yes. I forgot you can see the changes within the same transaction. Dohhh... It is possible to use the RETURNING clause of the UPDATE statement and avoid the extra SELECT. And yes, it is necessary to do this within a transaction and roll it back afterwards. -- Victor Y. Yegorov
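The RETURNING approach can be sketched as follows; the table and column names here are hypothetical, just to show the shape of the transaction:

```sql
BEGIN;

-- Hypothetical table/columns, for illustration only.
UPDATE accounts
   SET balance = balance * 1.05
 WHERE region = 'west'
RETURNING id, balance;   -- shows each updated row as it would look after commit

-- Inspect the RETURNING output (or run further SELECTs here)...

ROLLBACK;   -- discard everything; no permanent change is made
```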
Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value
Hi Kevin, Sorry, that's the wrong way around. I should have said: Your BEFORE UPDATE trigger could leave the synced value in NEW alone if force_sync was **true**, and set synced to false otherwise. It could then set NEW.force_sync to false, to leave you ready for the next update. Thanks for your advice (and the patience on this list in general). Instead of using two columns, I now use an integer-column and set it to a value taken from an incrementing sequence. Thanks again, Clemens
Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value
Clemens Eisserer linuxhi...@gmail.com wrote: Here is what I am trying to achieve: Set synced to false at any update, except when it has been set explicitly to true. This does not seem to be possible, without checking the value SET by UPDATE? Right; since there is no way to check whether a 'true' value there was explicitly set or just carrying over from the old version of the row without being set, you need some other mechanism for handling this. You could, for example, add a force_sync column which could be tested in a trigger. Your BEFORE UPDATE trigger could leave the synced value in NEW alone if force_sync was false, and set synced to false otherwise. It could then set NEW.force_sync to false, to leave you ready for the next update. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value
Kevin Grittner kgri...@ymail.com wrote: Your BEFORE UPDATE trigger could leave the synced value in NEW alone if force_sync was false, and set synced to false otherwise. It could then set NEW.force_sync to false, to leave you ready for the next update. Sorry, that's the wrong way around. I should have said: Your BEFORE UPDATE trigger could leave the synced value in NEW alone if force_sync was **true**, and set synced to false otherwise. It could then set NEW.force_sync to false, to leave you ready for the next update. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
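The corrected scheme might look like this as a trigger; the synced and force_sync column names come from the thread, while the table name is a placeholder:

```sql
CREATE OR REPLACE FUNCTION update_synced_column() RETURNS trigger AS $BODY$
BEGIN
    -- IS NOT TRUE also covers a NULL force_sync, treating it as "not forced".
    IF NEW.force_sync IS NOT TRUE THEN
        NEW.synced := false;     -- ordinary update: mark the row as unsynced
    END IF;
    NEW.force_sync := false;     -- reset the flag, ready for the next update
    RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;

-- "some_table" is a placeholder for the table carrying the synced column.
CREATE TRIGGER set_synced
BEFORE UPDATE ON some_table
FOR EACH ROW EXECUTE PROCEDURE update_synced_column();
```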
[GENERAL] BEFORE UPDATE trigger doesn't change column value
Hi, Sorry for this newbie question. I have been trying for quite some time now to get the following trigger function to work properly:

CREATE OR REPLACE FUNCTION update_synced_column() RETURNS trigger AS
$BODY$
BEGIN
    IF NEW.synced IS NULL THEN
        NEW.synced := false;
    END IF;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;

The idea is basically to always update the synced column to false, unless a value has been provided manually in the UPDATE clause. Synced is defined as BOOLEAN DEFAULT FALSE. I execute this function in a BEFORE UPDATE trigger FOR EACH ROW, however it doesn't seem to have any effect. Any ideas what could be wrong here? Thanks, Clemens
Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value
Clemens Eisserer linuxhi...@gmail.com wrote: I am trying for quite some time now to get the following trigger-function to work properly: CREATE OR REPLACE FUNCTION update_synced_column() RETURNS trigger AS $BODY$ BEGIN IF NEW.synced IS NULL THEN NEW.synced := false; END IF; RETURN NEW; END; $BODY$ LANGUAGE plpgsql VOLATILE COST 100; The idea is basically to always update the synced column to false, unless a value has been provided manually in the UPDATE-clause. Synced is defined as BOOLEAN DEFAULT FALSE; I execute this function in a BEFORE UPDATE trigger FOR EACH ROW, however it doesn't seem to have any effect. Any ideas what could be wrong here? NEW reflects what the row will look like after the UPDATE. There is no way to tell which columns were specified in the SET clause of the UPDATE; a column which is omitted from that clause will look exactly the same as a column which is set to the value it already had. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value
On 5/4/13 at 6:59 AM, Clemens Eisserer wrote: Sorry for this newbie-question, I am trying for quite some time now to get the following trigger-function to work properly: CREATE OR REPLACE FUNCTION update_synced_column() RETURNS trigger AS $BODY$ BEGIN IF NEW.synced IS NULL THEN NEW.synced := false; END IF; RETURN NEW; END; $BODY$ LANGUAGE plpgsql VOLATILE COST 100; The idea is basically to always update the synced column to false, unless a value has been provided manually in the UPDATE-clause. Synced is defined as BOOLEAN DEFAULT FALSE; I execute this function in a BEFORE UPDATE trigger FOR EACH ROW, however it doesn't seem to have any effect. You could also add 'NOT NULL' to the declaration of synced so this column can never be set to NULL and this would further ensure the trigger function has nothing to do. By way of sanity testing, do you get any rows when doing something like: SELECT * FROM relevant_table WHERE synced IS NULL; Any ideas what could be wrong here? If the above does not apply and at the risk of being too obvious (specifically not wishing to cause offence): Has the trigger itself been declared? refer: http://www.postgresql.org/docs/9.2/static/sql-createtrigger.html Is the function ever getting called? refer: http://www.postgresql.org/docs/9.2/static/plpgsql-errors-and-messages.html Regards Gavan Schneider
Re: [GENERAL] BEFORE UPDATE trigger doesn't change column value
Hi, NEW reflects what the row will look like after the UPDATE. There is no way to tell which columns were specified in the SET clause of the UPDATE; a column which is omitted from that clause will look exactly the same as a column which is set to the value it already had. Thanks a lot for clarifying this ... my logic was flawed. At stackoverflow.com I found an example which suggested testing against NULL would allow this and I just relied on it. Here is what I am trying to achieve: Set synced to false at any update, except when it has been set explicitly to true. This does not seem to be possible, without checking the value SET by UPDATE? Regards, Clemens
Re: [GENERAL] Running update in chunks?
On 2013-01-25, Tim Uckun timuc...@gmail.com wrote: I agree that seems like the most likely cause. Each update to the row holding the hstore column requires adding new index entries for all the hstore elements, and autovacuum will need to clean up the old ones in the background. The best solution would be to either normalize the data instead of using hstore, or move the hstore to a separate table which is referenced by some sort of ID from the frequently-updated table. That's very interesting. I can certainly split up the table, no big deal there. So would the index be redone even if I am not updating the hstore field itself? Absolutely! see MVCC. http://www.postgresql.org/docs/current/static/mvcc-intro.html -- ⚂⚃ 100% natural
Re: [GENERAL] Running update in chunks?
That would be one solution, but I think a better one would be to not store make_id in imports in the first place, but instead to always fetch it by joining imports to models at query time. My problem here is that the incoming data is quite messy so the join conditions become weird (lots of ORs and such). A multi pass approach seems to work better.
Re: [GENERAL] Running update in chunks?
Sorry I haven't been responsive for a little while. I ran your script but creating a new schema instead of my existing schema. My timings were similar to yours (more or less) except for the vacuums which took roughly 147891 ms. On Tue, Jan 22, 2013 at 3:21 PM, Kevin Grittner kgri...@mail.com wrote: Kevin Grittner wrote: update imports set make_id = 0 Query returned successfully: 98834 rows affected, 45860 ms execution time. For difficult problems, there is nothing like a self-contained test case, that someone else can run to see the issue. Here's a starting point:

create extension if not exists hstore;
create schema cars;
drop table if exists cars.imports;
CREATE TABLE cars.imports
(
  id serial NOT NULL,
  target_id integer,
  batch_id integer,
  make_id integer,
  model_id integer,
  date timestamp without time zone,
  division_id integer,
  dealer_id integer,
  data hstore,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT imports_pkey PRIMARY KEY (id)
);
CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree (dealer_id);
CREATE INDEX index_imports_on_division_id ON cars.imports USING btree (division_id);
CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id);
CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id);
insert into cars.imports (created_at, updated_at)
  select now(), now() from (select generate_series(1, 10)) x;
vacuum freeze analyze;
\timing on
update cars.imports set make_id = 0;
vacuum analyze;
update cars.imports set make_id = 0;
...
and here's what I get when I run it on my desktop computer with ordinary disk drives and a completely default configuration: test=# create extension if not exists hstore; CREATE EXTENSION Time: 48.032 ms test=# create schema cars; CREATE SCHEMA Time: 8.150 ms test=# drop table if exists cars.imports; NOTICE: table imports does not exist, skipping DROP TABLE Time: 0.205 ms test=# CREATE TABLE cars.imports test-# ( test(# id serial NOT NULL, test(# target_id integer, test(# batch_id integer, test(# make_id integer, test(# model_id integer, test(# date timestamp without time zone, test(# division_id integer, test(# dealer_id integer, test(# data hstore, test(# created_at timestamp without time zone NOT NULL, test(# updated_at timestamp without time zone NOT NULL, test(# CONSTRAINT imports_pkey PRIMARY KEY (id) test(# ); NOTICE: CREATE TABLE will create implicit sequence imports_id_seq for serial column imports.id NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index imports_pkey for table imports CREATE TABLE Time: 152.677 ms test=# CREATE INDEX index_imports_on_data ON cars.imports USING gin (data); CREATE INDEX Time: 6.391 ms test=# CREATE INDEX index_imports_on_dealer_id ON cars.imports USING btree (dealer_id); CREATE INDEX Time: 64.668 ms test=# CREATE INDEX index_imports_on_division_id ON cars.imports USING btree (division_id); CREATE INDEX Time: 65.573 ms test=# CREATE INDEX index_imports_on_make_id ON cars.imports USING btree (make_id); CREATE INDEX Time: 64.959 ms test=# CREATE INDEX index_imports_on_model_id ON cars.imports USING btree (model_id); CREATE INDEX Time: 64.906 ms test=# insert into cars.imports (created_at, updated_at) select now(), now() from (select generate_series(1, 10)) x; INSERT 0 10 Time: 2516.559 ms test=# vacuum freeze analyze; VACUUM Time: 3357.778 ms test=# \timing on Timing is on. 
test=# update cars.imports set make_id = 0; UPDATE 10 Time: 2937.241 ms test=# vacuum analyze; VACUUM Time: 2097.426 ms test=# update cars.imports set make_id = 0; UPDATE 10 Time: 3935.939 ms Ubuntu 12.10 i7-3770 CPU @ 3.40GHz with 16GB RAM Linux Kevin-Desktop 3.5.0-22-generic #34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux A pair of WDC WD10EALX-009BA0 (15.01H51) 7200 RPM drives in RAID 1. PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit test=# SELECT name, current_setting(name), source test-# FROM pg_settings test-# WHERE source NOT IN ('default', 'override'); name| current_setting |source ++-- application_name | psql | client client_encoding| UTF8 | client DateStyle | ISO, MDY | configuration file default_text_search_config | pg_catalog.english | configuration file lc_messages| en_US.UTF-8| configuration file lc_monetary| en_US.UTF-8| configuration file lc_numeric | en_US.UTF-8| configuration file lc_time| en_US.UTF-8| configuration
Re: [GENERAL] Running update in chunks?
What if you do:

  alter table cars.imports set (fillfactor=50);

before the vacuum full, and then try the update again?

This makes a dramatic difference when combined with a vacuum.

UPDATE 98834
Time: 3408.210 ms

Ten times faster!

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
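For anyone replaying this later, the full sequence being tested here (pieced together from the commands quoted in this thread) is:

```sql
-- Leave half of each page empty so updated row versions can stay on
-- the same page as the old version (making HOT updates possible).
ALTER TABLE cars.imports SET (fillfactor = 50);

-- Rewrite the table so the new fillfactor actually takes effect.
VACUUM FULL cars.imports;

-- Re-run the update that was previously taking ~50 seconds.
UPDATE cars.imports SET make_id = 0;
```

Note that ALTER TABLE ... SET (fillfactor) only affects pages written afterwards, which is why the rewrite (VACUUM FULL, or CLUSTER) is needed before the effect shows up.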
Re: [GENERAL] Running update in chunks?
relid             | 26710
schemaname        | cars
relname           | imports
seq_scan          | 280
seq_tup_read      | 25873543
idx_scan          | 4
idx_tup_fetch     | 2749
n_tup_ins         | 98926
n_tup_upd         | 6350466
n_tup_del         | 92
n_tup_hot_upd     | 625286
n_live_tup        | 98834
n_dead_tup        | 0
last_vacuum       | 2013-01-25 21:55:36.078614+13
last_autovacuum   | 2013-01-25 21:58:40.850546+13
last_analyze      | 2013-01-25 21:55:36.305967+13
last_autoanalyze  | 2013-01-25 21:51:54.307639+13
vacuum_count      | 6
autovacuum_count  | 32
analyze_count     | 6
autoanalyze_count | 25

On Wed, Jan 23, 2013 at 7:50 AM, Steve Crawford scrawf...@pinpointresearch.com wrote:

On 01/21/2013 05:02 PM, Tim Uckun wrote: I already posted the schema earlier. It's a handful of integer fields with one hstore field.

Oh well. I can't find it but maybe it got lost in shipping or eaten by a spam filter.

On Tue, Jan 22, 2013 at 1:23 PM, Steve Crawford scrawf...@pinpointresearch.com wrote: select * from pg_stat_user_tables where relname='yourtable';

Messy output

Don't know if you are in psql but if so, expanded display works for this. I.e.:

steve@[local] = \x
Expanded display is on.
steve@[local] = select * from pg_stat_user_tables where relname='footest';
-[ RECORD 1 ]-----+------------------------------
relid             | 781691
schemaname        | public
relname           | footest
seq_scan          | 3
seq_tup_read      | 609
idx_scan          |
idx_tup_fetch     |
n_tup_ins         | 609
n_tup_upd         | 0
n_tup_del         | 0
n_tup_hot_upd     | 0
n_live_tup        | 301
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      |
last_autoanalyze  | 2012-12-19 08:42:23.347368-08
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 0
autoanalyze_count | 2

relid,schemaname,relname,seq_scan,seq_tup_read,idx_scan,idx_tup_fetch,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
26710,chrysler,imports,274,25280539,4,2749,98926,5757462,92,327542,98834,0,2013-01-22 12:28:29.01505+13,2013-01-22 12:32:29.249588+13,2013-01-22 12:28:29.173772+13,2013-01-22 12:32:44.123493+13,3,30,3,24

So at least autovacuum is running (and some manual vacuum and analyze as well).

Cheers,
Steve
Re: [GENERAL] Running update in chunks?
I agree that seems like the most likely cause. Each update to the row holding the hstore column requires adding new index entries for all the hstore elements, and autovacuum will need to clean up the old ones in the background. The best solution would be to either normalize the data instead of using hstore, or move the hstore to a separate table which is referenced by some sort of ID from the frequently-updated table.

That's very interesting. I can certainly split up the table, no big deal there. So would the index be updated even if I am not updating the hstore field itself?
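Kevin's "move the hstore to a separate table" suggestion could be sketched like this (the side-table and column names here are hypothetical, adapted from the schema shown earlier in the thread):

```sql
-- Keep the frequently-updated integer columns in cars.imports and
-- move the rarely-updated hstore out to a 1:1 side table.
CREATE TABLE cars.import_data
(
    import_id integer NOT NULL REFERENCES cars.imports (id),
    data      hstore,
    CONSTRAINT import_data_pkey PRIMARY KEY (import_id)
);

CREATE INDEX index_import_data_on_data ON cars.import_data USING gin (data);

INSERT INTO cars.import_data (import_id, data)
SELECT id, data FROM cars.imports;

-- Dropping the column also drops the GIN index that depended on it,
-- so updates to make_id etc. no longer touch any GIN index.
ALTER TABLE cars.imports DROP COLUMN data;
```

After this, an update to make_id on cars.imports only has to maintain the btree indexes on that table.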
Re: [GENERAL] Running update in chunks?
On 25/01/13 08:57, Tim Uckun wrote: What if you do: alter table cars.imports set (fillfactor=50); Before the vacuum full, and then try the update again? This makes a dramatic difference when combined with a vacuum. UPDATE 98834 Time: 3408.210 ms Ten times faster!

That suggests (to me, at least) that it is related to index updating. Again, your GIN index seems the primary candidate. A fillfactor of 50% means row updates probably stay on the same disk block as their previous version. This implies fewer index updates.

Try running iostat (I think that's available on a Mac) with/without the fillfactor and with/without the GIN index while you do the updates. It's possible your SSD is just behaving oddly under stress.

-- Richard Huxton Archonet Ltd
Re: [GENERAL] Running update in chunks?
That suggests (to me, at least) that it is related to index updating. Again, your GIN index seems primary candidate. Try running iostat (I think that's available on a Mac) with/without the fillfactor and with/without the GIN index while you do the updates. It's possible your SSD is just behaving oddly under stress.

I dropped the index and the numbers shot up tenfold or more. I don't know why postgres feels the need to update the GIN index on the hstore field when I am only updating an integer field but it looks like I need to split the hstore into a different table.
Re: [GENERAL] Running update in chunks?
On 25/01/13 11:38, Tim Uckun wrote: That suggests (to me, at least) that it is related to index updating. Again, your GIN index seems primary candidate. Try running iostat (I think that's available on a Mac) with/without the fillfactor and with/without the GIN index while you do the updates. It's possible your SSD is just behaving oddly under stress. I dropped the index and the numbers shot up tenfold or more. I don't know why postgres feels the need to update the GIN index on the hstore field when I am only updating an integer field but it looks like I need to split the hstore into a different table.

If the row moves to a different block, then it has no choice. The old index entry will point to an invalid block. There are some optimisations (HOT - http://pgsql.tapoueh.org/site/html/misc/hot.html) but that relies on (iirc) the update staying on the same block and also not updating any indexed fields (and you were, I think). A GIN index is very expensive to update compared to btree too.

-- Richard Huxton Archonet Ltd
Re: [GENERAL] Running update in chunks?
Tim Uckun wrote: I dropped the index and the numbers shot up tenfold or more. I don't know why postgres feels the need to update the GIN index on the hstore field when I am only updating an integer field but it looks like I need to split the hstore into a different table.

Every UPDATE that is not HOT will create a row version with a new row id. That means that all indexes referencing that row will have to get updated. That is consistent with better performance with low fillfactor (which makes HOT more likely).

Yours,
Laurenz Albe
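A quick way to see how much HOT is helping is to watch the counters that already appear in the pg_stat_user_tables output posted earlier in this thread, for example:

```sql
-- What fraction of updates on cars.imports were heap-only (HOT)?
SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM   pg_stat_user_tables
WHERE  schemaname = 'cars' AND relname = 'imports';
```

Run it before and after a bulk update; if hot_pct barely moves, the updates are paying the full cost of maintaining every index, including the GIN one.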
Re: [GENERAL] Running update in chunks?
On Fri, Jan 25, 2013 at 3:38 AM, Tim Uckun timuc...@gmail.com wrote: That suggests (to me, at least) that it is related to index updating. Again, your GIN index seems primary candidate. Try running iostat (I think that's available on a Mac) with/without the fillfactor and with/without the GIN index while you do the updates. It's possible your SSD is just behaving oddly under stress. I dropped the index and the numbers shot up tenfold or more. I don't know why postgres feels the need to update the GIN index on the hstore field when I am only updating an integer field

When the row gets updated, it might move to some place else. An index maps data values to row locations. So if the location changes, all indexes need to be updated, even if the data value for that index did not change. (Well, I shouldn't say they *need* to change. The database could have been designed, with considerable difficulty and consequences, to leave behind permanent redirect pointers to the new location. But it wasn't.)

There is a mechanism called HOT update (Heap-Only Tuple) which can prevent this under certain conditions:

1) Either none of the fields being updated are indexed, or any that are both updated and indexed are updated to the value they already have.

2) There is room for a new copy of the tuple on the same page as the old one.

Lowering the fillfactor helps with requirement 2, especially since your tuples are probably wide (because of the hstore column) and so not many fit on a page.

Note that if you update a field to have the same value as it already does, it still makes a new copy of the entire tuple anyway. (It detects that the :old = :new for HOT-eligibility purposes if the field is indexed, but not for suppression-of-copying purposes. And if the tuple needs to be copied but there is no room on that page, then it isn't eligible for HOT after all.)

So you should add a where clause to the UPDATE to filter out things that are unchanged.
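Concretely, for the update posted at the start of the thread, the filter-out-unchanged-rows clause Jeff describes would look something like this (IS DISTINCT FROM rather than <> so that NULLs compare sanely):

```sql
UPDATE cars.imports i
SET    make_id = md.make_id
FROM   cars.models md
WHERE  i.model_id = md.id
AND    i.make_id IS DISTINCT FROM md.make_id;  -- skip rows that are already correct
```

Rows whose make_id already matches are never rewritten at all, so they generate no new tuple versions and no index maintenance.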
but it looks like I need to split the hstore into a different table.

That would be one solution, but I think a better one would be to not store make_id in imports in the first place, but instead to always fetch it by joining imports to models at query time.

Cheers, Jeff
Re: [GENERAL] Running update in chunks?
Jeff Janes wrote: one hstore field can easily be equivalent to 50 text fields with an index on each one. I'm pretty sure that that is your bottleneck.

I agree that seems like the most likely cause. Each update to the row holding the hstore column requires adding new index entries for all the hstore elements, and autovacuum will need to clean up the old ones in the background. The best solution would be to either normalize the data instead of using hstore, or move the hstore to a separate table which is referenced by some sort of ID from the frequently-updated table.

-Kevin
Re: [GENERAL] Running update in chunks?
On 01/21/2013 05:02 PM, Tim Uckun wrote: I already posted the schema earlier. It's a handful of integer fields with one hstore field.

Oh well. I can't find it but maybe it got lost in shipping or eaten by a spam filter.

On Tue, Jan 22, 2013 at 1:23 PM, Steve Crawford scrawf...@pinpointresearch.com wrote: select * from pg_stat_user_tables where relname='yourtable';

Messy output

Don't know if you are in psql but if so, expanded display works for this. I.e.:

steve@[local] = \x
Expanded display is on.
steve@[local] = select * from pg_stat_user_tables where relname='footest';
-[ RECORD 1 ]-----+------------------------------
relid             | 781691
schemaname        | public
relname           | footest
seq_scan          | 3
seq_tup_read      | 609
idx_scan          |
idx_tup_fetch     |
n_tup_ins         | 609
n_tup_upd         | 0
n_tup_del         | 0
n_tup_hot_upd     | 0
n_live_tup        | 301
n_dead_tup        | 0
last_vacuum       |
last_autovacuum   |
last_analyze      |
last_autoanalyze  | 2012-12-19 08:42:23.347368-08
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 0
autoanalyze_count | 2

relid,schemaname,relname,seq_scan,seq_tup_read,idx_scan,idx_tup_fetch,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,vacuum_count,autovacuum_count,analyze_count,autoanalyze_count
26710,chrysler,imports,274,25280539,4,2749,98926,5757462,92,327542,98834,0,2013-01-22 12:28:29.01505+13,2013-01-22 12:32:29.249588+13,2013-01-22 12:28:29.173772+13,2013-01-22 12:32:44.123493+13,3,30,3,24

So at least autovacuum is running (and some manual vacuum and analyze as well).

Cheers,
Steve
Re: [GENERAL] Running update in chunks?
On 01/21/2013 06:21 PM, Kevin Grittner wrote: Kevin Grittner wrote: ... shared_buffers | 32MB | configuration file ... I did absolutely no tuning from the default configuration.

But Tim has his shared_buffers set to 1600kB (shared_buffers;1600kB;configuration file) or roughly 1/20 of the typical default value, which is a very low starting value anyway, on a machine populated with 8GB RAM. I'd like to see how it runs with a more reasonable shared_buffers setting. At a very minimum the 32MB default.

Cheers,
Steve
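For future googlers: the setting lives in postgresql.conf and needs a server restart to take effect. The exact value is a judgment call, but for an 8GB machine something on this order would be a far more reasonable starting point than 1600kB (the 1GB figure below is an illustrative assumption, not a recommendation from this thread):

```
# postgresql.conf
shared_buffers = 1GB        # ~1/8 of RAM; even the stock 32MB default
                            # would be 20x what Tim is running with
```

After restarting, `SHOW shared_buffers;` in psql confirms what the server is actually using.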
Re: [GENERAL] Running update in chunks?
Steve Crawford escribió: On 01/21/2013 05:02 PM, Tim Uckun wrote: I already posted the schema earlier. It's a handful of integer fields with one hstore field. Oh well. I can't find it but maybe it got lost in shipping or eaten by a spam filter.

This is what we have the archives for: http://www.postgresql.org/message-id/flat/50fedf66.7050...@pinpointresearch.com#50fedf66.7050...@pinpointresearch.com

The original message is at the top of the page (obviously).

-- Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [GENERAL] Running update in chunks?
On 01/22/2013 10:59 AM, Alvaro Herrera wrote: Steve Crawford escribió: On 01/21/2013 05:02 PM, Tim Uckun wrote: I already posted the schema earlier. It's a handful of integer fields with one hstore field. Oh well. I can't find it but maybe it got lost in shipping or eaten by a spam filter. This is what we have the archives for: http://www.postgresql.org/message-id/flat/50fedf66.7050...@pinpointresearch.com#50fedf66.7050...@pinpointresearch.com The original message is at the top of the page (obviously).

Didn't notice that the information was over on the github site (which, of course, also makes it impossible to search for in my email and unavailable to the mail archives for those wishing to view it in the future).

Cheers,
Steve
Re: [GENERAL] Running update in chunks?
On 21/01/13 08:04, Tim Uckun wrote: This is the query I am running

update cars.imports i set make_id = md.make_id from cars.models md where i.model_id = md.id;

Here is the analyse

Looks like it's the actual update that's taking all the time.

This query takes fifty seconds on a macbook air with i7 processor and eight gigs of RAM and SSD hard drive. I am using postgres 9.2 installed with homebrew using the standard conf file.

Can you try a couple of things just to check timings. Probably worth EXPLAIN ANALYSE.

SELECT count(*) FROM cars.imports i JOIN cars.models md ON i.model_id = md.id;

CREATE TEMP TABLE tt AS SELECT i.* FROM cars.imports i JOIN cars.models md ON i.model_id = md.id;

Now the first one should take half a second judging by your previous explain. If the second one takes 50 seconds too then that's just the limit of your SSD's write. If it's much faster then something else is happening.

-- Richard Huxton Archonet Ltd
Re: [GENERAL] Running update in chunks?
Can you try a couple of things just to check timings. Probably worth EXPLAIN ANALYSE.

SELECT count(*) FROM cars.imports i JOIN cars.models md ON i.model_id = md.id;

Takes about 300 ms

CREATE TEMP TABLE tt AS SELECT i.* FROM cars.imports i JOIN cars.models md ON i.model_id = md.id;

Takes about 300 ms

Now the first one should take half a second judging by your previous explain. If the second one takes 50 seconds too then that's just the limit of your SSD's write. If it's much faster then something else is happening.
Re: [GENERAL] Running update in chunks?
On 21/01/13 10:30, Tim Uckun wrote: Can you try a couple of things just to check timings. Probably worth EXPLAIN ANALYSE. SELECT count(*) FROM cars.imports i JOIN cars.models md ON i.model_id = md.id; Takes about 300 ms CREATE TEMP TABLE tt AS SELECT i.* FROM cars.imports i JOIN cars.models md ON i.model_id = md.id; Takes about 300 ms

OK - so writing all the data takes well under one second but updating the same amount takes 50 seconds. The only differences I can think of are WAL logging (transaction log) and index updates (the temp table has no indexes).

1. Try CREATE TABLE rather than CREATE TEMP TABLE - if that's still quick then it's not the time taken to write WAL.

2. Run the update query against your new tt table and see how long that takes.

3. Add indexes and repeat (in particular I'd be suspicious of the gin index on data)

My guess is that it's the time taken to update the data index - gin indexes can be slow to rebuild (although 50 seconds seems *very* slow). If so there are a few options:

1. Split the table and put whatever this data is into an import_data table - assuming it doesn't change often.

2. Try a fill-factor of 50% or less - keeping the updates on the same data page as the original might help.

3. Drop the gin index before doing your bulk update and rebuild it at the end. This is a common approach with bulk-loading / updates.

Oh - I'm assuming you're only updating those rows whose id has changed - that seemed to be the suggestion in your first message. If not, simply adding AND make_id <> md.make_id should help.

Also (and you may well have considered this) - for a normalised setup you'd just have the model-id in imports and look up the make-id through the models table.

-- Richard Huxton Archonet Ltd
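Option 3 (drop-and-rebuild) would be a sketch along these lines, using the index definition from Kevin's session earlier in the thread; it assumes nothing else needs the GIN index while the bulk update runs:

```sql
DROP INDEX cars.index_imports_on_data;

UPDATE cars.imports i
SET    make_id = md.make_id
FROM   cars.models md
WHERE  i.model_id = md.id;

-- Rebuild once, at the end, instead of maintaining the index row by row.
CREATE INDEX index_imports_on_data ON cars.imports USING gin (data);
```

Building a GIN index once over the finished table is usually much cheaper than updating it for every one of the ~98K rewritten rows.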
Re: [GENERAL] Running update in chunks?
Richard Huxton wrote: The only differences I can think of are WAL logging (transaction log) and index updates (the temp table has no indexes).

What about foreign keys? Are there any tables which reference the updated column in a foreign key declaration? Do they have indexes on that column?

-Kevin
Re: [GENERAL] Running update in chunks?
Just to close this up and give some guidance to future googlers...

There are no foreign keys. The table is empty when I start. I copy the data into it from a text file.

Removing indexes didn't help much (made a very slight difference).

Running a query CREATE TEMP TABLE tt AS SELECT using a massive join takes about 8 seconds. I presume that's the baseline for the disk and RAM given my current postgres configuration. Note that this is not a satisfactory option for me because I can't do what I want in one step (the update I specified is one of many).

Running a very simple update UPDATE imports SET make_id = null takes over 50 seconds, so that's the minimum amount of time any update is going to take.

Running a complex update where I join all the tables together and update all the fields takes about 106 seconds. Just running a complex select with the joins takes no time at all.

I tried chunking the updates using chunks of 100 records and 1000 records (WHERE id BETWEEN X AND Y repeatedly) and it was even slower.

Conclusion: updates on postgres are slow (given the default postgresql.conf). I presume this is due to MVCC or the WAL or something, and there are probably some things I can do to tweak the conf file to make them go faster, but out of the box running an update on a table with lots of rows is going to cost you a lot. Removing the indexes doesn't help that much.

Suggestion for the PG team: deliver a more realistic postgresql.conf by default. The default one seems to be aimed at ten-year-old PCs with very little RAM and disk space. At least deliver additional conf files for small, medium, large, and huge setups.
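For readers landing here from the thread title: the chunked variant Tim describes (which turned out *slower* for him, since every chunk still pays full index maintenance) is the usual pattern of repeating a ranged update, e.g.:

```sql
-- Repeat, advancing the range by the chunk size each time,
-- until no rows are affected. (Shown with a 1000-row chunk.)
UPDATE cars.imports
SET    make_id = NULL
WHERE  id BETWEEN 1 AND 1000;
```

Chunking mainly helps with lock duration and bloat on long-running updates; as this thread shows, it is not a cure for slow per-row index updates.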
Re: [GENERAL] Running update in chunks?
On 21/01/13 20:09, Tim Uckun wrote: Just to close this up and give some guidance to future googlers...

Careful, future googlers.

Conclusion. Updates on postgres are slow

Nope.

(given the default postgresql.conf). I presume this is due to MVCC or the WAL or something and there are probably some things I can do to tweak the conf file to make them go faster but out of the box running an update on a table with lots of rows is going to cost you a lot.

Unlikely. Do you really think that a PostgreSQL installation typically runs 100 times slower on updates than inserts and every other user has just said oh, that's ok then? Or is it more likely that something peculiar is broken on your setup.

Removing the indexes doesn't help that much. Suggestion for the PG team. Deliver a more realistic postgres.conf by default. The default one seems to be aimed at ten year old PCs with very little RAM and disk space. At least deliver additional conf files for small, medium, large, huge setups.

-- Richard Huxton
Re: [GENERAL] Running update in chunks?
On Mon, Jan 21, 2013 at 1:23 PM, Richard Huxton d...@archonet.com wrote: On 21/01/13 20:09, Tim Uckun wrote: Just to close this up and give some guidance to future googlers... Careful, future googlers. Conclusion. Updates on postgres are slow Nope. (given the default postgresql.conf). I presume this is due to MVCC or the WAL or something and there are probably some things I can do to tweak the conf file to make them go faster but out of the box running an update on a table with lots of rows is going to cost you a lot. Unlikely. Do you really think that a PostgreSQL installation typically runs 100 times slower on updates than inserts and every other user has just said oh, that's ok then? Or is it more likely that something peculiar is broken on your setup. Removing the indexes doesn't help that much. Suggestion for the PG team. Deliver a more realistic postgres.conf by default. The default one seems to be aimed at ten year old PCs with very little RAM and disk space. At least deliver additional conf files for small, medium, large, huge setups.

I'd be curious to see results of the same update on a standard HDD vs the SSD, and maybe on a more typical database deployment hardware vs a macbook air.

--patrick
Re: [GENERAL] Running update in chunks?
Nope.

If you have any suggestions I am all ears. For the purposes of this discussion we can narrow the problem down to this update statement:

update imports set make_id = null

There are 98K records in the table. There is no index on the make_id field. Standard untouched postgresql.conf from the brew install of postgres.

Unlikely. Do you really think that a PostgreSQL installation typically runs 100 times slower on updates than inserts and every other user has just said oh, that's ok then? Or is it more likely that something peculiar is broken on your setup.

I really don't know. That's why I am here asking. I don't think anything particular is broken with my system. As mentioned above the setup is really simple: standard postgres install, the default conf file, update one field on one table. It takes fifty-plus seconds. I concede that if I was to go into the postgresql.conf and make some changes it would probably run faster (maybe much faster) but I wanted to exhaust other factors before I went messing with the default install.
Re: [GENERAL] Running update in chunks?
I'd be curious to see results of the same update on a standard HDD vs the SSD, and maybe on a more typical database deployment hardware vs a macbook air.

I haven't tried it on any other machine yet. CREATE TEMP TABLE tt as SELECT ... takes eight seconds so presumably the disk is not the choke point.
Re: [GENERAL] Running update in chunks?
Richard Huxton wrote: On 21/01/13 20:09, Tim Uckun wrote: Just to close this up and give some guidance to future googlers... Careful, future googlers.

+1

Conclusion. Updates on postgres are slow Nope.

Agreed.

(given the default postgresql.conf). I presume this is due to MVCC or the WAL or something and there are probably some things I can do to tweak the conf file to make them go faster but out of the box running an update on a table with lots of rows is going to cost you a lot. Unlikely. Do you really think that a PostgreSQL installation typically runs 100 times slower on updates than inserts and every other user has just said oh, that's ok then? Or is it more likely that something peculiar is broken on your setup.

As someone who has managed hundreds of databases ranging up to over 3TB without seeing this without some very specific cause, I agree that there is something wonky on Tim's setup which he hasn't told us about. Then again, I'm not sure we've pushed hard enough for the relevant details.

Tim, if you're still interested in resolving this, please post the results from running the SQL code on this page: http://wiki.postgresql.org/wiki/Server_Configuration

It might be worthwhile to read through this page: http://wiki.postgresql.org/wiki/Slow_Query_Questions ... and try some of the ideas there. Base disk performance numbers would help put the results in perspective. The cause could be anything from table bloat due to inadequate vacuuming to hardware problems.

-Kevin
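For completeness, the query on that wiki page is essentially the same one Kevin ran against his own server earlier in the thread - it lists every setting that differs from the built-in defaults:

```sql
SELECT name, current_setting(name), source
FROM   pg_settings
WHERE  source NOT IN ('default', 'override');
```

Posting its output is usually the fastest way to rule configuration in or out as the culprit.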
Re: [GENERAL] Running update in chunks?
Tim Uckun wrote: If you have any suggestions I am all ears. For the purposes of this discussion we can narrow down the problem this update statement. Update imports set make_id = null.

Well, that simplifies things. First off, what does it say for rows affected? (Hint, if you really are using a default configuration and it doesn't say 0 rows affected, please show us the actual query used.)

Second, try connecting to the database as a superuser and running:

VACUUM ANALYZE imports; -- (show us the results)
VACUUM FULL imports;
VACUUM FREEZE ANALYZE; -- (don't specify a table)

Then try your query and see whether performance is any different.

-Kevin
Re: [GENERAL] Running update in chunks?
On Mon, Jan 21, 2013 at 1:48 PM, Tim Uckun timuc...@gmail.com wrote: I'd be curious to see results of the same update on a standard HDD vs the SSD, and maybe on a more typical database deployment hardware vs a macbook air. I haven't tried it on any other machine yet. CREATE TEMP TABLE tt as SELECT ... takes eight seconds so presumably the disk is not the choke point.

you are making an assumption that a fresh write is the same as a re-write. try the test.

--patrick
Re: [GENERAL] Running update in chunks?
Kevin Grittner wrote: First off, what does it say for rows affected? (Hint, if you really are using a default configuration and it doesn't say 0 rows affected, please show us the actual query used.)

Never mind that bit -- I got myself confused. Sorry for the noise.

-Kevin
Re: [GENERAL] Running update in chunks?
First off, what does it say for rows affected? (Hint, if you really are using a default configuration and it doesn't say 0 rows affected, please show us the actual query used.)

update imports set make_id = null
Query returned successfully: 98834 rows affected, 49673 ms execution time.

vacuum analyze imports
Query returned successfully with no result in 4138 ms.

VACUUM FULL imports;
Query returned successfully with no result in 38106 ms.

VACUUM FREEZE ANALYZE;
Query returned successfully with no result in 184635 ms.

update imports set make_id = 0
Query returned successfully: 98834 rows affected, 45860 ms execution time.

So all the vacuuming saved about four seconds of execution time.

here is the postgresql.conf completely untouched from the default install https://gist.github.com/4590590