Re: [HACKERS] MERGE SQL Statement for PG11
Ah, there is one reason not to use a mapping to CTEs to implement MERGE: it might be faster to use a single query that is a FULL OUTER JOIN of the source and target to drive the update/insert/delete operations. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Nov 07, 2017 at 03:31:22PM -0800, Peter Geoghegan wrote: > On Tue, Nov 7, 2017 at 3:29 PM, Nico Williamswrote: > > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote: > >> Nico Williams wrote: > >> >A MERGE mapped to a DML like this: > > > > I needed to spend more time reading MERGE docs from other RDBMSes. > > Please don't hijack this thread. It's about the basic question of > semantics, and is already hard enough for others to follow as-is. I'm absolutely not. If you'd like a pithy summary devoid of detail, it is this: I'm making the argument that using ON CONFLICT to implement MERGE cannot produce a complete implementation [you seem to agree], but there is at least one light-weight way to implement MERGE with _existing_ machinery in PG: CTEs. It's perfectly fine to implement an executor for MERGE, but I think that's a bit silly and I explain why. Further, I explored your question regarding order of events, which you (and I) think is a very important semantics question. You thought order of execution / trigger firing should be defined, whereas I think it should not because MERGE explicitly says, at least MSFT's! MSFT's MERGE says: | For every insert, update, or delete action specified in the MERGE | statement, SQL Server fires any corresponding AFTER triggers defined | on the target table, but does not guarantee on which action to fire | triggers first or last. Triggers defined for the same action honor the | order you specify. Impliedly (though not stated explicitly), the actual updates, inserts, and deletes, can happen in any order as well as the triggers firing in any order. As usual, in the world of programming language design, leaving order of execution undefined as much as possible increases the level of available opportunities to parallelize. Presumably MSFT is leaving the door open to parallizing MERGE, if they haven't already. Impliedly, CTEs that have no dependencies on each other are also ripe for parallelization. This is important too! For one of my goals is: to improve CTE performance. If implementing MERGE as a mapping to CTEs leads to improvements in CTEs, so much the better. But also this *is* a simple implementation of MERGE, and simplicity seems like a good thing. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Nov 7, 2017 at 3:29 PM, Nico Williamswrote: > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote: >> Nico Williams wrote: >> >A MERGE mapped to a DML like this: > > I needed to spend more time reading MERGE docs from other RDBMSes. Please don't hijack this thread. It's about the basic question of semantics, and is already hard enough for others to follow as-is. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote: > Nico Williamswrote: > >A MERGE mapped to a DML like this: I needed to spend more time reading MERGE docs from other RDBMSes. The best MERGE so far is MS SQL Server's, which looks like: MERGE INTO USING ON () -- optional: WHEN MATCHED THEN UPDATE SET ... -- optional: WHEN NOT MATCHED [ BY TARGET ] THEN INSERT ... -- optional: WHEN NOT MATCHED BY SOURCE THEN DELETE -- optional: OUTPUT ... ; (The other MERGEs are harder to use because they lack a WHEN NOT MATCHED BY SOURCE THEN DELETE, instead having a DELETE clause on the UPDATE, which is then difficult to use.) This is *trivial* to map to a CTE, and, in fact, I and my colleagues have resorted to hand-coded CTEs like this precisely because PG lacks MERGE (though we ourselves didn't know about MERGE -- it's new to us). If is a query, then we start with a CTE for that, else if it's a view or table, then we don't setup a CTE for it. Each of the UPDATE, INSERT, and/or DELETE can be it's own CTE. If there's an OUTPUT clause, that can be a final SELECT that queries from the CTEs that ran the DMLs with RETURNING. If there's no OUTPUT then none of the DMLs need to have RETURNING, and one of them will be the main statement, rather than a CTE. The pattern is: WITH -- IFF is a query: AS (), -- IFF there's a WHEN MATCHED THEN UPDATE updates AS ( UPDATE AS SET ... FROM WHERE -- IFF there's an OUTPUT clause, then: RETURNING 'update' as "@action", ... ), inserts AS ( INSERT INTO () SELECT ... FROM LEFT JOIN ON WHERE IS NOT DISTINCT FROM NULL -- IFF there's a CONCURRENTLY clause: ON CONFLICT DO NOTHING -- IFF there's an OUTPUT clause, then: RETURNING 'insert' as "@action", ... ), deletes AS ( DELETE FROM WHERE NOT EXISTS (SELECT * FROM WHERE ) -- IFF there's an OUTPUT clause, then: RETURNING 'delete' as "@action", ... ), -- IFF there's an OUTPUT clause SELECT * FROM updates UNION SELECT * FROM inserts UNION SELECT * FROM deletes; If there's not an output clause then one of the DMLs has to be the main statement: WITH ... DELETE ...; -- or UPDATE, or INSERT Note that if the source is a view or table and there's no OUTPUT clause, then it's one DML with up to (but not referring to) two CTEs, and in all cases the CTEs do not refer to each other. This means that the executor can parallelize all of the DMLs. If the source is a query, then that could be made a temp view to avoid having to run the query first. The CTE executor needs to learn to sometimes do this anyways, so this is good. The CTE can be equivalently written without a NOT EXISTS: to_be_deleted AS ( SELECT FROM LEFT JOIN ON () WHERE IS NOT DISTINCT FROM NULL ), deletes AS ( DELETE FROM USING to_be_deleted tbd WHERE = ) if that were to run faster (probably not, since PG today would first run the to_be_deleted CTE, then the deletes CTE). I mention only because it's nice to see the symmetry of LEFT JOINs for the two WHEN NOT MATCHED cases. (Here is the alias for it if one was given.) *** This mapping triggers triggers as one would expect (at least FOR EACH ROW; I expect the DMLs in CTEs should also trigger FOR EACH STATEMENT triggers, and if they don't I consider that a bug). > This is a bad idea. An implementation like this is not at all > maintainable. I beg to differ. First of all, not having to add an executor for MERGE is a win: much, much less code to maintain. The code to map MERGE to CTEs can easily be contained entirely in src/backend/parser/gram.y, which is a maintainability win: any changes to how CTEs are compiled will fail to compile if they break the MERGE mapping to CTEs. > >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE. > > That's not handling concurrency -- it's silently ignoring an error. Who > is to say that the conflict that IGNORE ignored is associated with a row > visible to the MVCC snapshot of the statement? IOW, why should the DELETE > affect any row? That was me misunderstanding MERGE. The DELETE is independent of the INSERT -- if an INSERT does nothing because of an ON CONFLICT DO NOTHING clause, then that won't cause that row to be deleted -- the inserts and deletes CTEs are independent in the latest mapping (see above). I believe adding ON CONFLICT DO NOTHING to the INSERT in this mapping is all that's needed to support concurrency. > There are probably a great many reasons why you need a ModifyTable > executor node that keeps around state, and explicitly indicates that a > MERGE is a MERGE. For example, we'll probably want statement level > triggers to execute in a fixed order, regardless of the MERGE, RLS will > probably require explicitly knowledge of MERGE
Re: [HACKERS] MERGE SQL Statement for PG11
On 6 November 2017 at 17:35, Simon Riggswrote: > I read that step 3 in Approach2 is some kind of problem in MVCC > semantics. My understanding is that SQL Standard allows us to define > what the semantics of the statement are in relation to concurrency, so > any semantic issue can be handled by defining it to work the way we > want. The semantics are: > a) when a unique index is available we avoid errors by using semantics > of INSERT .. ON CONFLICT UPDATE. > b) when a unique index is not available we use other semantics. I'm obviously being obtuse. If a unique index is not available, then surely there won't _be_ a failure? The INSERT (or indeed UPDATE) that results in two similar records will simply happen, and you will end up with two records the same. That's OK, based on the semantics of MERGE, no? At the transaction-start INSERT was the correct thing to do. Geoff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggswrote: In step 3 we discover that an entry exists in the index for a committed row. Since we have a unique index we use it to locate the row we know exists and UPDATE that. We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already violating MVCC for UPDATEs, so why does it matter if we do it for INSERTs also? Before I go on to say why I think that this approach is problematic, I want to point out a few things that I think we actually agree on: * EPQ is fairly arbitrary as a behavior for READ COMMITTED UPDATE conflict handling. It has more to do with how VACUUM works than about some platonic ideal that everyone agrees on. * We can imagine other alternatives, such as the behavior in Oracle (statement level rollback + optimistic retry). * Those alternatives are probably better in some ways but worse in other ways. * EPQ violates snapshot consistency, even though that's not inherently necessary to avoid "READ COMMITTED serialization errors". * ON CONFLICT also violates snapshot consistency, in rather a different way. (Whether or not this is necessary is more debatable.) I actually think that other MVCC systems don't actually copy Oracle here, either, and for similar pragmatic reasons. It's a mixed bag. Where hides the problem? The problem is violating MVCC is something that can be done in different ways, and by meaningful degrees: * EPQ semantics are believed to be fine because we don't get complaints about it. I think that that's because it's specialized to UPDATEs and UPDATE-like operations, where we walk an UPDATE chain specifically, and only use a dirty snapshot for the chain's newer tuples. * ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no distinction between a concurrent UPDATE, and a concurrent DELETE + fresh INSERT. It's specialized to CONFLICTs. This might seem abstract, but it has real, practical implications. Certain contradictions exist when you start with MVCC semantics, then fall back to EPQ semantics, then finally fall back to ON CONFLICT semantics. Questions about mixing these two things: * What do we do if someone concurrently UPDATEs in a way that makes the qual not pass during EPQ traversal? Should we INSERT when that happens? * If so, what about the case when the MERGE join qual/unique index values didn't change (just some other attributes that do not pass the additional WHEN MATCHED qual)? * What about when there was a concurrent DELETE -- should we INSERT then? ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's hugely different to doing it the other way around: starting from your own MVCC snapshot qual, and going to a CONFLICT. This is because evaluating the DO UPDATE's WHERE clause is just one little extra step after the one and only latest row for that value has been locked. You could theoretically go this way with 2PL, I think, because that's a bit like locking every row that the predicate touches, but of course that isn't at all practical. I should stop trying to make a watertight case against this, even though I still think that's possible. For now, instead, I'll just say that this is *extremely* complicated, and still has unresolved questions about semantics. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 6 November 2017 at 18:35, Peter Geogheganwrote: >> APPROACH2 (modified from my original proposal slightly) > > > This write-up actually begins to confront the issues that I've raised. > I'm glad to see this. > >> 1. Join... >> 2. Apply results for UPDATE, if present not visible via the snapshot >> taken at 1, do EPQ to ensure we locate current live tuple >> 3. If still not visible, do speculative insertion if we have a unique >> index available, otherwise ERROR. If spec insertion fails, go to 2 >> >> The loop created above can live-lock, meaning that an infinite loop >> could be created. > > > The loop is *guaranteed* to live-lock once you "goto 2". So you might as > well just throw an error at that point, which is the behavior that I've > been arguing for all along! > > If this isn't guaranteed to live-lock at "goto 2", then it's not clear > why. The outcome of step 2 is clearly going to be identical if you don't > acquire a new MVCC snapshot, but you don't address that. > > You might have meant "apply an equivalent ON CONFLICT DO UPDATE", or > something like that, despite the fact that the use of ON CONFLICT DO > NOTHING was clearly implied by the "goto 2". I also see problems with > that, but I'll wait for you to clarify what you meant before going into > what they are. In step 3 we discover that an entry exists in the index for a committed row. Since we have a unique index we use it to locate the row we know exists and UPDATE that. We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already violating MVCC for UPDATEs, so why does it matter if we do it for INSERTs also? Where hides the problem? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggswrote: APPROACH1 1. Join to produce results based upon snapshot at start of query 2. Apply results for INSERT, UPDATE or DELETE Such failures are of great concern in practice because the time between 1 and 2 could be very long for large statements, or for smaller statements we might have sufficiently high concurrency to allow us to see regular failures. I'm not sure that they're a *great* concern in a world with something that targets UPSERT use cases, which is a situation that does not exist in DBMSs with MERGE (with the notable exception of Teradata). But it's clearly a concern that users may expect to avoid duplicate violations in READ COMMITTED, since this caused confusion among users of other database systems with MERGE. APPROACH2 (modified from my original proposal slightly) This write-up actually begins to confront the issues that I've raised. I'm glad to see this. 1. Join... 2. Apply results for UPDATE, if present not visible via the snapshot taken at 1, do EPQ to ensure we locate current live tuple 3. If still not visible, do speculative insertion if we have a unique index available, otherwise ERROR. If spec insertion fails, go to 2 The loop created above can live-lock, meaning that an infinite loop could be created. The loop is *guaranteed* to live-lock once you "goto 2". So you might as well just throw an error at that point, which is the behavior that I've been arguing for all along! If this isn't guaranteed to live-lock at "goto 2", then it's not clear why. The outcome of step 2 is clearly going to be identical if you don't acquire a new MVCC snapshot, but you don't address that. You might have meant "apply an equivalent ON CONFLICT DO UPDATE", or something like that, despite the fact that the use of ON CONFLICT DO NOTHING was clearly implied by the "goto 2". I also see problems with that, but I'll wait for you to clarify what you meant before going into what they are. In practice, such live-locks are rare and we could detect them by falling out of the loop after a few tries. Approach2's purpose is to alleviate errors in Approach1, so falling out of the loop merely takes us back to the error we would have got if we didn't try, so Approach2 has considerable benefit over Approach1. I don't hate the idea of retrying a fixed number of times for things like this, but I don't like it either. I'm going to assume that it's fine for now. I read that step 3 in Approach2 is some kind of problem in MVCC semantics. My understanding is that SQL Standard allows us to define what the semantics of the statement are in relation to concurrency, so any semantic issue can be handled by defining it to work the way we want. My only concern is that our choices here should be good ones, based on practical considerations. We both more or less agree on how this should be assessed, I think; we just reach different conclusions. As you point out, whichever we choose, we will be bound by those semantics. So if we take Approach1, as has been indicated currently, what is the written explanation for that, so we can show that to the people who ask in the future about our decisions? Well, Approach1 is what other systems implement. I think that it would be important to point out that MERGE with Approach1 isn't special, but ON CONFLICT DO UPDATE is special. We'd also say that higher isolation levels will not have duplicate violations. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 3 November 2017 at 16:35, Peter Geogheganwrote: > Simon Riggs wrote: The *only* behavioural difference I have proposed would be the *lack* of an ERROR in (some) concurrent cases. >>> >>> >>> I think that's a big difference. Error vs. non-error is a big deal by >>> itself; >> >> >> Are you saying avoiding an ERROR is a bad or good thing? > > > Are you really asking Robert to repeat what has already been said about > a dozen different ways? I'm asking for clarity of explanation rather than assertions. > That's *not* the only difference. You need to see a couple of steps > ahead to see further differences, as the real dilemma comes when you > have to reconcile having provided the UPSERT-guarantees with cases that > that doesn't map on to (which can happen in a number of different ways). > > I don't understand why you'll talk about just about anything but that. > This is a high-level concern about the overarching design. Do you really > not understand the concern at this point? You're either referring to what is in the docs, which is INSERT ... ON CONFLICT violates MVCC in a particular way, or something as yet unstated. If it is the former, then I still don't see the problem (see later). If it is the latter, I need more. So either way I need more. > Robert Haas said >In the past, there have been objections to implementations of MERGE >which would give rise to such serialization anomalies, but I'm not >sure we should feel bound by those discussions. One thing that's >different is that the common and actually-useful case can now be made >to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE; >if less useful cases are vulnerable to some weirdness, maybe it's OK >to just document the problems. I agreed with that, and still do. We need a clear, explicit description of this situation so I will attempt that in detail here. The basic concurrency problem we have is this APPROACH1 1. Join to produce results based upon snapshot at start of query 2. Apply results for INSERT, UPDATE or DELETE Given there is a time delay between 1 and 2 there is a race condition so that if another user concurrently inserts the same value into a unique index then an INSERT will fail with a uniqueness violation. Such failures are of great concern in practice because the time between 1 and 2 could be very long for large statements, or for smaller statements we might have sufficiently high concurrency to allow us to see regular failures. APPROACH2 (modified from my original proposal slightly) 1. Join... 2. Apply results for UPDATE, if present not visible via the snapshot taken at 1, do EPQ to ensure we locate current live tuple 3. If still not visible, do speculative insertion if we have a unique index available, otherwise ERROR. If spec insertion fails, go to 2 The loop created above can live-lock, meaning that an infinite loop could be created. In practice, such live-locks are rare and we could detect them by falling out of the loop after a few tries. Approach2's purpose is to alleviate errors in Approach1, so falling out of the loop merely takes us back to the error we would have got if we didn't try, so Approach2 has considerable benefit over Approach1. This only applies if we do an INSERT, so if there is a WHEN NOT MATCHED ... AND clause with probability W, that makes the INSERT rare then we simply have the probablility of error in Approach2 approach the probability of error in Approach1 as the W drops to zero, but with W high we may avoid many errors. Approach2 never generates more errors than Approach1. I read that step 3 in Approach2 is some kind of problem in MVCC semantics. My understanding is that SQL Standard allows us to define what the semantics of the statement are in relation to concurrency, so any semantic issue can be handled by defining it to work the way we want. The semantics are: a) when a unique index is available we avoid errors by using semantics of INSERT .. ON CONFLICT UPDATE. b) when a unique index is not available we use other semantics. To me this is the same as INSERTs failing in the presence of unique indexes, but not failing when no index is present. The presence of a unique constraint alters the semantics of the query. We can choose Approach2 - as Robert says "[we should not] feel bound by those [earlier] discussions" Please explain what is wrong with the above without merely asserting there is a problem. As you point out, whichever we choose, we will be bound by those semantics. So if we take Approach1, as has been indicated currently, what is the written explanation for that, so we can show that to the people who ask in the future about our decisions? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggswrote: The *only* behavioural difference I have proposed would be the *lack* of an ERROR in (some) concurrent cases. I think that's a big difference. Error vs. non-error is a big deal by itself; Are you saying avoiding an ERROR is a bad or good thing? Are you really asking Robert to repeat what has already been said about a dozen different ways? That's *not* the only difference. You need to see a couple of steps ahead to see further differences, as the real dilemma comes when you have to reconcile having provided the UPSERT-guarantees with cases that that doesn't map on to (which can happen in a number of different ways). I don't understand why you'll talk about just about anything but that. This is a high-level concern about the overarching design. Do you really not understand the concern at this point? also, the non-error case involves departing from MVCC semantics just as INSERT .. ON CONFLICT UPDATE does. Meaning what exactly? What situation occurs that a user would be concerned with? Please describe exactly what you mean so we get it clear. The concurrent behaviour for MERGE is allowed to be implementation-specific, so we can define it any way we want. Agreed -- we can. It isn't controversial at all to say that the SQL standard has nothing to say on this question. The problem is that the semantics you argue for are ill-defined, and seem to create more problems than they solve. Why keep bringing up the SQL standard? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 3 November 2017 at 08:26, Robert Haaswrote: > On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggs wrote: >>> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT >>> UPDATE when a relevant unique index exists and does something else, >>> such as your proposal of taking a strong lock, or Peter's proposal of >>> doing this in a concurrency-oblivious manner, in other cases, then >>> those two cases will behave very differently. >> >> The *only* behavioural difference I have proposed would be the *lack* >> of an ERROR in (some) concurrent cases. > > I think that's a big difference. Error vs. non-error is a big deal by > itself; Are you saying avoiding an ERROR is a bad or good thing? > also, the non-error case involves departing from MVCC > semantics just as INSERT .. ON CONFLICT UPDATE does. Meaning what exactly? What situation occurs that a user would be concerned with? Please describe exactly what you mean so we get it clear. The concurrent behaviour for MERGE is allowed to be implementation-specific, so we can define it any way we want. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
* Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggswrote: > > We seem to have a few options for PG11 > > > > 1. Do nothing, we reject MERGE > > > > 2. Implement MERGE for unique index situations only, attempting to > > avoid errors (Simon OP) > > > > 3. Implement MERGE, but without attempting to avoid concurrent ERRORs > > (Peter) > > > > 4. Implement MERGE, while attempting to avoid concurrent ERRORs in > > cases where that is possible. > > > > Stephen, Robert, please say which option you now believe we should pick. > > I think Peter has made a good case for #3, so I lean toward that > option. I think #4 is too much of a non-obvious behavior difference > between the cases where we can avoid those errors and the cases where > we can't, and I don't see where #2 can go in the future other than #4. Agreed. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] MERGE SQL Statement for PG11
On 3 November 2017 at 07:46, Thomas Kellererwrote: > PMFJI > >> We seem to have a few options for PG11 >> >> 1. Do nothing, we reject MERGE >> >> 2. Implement MERGE for unique index situations only, attempting to >> avoid errors (Simon OP) >> >> 3. Implement MERGE, but without attempting to avoid concurrent ERRORs >> (Peter) >> >> 4. Implement MERGE, while attempting to avoid concurrent ERRORs in >> cases where that is possible. > > From an end-users point of view I would prefer 3 (or 4 if that won't prevent > this from going into 11) Sounds reasonable approach. > INSERT ... ON CONFLICT is great, but there are situations where the > restrictions can get in the way and it would be nice to have an alternative > - albeit with some (documented) drawbacks. As far as I know Oracle also > doesn't guarantee that MERGE is safe for concurrent use - you can still wind > up with a unique key violation. Yes, Oracle allows some unique key violations. It's clear that we would need to allow some also. So we clearly can't infer whether they avoid some errors just because they allow some. My approach will be to reduce the errors in the best way, not to try to copy errors Oracle makes, if any. But that error avoidance can easily be a later add-on if we prefer it that way. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggswrote: >> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT >> UPDATE when a relevant unique index exists and does something else, >> such as your proposal of taking a strong lock, or Peter's proposal of >> doing this in a concurrency-oblivious manner, in other cases, then >> those two cases will behave very differently. > > The *only* behavioural difference I have proposed would be the *lack* > of an ERROR in (some) concurrent cases. I think that's a big difference. Error vs. non-error is a big deal by itself; also, the non-error case involves departing from MVCC semantics just as INSERT .. ON CONFLICT UPDATE does. > All I have at the moment is that a few people disagree, but that > doesn't help determine the next action. > > We seem to have a few options for PG11 > > 1. Do nothing, we reject MERGE > > 2. Implement MERGE for unique index situations only, attempting to > avoid errors (Simon OP) > > 3. Implement MERGE, but without attempting to avoid concurrent ERRORs (Peter) > > 4. Implement MERGE, while attempting to avoid concurrent ERRORs in > cases where that is possible. > > Stephen, Robert, please say which option you now believe we should pick. I think Peter has made a good case for #3, so I lean toward that option. I think #4 is too much of a non-obvious behavior difference between the cases where we can avoid those errors and the cases where we can't, and I don't see where #2 can go in the future other than #4. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
PMFJI > We seem to have a few options for PG11 > > 1. Do nothing, we reject MERGE > > 2. Implement MERGE for unique index situations only, attempting to > avoid errors (Simon OP) > > 3. Implement MERGE, but without attempting to avoid concurrent ERRORs > (Peter) > > 4. Implement MERGE, while attempting to avoid concurrent ERRORs in > cases where that is possible. >From an end-users point of view I would prefer 3 (or 4 if that won't prevent this from going into 11) INSERT ... ON CONFLICT is great, but there are situations where the restrictions can get in the way and it would be nice to have an alternative - albeit with some (documented) drawbacks. As far as I know Oracle also doesn't guarantee that MERGE is safe for concurrent use - you can still wind up with a unique key violation. -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 2 November 2017 at 22:59, Nico Williamswrote: > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote: >> Nico Williams wrote: >> >A MERGE mapped to a DML like this: >> >> This is a bad idea. An implementation like this is not at all >> maintainable. > > Assuming the DELETE issue can be addressed, why would this not be > maintainable? It would only take one change to make this approach infeasible and when that happened we would need to revert to the full-executor version. One difference that comes to mind is that MERGE doesn't behave the same way as an UPDATE-join, according to SQL:2011 in that it must throw an error if duplicate changes are requested. That would be hard to emulate using a parser only version. I would call it impressively clever but likely fragile, in this case, though I encourage more ideas like that in the future. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 2 November 2017 at 17:06, Robert Haaswrote: > On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggs wrote: >> I've proposed a SQL Standard compliant implementation that would do >> much more than be new syntax over what we already have. >> >> So these two claims aren't accurate: "radical difference" and "syntax >> sugar over a capability we have". > > I think those claims are pretty accurate. No, because there is misunderstanding on some technical points. They key point is at the end - what do we do next? > The design of INSERT .. ON CONFLICT UPDATE and the supporting > machinery only work if there is a unique index. It provides radically > different behavior than what you get with any other DML statement, > with completely novel concurrency behavior, and there is no reasonable > way to emulate that behavior in cases where no relevant unique index > exists. Agreed > Therefore, if MERGE eventually uses INSERT .. ON CONFLICT > UPDATE when a relevant unique index exists and does something else, > such as your proposal of taking a strong lock, or Peter's proposal of > doing this in a concurrency-oblivious manner, in other cases, then > those two cases will behave very differently. The *only* behavioural difference I have proposed would be the *lack* of an ERROR in (some) concurrent cases. We have a way to avoid some concurrency errors during MERGE, while still maintaining exact SQL Standard behaviour. Peter has pointed out that we could not catch errors in certain cases; that does nothing to the cases where it will work well. It would be fallacious to imagine it is an all or nothing situation. > And if, in the meantime, MERGE can only handle the cases where there > is a unique index, I haven't proposed that MERGE would be forever limited to cases with a unique index, only that MERGE-for-unique-indexes would be the version in PG11, for good reason. > then it can only handle the cases INSERT .. ON > CONFLICT UPDATE can cover, which makes it, as far as I can see, > syntactic sugar over what we already have. SQL:2011 is greatly enhanced and this is no longer true; it was in earlier versions but is not now. MERGE allows you do DELETE, as well as conditional UPDATE and INSERT. It is very clearly much more than UPSERT. > Maybe it's not entirely - > you might be planning to make some minor functional enhancements - but > it's not clear what those are, and I feel like whatever it is could be > done with less work and more elegance by just extending the INSERT .. > ON CONFLICT UPDATE syntax. The differences are clearly apparent in the Standard and in my submitted docs. But I am interested in submitting a SQL Standard compliant feature. > And it does seem to be your intention to only handle the cases which > the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because > upthread you wrote this: "I didn't say it but my intention was to just > throw an ERROR if no single unique index can be identified." I don't > think anybody's putting words into your mouth here. We're just > reading what you wrote. Happy to have written it because that covers the common use case I was hoping to deliver in PG11. Incremental development. My proposal was to implement the common case, while avoiding concurrency errors. Peter proposes that I cover both the common case and more general cases, without avoiding concurrency errors. All I have at the moment is that a few people disagree, but that doesn't help determine the next action. We seem to have a few options for PG11 1. Do nothing, we reject MERGE 2. Implement MERGE for unique index situations only, attempting to avoid errors (Simon OP) 3. Implement MERGE, but without attempting to avoid concurrent ERRORs (Peter) 4. Implement MERGE, while attempting to avoid concurrent ERRORs in cases where that is possible. Stephen, Robert, please say which option you now believe we should pick. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote: > Nico Williamswrote: > >A MERGE mapped to a DML like this: > > This is a bad idea. An implementation like this is not at all > maintainable. Assuming the DELETE issue can be addressed, why would this not be maintainable? > >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE. > > That's not handling concurrency -- it's silently ignoring an error. Who > is to say that the conflict that IGNORE ignored is associated with a row > visible to the MVCC snapshot of the statement? IOW, why should the DELETE > affect any row? Ah, yes, we'd have to make sure the DELETE does not delete rows that could not be inserted. There's... no way to find out what those would have been -- RETURNING won't mention them, though it'd be a nice addition to UPSERT to have a way to do that, and it'd make this mapping feasible. > There are probably a great many reasons why you need a ModifyTable > executor node that keeps around state, and explicitly indicates that a > MERGE is a MERGE. For example, we'll probably want statement level > triggers to execute in a fixed order, regardless of the MERGE, RLS will > probably require explicitly knowledge of MERGE semantics, and so on. Wouldn't those fire anyways in a statement like the one I mentioned? > FWIW, your example doesn't actually have a source (just a target), so it > isn't actually like MERGE. That can be added. I was trying to keep it pithy. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Nico Williamswrote: A MERGE mapped to a DML like this: WITH updated AS ( UPDATE SET ... WHERE RETURNING ) , inserted AS ( INSERT INTO SELECT ... WHERE NOT IN (SELECT FROM updated) AND .. ON CONFLICT DO NOTHING -- see below! RETURNING ) DELETE FROM WHERE NOT IN (SELECT FROM updated) AND NOT IN (SELECT FROM inserted) AND ...; This is a bad idea. An implementation like this is not at all maintainable. can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE. That's not handling concurrency -- it's silently ignoring an error. Who is to say that the conflict that IGNORE ignored is associated with a row visible to the MVCC snapshot of the statement? IOW, why should the DELETE affect any row? There are probably a great many reasons why you need a ModifyTable executor node that keeps around state, and explicitly indicates that a MERGE is a MERGE. For example, we'll probably want statement level triggers to execute in a fixed order, regardless of the MERGE, RLS will probably require explicitly knowledge of MERGE semantics, and so on. FWIW, your example doesn't actually have a source (just a target), so it isn't actually like MERGE. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 02:05:19PM -0700, Peter Geoghegan wrote: > Simon Riggswrote: > >So in your view we should make no attempt to avoid concurrent errors, > >even when we have the capability to do so (in some cases) and doing so > >would be perfectly compliant with the SQLStandard. > > Yes. That's what I believe. I believe this because I can't see a way to > do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and > works well for the cases where we can do better in READ COMMITTED mode. A MERGE mapped to a DML like this: WITH updated AS ( UPDATE SET ... WHERE RETURNING ) , inserted AS ( INSERT INTO SELECT ... WHERE NOT IN (SELECT FROM updated) AND .. ON CONFLICT DO NOTHING -- see below! RETURNING ) DELETE FROM WHERE NOT IN (SELECT FROM updated) AND NOT IN (SELECT FROM inserted) AND ...; can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE. Now, one could write a MERGE that produces conflicts even without concurrency, so adding ON CONFLICT DO NOTHING by default as above... seems not-quite-correct. But presumably one wouldn't write MERGE statements that produce conflicts in the absence of concurrency, so this seems close enough to me. Another thing is that MERGE itself could get an ON CONFLICT clause for the INSERT portion of the MERGE, allowing one to ignore some conflicts and not others, though there would be no need for DO UPDATE, only DO NOTHING for conflict resolution :) This seems better. I do believe this mapping is correct, and could be implemented entirely in src/backend/parser/gram.y! Am I wrong about this? Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggswrote: I think people imagined you had worked out how to make MERGE run concurrently, I certainly did, but in fact you're just saying you don't believe it ever should. I'm certain that they didn't think that at all. But I'll let them speak for themselves. That is strange since the SQL Standard specifically allows the implementation to decide upon concurrent behaviour. And yet nobody else decided to do what you propose with this apparent leeway. (See the UPSERT wiki page for many references that confirm this.) So in your view we should make no attempt to avoid concurrent errors, even when we have the capability to do so (in some cases) and doing so would be perfectly compliant with the SQLStandard. Yes. That's what I believe. I believe this because I can't see a way to do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and works well for the cases where we can do better in READ COMMITTED mode. Did you know that Oracle doesn't have an EPQ style mechanism at all? Instead, it rolls back the entire statement and retries it from scratch. That is not really any further from or closer to the standard than the EPQ stuff, because the standard doesn't say anything about what should happen as far as READ COMMITTED conflict handling goes. My point here is that all of the stuff I'm talking about is only relevant in READ COMMITTED mode, in areas where the standard never provides us with guidance. If you can rely on SSI, then there is no difference between what you propose and what I propose anyway, except that what I propose is more general and will have better performance, especially for batch MERGEs. If READ COMMITTED didn't exist, implementing ON CONFLICT would have been more or less free of controversy. Yes, that certainly will make an easier patch for MERGE. Indeed, it will. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 2 November 2017 at 19:16, Peter Geogheganwrote: > Simon Riggs wrote: >> >> So if I understand you correctly, in your view MERGE should just fail >> with an ERROR if it runs concurrently with other DML? > > > That's certainly my opinion on the matter. It seems like that might be > the consensus, too. Given that I only just found out what you've been talking about, I don't believe that anybody else did either. I think people imagined you had worked out how to make MERGE run concurrently, I certainly did, but in fact you're just saying you don't believe it ever should. That is strange since the SQL Standard specifically allows the implementation to decide upon concurrent behaviour. > Without meaning to sound glib: we already did make it work for a > special, restricted case that is important enough to justify introducing > a couple of kludges -- ON CONFLICT DO UPDATE/upsert. > > I do agree that what I propose for MERGE will probably cause confusion; > just look into Oracle's MERGE implementation for examples of this. We > ought to go out of our way to make it clear that MERGE doesn't provide > these guarantees. So in your view we should make no attempt to avoid concurrent errors, even when we have the capability to do so (in some cases) and doing so would be perfectly compliant with the SQLStandard. Yes, that certainly will make an easier patch for MERGE. Or are you arguing against allowing any patch for MERGE? Now we have more clarity, who else agrees with this? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 12:51:45PM -0700, Peter Geoghegan wrote: > Nico Williamswrote: > >If you want to ignore conflicts arising from concurrency you could > >always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I > >proposed earlier. Thus a MERGE CONCURRENTLY could just do that. > > > >Is there any reason not to map MERGE as I proposed? > > Performance, for one. MERGE generally has a join that can be optimized > like an UPDATE FROM join. Ah, right, I think my mapping was pessimal. How about this mapping instead then: WITH updated AS ( UPDATE SET ... WHERE RETURNING ) , inserted AS ( INSERT INTO SELECT ... WHERE NOT IN (SELECT FROM updated) AND .. /* * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face * of concurrency. */ RETURNING ) DELETE FROM WHERE NOT IN (SELECT FROM updated) AND NOT IN (SELECT FROM inserted) AND ...; ? If a MERGE has no delete clause, then the mapping would be: WITH updated AS ( UPDATE SET ... WHERE RETURNING ) INSERT INTO SELECT ... WHERE NOT IN (SELECT FROM updated) AND .. /* * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face * of concurrency. */ ; > I haven't studied this question in any detail, but FWIW I think that > using CTEs for merging is morally equivalent to a traditional MERGE > implementation. [...] I agree. So why not do that initially? Optimize later. Such a MERGE mapping could be implemented entirely within src/backend/parser/gram.y ... Talk about cheap to implement, review, and maintain! Also, this would be notionally very simple. Any optimizations to CTE query/DML execution would be generic and applicable to MERGE and other things besides. If mapping MERGE to CTE-using DMLs motivates such optimizations, all the better. > [...]. It may actually be possible to map from CTEs to a MERGE > statement, but I don't think that that's a good approach to implementing > MERGE. Surely not every DML with CTEs can map to MERGE. Maybe I misunderstood your comment? > Most of the implementation time will probably be spent doing things like > making sure MERGE behaves appropriately with triggers, RLS, updatable > views, and so on. That will take quite a while, but isn't particularly > technically challenging IMV. Note that mapping to a DML with CTEs as above gets triggers, RLS, and updateable views right from the get-go, because DMLs with CTEs, and DMLs as CTEs, surely do as well. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggswrote: So if I understand you correctly, in your view MERGE should just fail with an ERROR if it runs concurrently with other DML? That's certainly my opinion on the matter. It seems like that might be the consensus, too. Obviously there are things that you as a user can do about this on your own, like opt to use a higher isolation level, or manually LOCK TABLE. For some use cases, including bulk loading for OLAP, users might just know that there isn't going to be concurrent activity because it's not an OLTP system. If this still seems odd to you, then consider that exactly the same situation exists with UPDATE. A user could want their UPDATE to affect a row where no row version is actually visible to their MVCC snapshot, because they have an idea about reliably updating the latest row. UPDATE doesn't work like that, of course. Is this unacceptable because the user expects that it should work that way? Bear in mind that ON CONFLICT DO UPDATE *can* actually update a row when there is no version of it visible to the snapshot. It can also update a row where there is a concurrent DELETE + INSERT, and the tuples with the relevant unique index values end up not even being part of the same update chain in each case (MVCC-snapshot-visible vs. latest). IOW, you may end up updating a completely different logical row to the row with the conflicting value that is visible to your MVCC snapshot! i.e. if a race condition between the query and an INSERT runs concurrently with another INSERT We have no interest in making that work? Without meaning to sound glib: we already did make it work for a special, restricted case that is important enough to justify introducing a couple of kludges -- ON CONFLICT DO UPDATE/upsert. I do agree that what I propose for MERGE will probably cause confusion; just look into Oracle's MERGE implementation for examples of this. We ought to go out of our way to make it clear that MERGE doesn't provide these guarantees. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Nico Williamswrote: If you want to ignore conflicts arising from concurrency you could always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I proposed earlier. Thus a MERGE CONCURRENTLY could just do that. Is there any reason not to map MERGE as I proposed? Performance, for one. MERGE generally has a join that can be optimized like an UPDATE FROM join. I haven't studied this question in any detail, but FWIW I think that using CTEs for merging is morally equivalent to a traditional MERGE implementation. It may actually be possible to map from CTEs to a MERGE statement, but I don't think that that's a good approach to implementing MERGE. Most of the implementation time will probably be spent doing things like making sure MERGE behaves appropriately with triggers, RLS, updatable views, and so on. That will take quite a while, but isn't particularly technically challenging IMV. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 02, 2017 at 06:49:18PM +, Simon Riggs wrote: > On 1 November 2017 at 18:20, Peter Geogheganwrote: > > In Postgres, you can avoid duplicate violations with MERGE by using a > > higher isolation level (these days, those are turned into a > > serialization error at higher isolation levels when no duplicate is > > visible to the xact's snapshot). > > So if I understand you correctly, in your view MERGE should just fail > with an ERROR if it runs concurrently with other DML? > > i.e. if a race condition between the query and an INSERT runs > concurrently with another INSERT > > We have no interest in making that work? If you map MERGE to a DML with RETURNING-DML CTEs as I suggested before, how would that interact with concurrent DMLs? The INSERT DML of the mapped statement could produce conflicts that abort the whole MERGE, correct? If you want to ignore conflicts arising from concurrency you could always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I proposed earlier. Thus a MERGE CONCURRENTLY could just do that. Is there any reason not to map MERGE as I proposed? Such an implementation of MERGE wouldn't be online because CTEs are always implemented sequentially currently. That's probably reason enough to eventually produce a native implementation of MERGE, ... or to revamp the CTE machinery to allow such a mapping to be online. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
If nothing else, anyone needing MERGE can port their MERGE statements to a DML with DML-containing CTEs... The generic mapping would be something like this, I think: WITH rows AS (SELECT FROM WHERE ) , updated AS ( UPDATE SET ... WHERE IN (SELECT FROM rows) /* matched */ RETURNING ) , inserted AS ( INSERT INTO SELECT ... WHERE NOT IN (SELECT FROM rows) /* not matched */ RETURNING ) DELETE FROM WHERE (...) AND NOT IN (SELECT FROM updated UNION SELECT FROM inserted); Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 1 November 2017 at 18:20, Peter Geogheganwrote: > In Postgres, you can avoid duplicate violations with MERGE by using a > higher isolation level (these days, those are turned into a > serialization error at higher isolation levels when no duplicate is > visible to the xact's snapshot). So if I understand you correctly, in your view MERGE should just fail with an ERROR if it runs concurrently with other DML? i.e. if a race condition between the query and an INSERT runs concurrently with another INSERT We have no interest in making that work? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Robert Haaswrote: And if, in the meantime, MERGE can only handle the cases where there is a unique index, then it can only handle the cases INSERT .. ON CONFLICT UPDATE can cover, which makes it, as far as I can see, syntactic sugar over what we already have. Maybe it's not entirely - you might be planning to make some minor functional enhancements - but it's not clear what those are, and I feel like whatever it is could be done with less work and more elegance by just extending the INSERT .. ON CONFLICT UPDATE syntax. +1 Marko Tiikkaja's INSERT ... ON CONFLICT SELECT patch, which is in the current CF [1], moves things in this direction. I myself have occasionally wondered if it was worth adding an alternative DO DELETE conflict_action. This could appear alongside DO UPDATE, and be applied using MERGE-style conditions. All of these things seem like small adjuncts to ON CONFLICT because they're all just an alternative way of modifying or projecting the tuple that is locked by ON CONFLICT. Everything new would have to happen after the novel ON CONFLICT handling has already completed. The only reason that I haven't pursued this is because it doesn't seem that compelling. I mention it now because It's worth acknowledging that ON CONFLICT could be pushed a bit further in this direction. Of course, this still falls far short of making ON CONFLICT entirely like MERGE. [1] https://commitfest.postgresql.org/15/1241/ -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggswrote: > I've proposed a SQL Standard compliant implementation that would do > much more than be new syntax over what we already have. > > So these two claims aren't accurate: "radical difference" and "syntax > sugar over a capability we have". I think those claims are pretty accurate. The design of INSERT .. ON CONFLICT UPDATE and the supporting machinery only work if there is a unique index. It provides radically different behavior than what you get with any other DML statement, with completely novel concurrency behavior, and there is no reasonable way to emulate that behavior in cases where no relevant unique index exists. Therefore, if MERGE eventually uses INSERT .. ON CONFLICT UPDATE when a relevant unique index exists and does something else, such as your proposal of taking a strong lock, or Peter's proposal of doing this in a concurrency-oblivious manner, in other cases, then those two cases will behave very differently. And if, in the meantime, MERGE can only handle the cases where there is a unique index, then it can only handle the cases INSERT .. ON CONFLICT UPDATE can cover, which makes it, as far as I can see, syntactic sugar over what we already have. Maybe it's not entirely - you might be planning to make some minor functional enhancements - but it's not clear what those are, and I feel like whatever it is could be done with less work and more elegance by just extending the INSERT .. ON CONFLICT UPDATE syntax. And it does seem to be your intention to only handle the cases which the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because upthread you wrote this: "I didn't say it but my intention was to just throw an ERROR if no single unique index can be identified." I don't think anybody's putting words into your mouth here. We're just reading what you wrote. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 2, 2017 at 8:28 AM, Craig Ringerwrote: > On 1 November 2017 at 01:55, Peter Geoghegan wrote: >> The problem here is: Iff the first statement uses ON CONFLICT >> infrastructure, doesn't the absence of WHEN NOT MATCHED imply >> different semantics for the remaining updates and deletes in the >> second version of the query? You've removed what seems like a neat >> adjunct to the MERGE, but it actually changes everything else too when >> using READ COMMITTED. > > Would these concerns be alleviated by adding some kind of Pg-specific > decoration that constrained concurrency-safe MERGEs? > > So your first statement would be > > MERGE CONCURRENTLY ... > > and when you removed the WHEN NOT MATCHED clause it'd ERROR because > that's no longer able to be done with the same concurrency-safe > semantics? > > I don't know if this would be helpful TBH, or if it would negate > Simon's compatibility goals. Just another idea. Yes, that fixes the problem. Of course, it also turns MERGE CONCURRENTLY into syntactic sugar for INSERT ON CONFLICT UPDATE, which brings one back to the question of exactly what we're trying to achieve here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 1 November 2017 at 01:55, Peter Geogheganwrote: > The problem here is: Iff the first statement uses ON CONFLICT > infrastructure, doesn't the absence of WHEN NOT MATCHED imply > different semantics for the remaining updates and deletes in the > second version of the query? You've removed what seems like a neat > adjunct to the MERGE, but it actually changes everything else too when > using READ COMMITTED. Would these concerns be alleviated by adding some kind of Pg-specific decoration that constrained concurrency-safe MERGEs? So your first statement would be MERGE CONCURRENTLY ... and when you removed the WHEN NOT MATCHED clause it'd ERROR because that's no longer able to be done with the same concurrency-safe semantics? I don't know if this would be helpful TBH, or if it would negate Simon's compatibility goals. Just another idea. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Wed, Nov 1, 2017 at 10:19 AM, Simon Riggswrote: >> The problem here is: Iff the first statement uses ON CONFLICT >> infrastructure, doesn't the absence of WHEN NOT MATCHED imply >> different semantics for the remaining updates and deletes in the >> second version of the query? > > Not according to the SQL Standard, no. I have no plans for such > differences to exist. > > Spec says: If we hit HeapTupleSelfUpdated then we throw an ERROR. Your documentation said that the MERGE was driven by a speculative insertion (BTW, I don't think that this internal implementation detail should be referenced in user-facing docs). I inferred that that could not always be true, since there won't always be an INSERT/WHEN NOT MATCHED case, assuming that you allow that at all (I now gather that you will). >> You've removed what seems like a neat >> adjunct to the MERGE, but it actually changes everything else too when >> using READ COMMITTED. Isn't that pretty surprising? > > I think you're presuming things I haven't said and don't mean, so > we're both surprised. You're right -- I'm surmising what I think might be true, because I don't have the information available to know one way or the other. As far as this issue with using speculative insertions in one context but not in another goes, I still don't really know where you stand. I can still only surmise that you must want both implementations, and will use one or the other as circumstances dictate (to avoid dup violations in the style of ON CONFLICT where that's possible). This seems true because you now say that it will be possible to omit WHEN NOT MATCHED, and yet there is no such thing as a speculative insertion without the insertion. You haven't said that that conclusion is true yourself, but it's the only conclusion that I can draw based on what you have said. > I think we need some way of expressing the problems clearly. It's certainly hard to talk about these problems. I know this from experience. > "a general purpose solution is one that more or > less works like an UPDATE FROM, with an outer join, whose ModifyTable > node is capable of insert, update, or delete (and accepts quals for > MATCHED and NOT matched cases, etc). You could still get duplicate > violations due to concurrent activity in READ COMMITTED mode". > > Surely the whole point of this is to avoid duplicate violations due to > concurrent activity? Now we're getting somewhere. I *don't* think that that's the whole point of MERGE. No other MERGE implementation does that, or claims to do that. The SQL standard says nothing about this. Heikki found this to be acceptable when working on the GSoC MERGE implementation that went nowhere. My position is that we ought to let MERGE be MERGE, and let ON CONFLICT be ON CONFLICT. In Postgres, you can avoid duplicate violations with MERGE by using a higher isolation level (these days, those are turned into a serialization error at higher isolation levels when no duplicate is visible to the xact's snapshot). MERGE isn't and shouldn't be special when it comes to concurrency. > I'm not seeing how either design sketch rigorously avoids live locks, > but those are fairly unlikely and easy to detect and abort. My MERGE semantics (which really are not mine at all) avoid live lock/lock starvation by simply never retrying anything without making forward progress. MERGE doesn't take any special interest in concurrency, just like any other DML statement that isn't INSERT with ON CONFLICT. ON CONFLICT would have live locks if it didn't always have the choice of inserting [1]. In many ways, the syntax of INSERT ON CONFLICT DO UPDATE is restricted in exactly the way it needs to be in order to function correctly. It wasn't an accident that it didn't end up being UPDATE ... ON NOUPDATE DO INSERT, or something like that, which Robert proposed at one point. ON CONFLICT plays by its own rules to a certain extent, because that's what you need in order to get the desired guarantees in READ COMMITTED mode [2]. This is the main reason why it was as painful a project as it was. Further generalizing that seems fraught with difficulties. It seems logically impossible to generalize it in a way where you don't end up with two behaviors masquerading as one. [1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards [2] https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 31 October 2017 at 18:55, Peter Geogheganwrote: > On Tue, Oct 31, 2017 at 2:25 AM, Simon Riggs wrote: >> If there are challenges ahead, its reasonable to ask for test cases >> for that now especially if you think you know what they already are. >> Imagine we go forwards 2 months - if you dislike my patch when it >> exists you will submit a test case showing the fault. Why not save us >> all the trouble and describe that now? Test Driven Development. > > I already have, on several occasions now. But if you're absolutely > insistent on my constructing the test case in terms of a real SQL > statement, then that's what I'll do. > > Consider this MERGE statement, from your mock documentation: > > MERGE INTO wines w > USING wine_stock_changes s > ON s.winename = w.winename > WHEN NOT MATCHED AND s.stock_delta > 0 THEN > INSERT VALUES(s.winename, s.stock_delta) > WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN > UPDATE SET stock = w.stock + s.stock_delta > ELSE > DELETE; > > Suppose we remove the WHEN NOT MATCHED case, leaving us with: > > MERGE INTO wines w > USING wine_stock_changes s > ON s.winename = w.winename > WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN > UPDATE SET stock = w.stock + s.stock_delta > ELSE > DELETE; > > We now have a MERGE that will not INSERT, but will continue to UPDATE > and DELETE. Agreed > (It's implied that your syntax cannot do this at all, > because you propose use the ON CONFLICT infrastructure, but I think we > have to imagine a world in which that restriction either never existed > or has subsequently been lifted.) > > The problem here is: Iff the first statement uses ON CONFLICT > infrastructure, doesn't the absence of WHEN NOT MATCHED imply > different semantics for the remaining updates and deletes in the > second version of the query? Not according to the SQL Standard, no. I have no plans for such differences to exist. Spec says: If we hit HeapTupleSelfUpdated then we throw an ERROR. > You've removed what seems like a neat > adjunct to the MERGE, but it actually changes everything else too when > using READ COMMITTED. Isn't that pretty surprising? I think you're presuming things I haven't said and don't mean, so we're both surprised. > If you're not > clear on what I mean, see my previous remarks on EPQ, live lock, and > what a CONFLICT could be in READ COMMITTED mode. Concurrent activity > at READ COMMITTED mode can be expected to significantly alter the > outcome here. And I still have questions about what exactly you mean, but at least this post is going in the right direction and I'm encouraged. Thank you, I think we need some way of expressing the problems clearly. > That is rather frustrating. Guess so. >> You've said its possible another way. Show that assertion is actually >> true. We're all listening, me especially, for the technical details. > > My proposal, if you want to call it that, has the merit of actually > being how MERGE works in every other system. Both Robert and Stephen > seem to be almost sold on what I suggest, so I think that I've > probably already explained my position quite well. The only info I have is "a general purpose solution is one that more or less works like an UPDATE FROM, with an outer join, whose ModifyTable node is capable of insert, update, or delete (and accepts quals for MATCHED and NOT matched cases, etc). You could still get duplicate violations due to concurrent activity in READ COMMITTED mode". Surely the whole point of this is to avoid duplicate violations due to concurrent activity? I'm not seeing how either design sketch rigorously avoids live locks, but those are fairly unlikely and easy to detect and abort. Thank you for a constructive email, we are on the way to somewhere good. I have more to add, but wanted to get back to you soonish. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Oct 31, 2017 at 2:25 AM, Simon Riggswrote: > If there are challenges ahead, its reasonable to ask for test cases > for that now especially if you think you know what they already are. > Imagine we go forwards 2 months - if you dislike my patch when it > exists you will submit a test case showing the fault. Why not save us > all the trouble and describe that now? Test Driven Development. I already have, on several occasions now. But if you're absolutely insistent on my constructing the test case in terms of a real SQL statement, then that's what I'll do. Consider this MERGE statement, from your mock documentation: MERGE INTO wines w USING wine_stock_changes s ON s.winename = w.winename WHEN NOT MATCHED AND s.stock_delta > 0 THEN INSERT VALUES(s.winename, s.stock_delta) WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN UPDATE SET stock = w.stock + s.stock_delta ELSE DELETE; Suppose we remove the WHEN NOT MATCHED case, leaving us with: MERGE INTO wines w USING wine_stock_changes s ON s.winename = w.winename WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN UPDATE SET stock = w.stock + s.stock_delta ELSE DELETE; We now have a MERGE that will not INSERT, but will continue to UPDATE and DELETE. (It's implied that your syntax cannot do this at all, because you propose use the ON CONFLICT infrastructure, but I think we have to imagine a world in which that restriction either never existed or has subsequently been lifted.) The problem here is: Iff the first statement uses ON CONFLICT infrastructure, doesn't the absence of WHEN NOT MATCHED imply different semantics for the remaining updates and deletes in the second version of the query? You've removed what seems like a neat adjunct to the MERGE, but it actually changes everything else too when using READ COMMITTED. Isn't that pretty surprising? If you're not clear on what I mean, see my previous remarks on EPQ, live lock, and what a CONFLICT could be in READ COMMITTED mode. Concurrent activity at READ COMMITTED mode can be expected to significantly alter the outcome here. Why not just always use the behavior that the second query requires, which is very much like an UPDATE FROM with an outer join that can sometimes do deletes (and inserts)? We should always use an MVCC snapshot, and never play ON CONFLICT style games with visibility/dirty snapshots. > It's difficult to discuss anything with someone that refuses to > believe that there are acceptable ways around things. I believe there > are. Isn't that blind faith? Again, it seems like you haven't really tried to convince me. > If you can calm down the rhetoric we can work together, but if you > continue to grandstand it makes it more difficult. I'm trying to break the deadlock, by getting you to actually consider what I'm saying. I don't enjoy confrontation. Currently, it seems like you're just ignoring my objections, which you actually could fairly easily work through. That is rather frustrating. > You've said its possible another way. Show that assertion is actually > true. We're all listening, me especially, for the technical details. My proposal, if you want to call it that, has the merit of actually being how MERGE works in every other system. Both Robert and Stephen seem to be almost sold on what I suggest, so I think that I've probably already explained my position quite well. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 31 October 2017 at 12:56, Stephen Frostwrote: > Simon, > > * Simon Riggs (si...@2ndquadrant.com) wrote: >> On 30 October 2017 at 19:55, Stephen Frost wrote: >> > I don't think MERGE should be radically different from other database >> > systems and just syntax sugar over a capability we have. >> >> I've proposed a SQL Standard compliant implementation that would do >> much more than be new syntax over what we already have. >> >> So these two claims aren't accurate: "radical difference" and "syntax >> sugar over a capability we have". > > Based on the discussion so far, those are the conclusions I've come to. > Saying they're isn't accurate without providing anything further isn't > likely to be helpful. I'm trying to be clear, accurate and non-confrontational. I appreciate those are your conclusions, which is why I need to be clear that the proposal has been misunderstood on those stated points. What else would you like me to add to be helpful? I will try... I've spent weeks looking at other implementations and combing the SQLStandard with a fine tooth comb to see what is possible and what is not. My proposal shows a new way forwards and yet follows the standard in horrible detail. It has taken me nearly 2 months to write the doc page submitted here. It is not simply new syntax stretched over existing capability. The full syntax of MERGE is quite powerful and will take some months to make work, even with my proposal. I'm not sure what else to say, but I am happy to answer specific technical questions that have full context to allow a rational discussion. >> > Time changes >> > many things, but I don't think anything's changed in this from the prior >> > discussions about it. >> >> My proposal is new, that is what has changed. > > I'm happy to admit that I've missed something in the discussion, but > from what I read the difference appeared to be primairly that you're > proposing it, and doing so a couple years later. > >> At this stage, general opinions can be misleading. > > I'd certainly love to see a MERGE capability that meets the standard and > works in much the same way from a user's perspective as the other RDBMS' > which already implement it. The standard explains how it should work, with the proviso that the standard allows us to define concurrent behavior. Serge has explained that he sees that my proposal covers the main use cases. I don't yet see how to cover all, but I'm looking to do the most important use cases first and other things later. > From prior discussions with Peter on > exactly that subject, I'm also convinced that having that would be a > largely independent piece of work from the INSERT .. ON CONFLICT > capability that we have (just as MERGE is distinct from similar UPSERT > capabilities in other RDBMSs). > > The goal here is really just to avoid time wasted developing MERGE based > on top of the INSERT .. ON CONFLICT system only to have it be rejected > later because multiple other committers and major contributors have said > that they don't agree with that approach. If, given all of this > discussion, you don't feel that's a high risk with your approach then by > all means continue on with what you're thinking and we can review the > patch once it's posted. It is certainly a risk and I don't want to waste time either. I am happy to consider any complete proposal for how to proceed, including details and/or code, but I will be prioritising things to ensure that we have a candidate feature for PG11 from at least one author, rather than a research project for PG12+. The key point here is concurrency. I have said that there is an intermediate step that is useful and achievable, but does not cover every case. Serge has also stated this - who it should be noted is the only person in this discussion that has already written the MERGE statement in SQL, for DB2, so I give more weight to his technical opinion than I do to others, at this time. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon, * Simon Riggs (si...@2ndquadrant.com) wrote: > On 30 October 2017 at 19:55, Stephen Frostwrote: > > I don't think MERGE should be radically different from other database > > systems and just syntax sugar over a capability we have. > > I've proposed a SQL Standard compliant implementation that would do > much more than be new syntax over what we already have. > > So these two claims aren't accurate: "radical difference" and "syntax > sugar over a capability we have". Based on the discussion so far, those are the conclusions I've come to. Saying they're isn't accurate without providing anything further isn't likely to be helpful. > > Time changes > > many things, but I don't think anything's changed in this from the prior > > discussions about it. > > My proposal is new, that is what has changed. I'm happy to admit that I've missed something in the discussion, but from what I read the difference appeared to be primairly that you're proposing it, and doing so a couple years later. > At this stage, general opinions can be misleading. I'd certainly love to see a MERGE capability that meets the standard and works in much the same way from a user's perspective as the other RDBMS' which already implement it. From prior discussions with Peter on exactly that subject, I'm also convinced that having that would be a largely independent piece of work from the INSERT .. ON CONFLICT capability that we have (just as MERGE is distinct from similar UPSERT capabilities in other RDBMSs). The goal here is really just to avoid time wasted developing MERGE based on top of the INSERT .. ON CONFLICT system only to have it be rejected later because multiple other committers and major contributors have said that they don't agree with that approach. If, given all of this discussion, you don't feel that's a high risk with your approach then by all means continue on with what you're thinking and we can review the patch once it's posted. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] MERGE SQL Statement for PG11
On 30 October 2017 at 19:55, Stephen Frostwrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs wrote: >> > Nothing I am proposing blocks later work. >> >> That's not really true. Nobody's going to be happy if MERGE has one >> behavior in one set of cases and an astonishingly different behavior >> in another set of cases. If you adopt a behavior for certain cases >> that can't be extended to other cases, then you're blocking a >> general-purpose MERGE. >> >> And, indeed, it seems that you're proposing an implementation that >> adds no new functionality, just syntax compatibility. Do we really >> want or need two syntaxes for the same thing in core? I kinda think >> Peter might have the right idea here. Under his proposal, we'd be >> getting something that is, in a way, new. > > +1. > > I don't think MERGE should be radically different from other database > systems and just syntax sugar over a capability we have. I've proposed a SQL Standard compliant implementation that would do much more than be new syntax over what we already have. So these two claims aren't accurate: "radical difference" and "syntax sugar over a capability we have". > Time changes > many things, but I don't think anything's changed in this from the prior > discussions about it. My proposal is new, that is what has changed. At this stage, general opinions can be misleading. Hi ho, hi ho. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Can I add my 2c worth, as someone without a horse in the race, as it were, in the hope that telling me how I've got this wrong might clarify the argument a bit (or at least you can all start shouting at me rather than each other :) ) The point of merge is to allow you to choose to either INSERT or UPDATE (or indeed DELETE) records based on existing state, yes? That state is whatever the state of the system at the start of the transaction? If I understand correctly, the only time when this would be problematic is if you try to insert a record into a table which would not allow that INSERT because another transaction has performed an INSERT by the time the COMMIT happens, and where that new record would have changed the state of the MERGE clause, yes? Isn't the only reason this would fail if there is a unique constraint on that table? Yes, you could end up INSERTing values from the merge when another transaction has INSERTed another, but (again, unless I've misunderstood) there's nothing in the spec that says that shouldn't happen; meanwhile for those tables that do require unique values you can use the UPSERT mechanism, no? Geoff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 30 October 2017 at 19:17, Peter Geogheganwrote: > On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggs wrote: >> Please explain in detail the MERGE SQL statements that you think will >> be problematic and why. > > Your proposal is totally incomplete, so I can only surmise its > behavior in certain cases, to make a guess at what the problems might > be (see my remarks on EPQ, live lock, etc). This makes it impossible > to do what you ask right now. Impossible, huh. Henry Ford was right. If there are challenges ahead, its reasonable to ask for test cases for that now especially if you think you know what they already are. Imagine we go forwards 2 months - if you dislike my patch when it exists you will submit a test case showing the fault. Why not save us all the trouble and describe that now? Test Driven Development. > Besides, you haven't answered the question from my last e-mail > ("What's wrong with that [set of MERGE semantics]?"), so why should I > go to further trouble? You're just not constructively engaging with me > at this point. It's difficult to discuss anything with someone that refuses to believe that there are acceptable ways around things. I believe there are. If you can calm down the rhetoric we can work together, but if you continue to grandstand it makes it more difficult. > We're going around in circles. Not really. You've said some things and I'm waiting for further details about the problems you've raised. You've said its possible another way. Show that assertion is actually true. We're all listening, me especially, for the technical details. I've got a fair amount of work to do to get the rest of the patch in shape, so take your time and make it a complete explanation. My only goal is the MERGE feature in PG11. For me this is a collaborative engineering challenge not a debate and definitely not an argument. If you describe your plan of how to do this, I may be able to follow it and include that design. If you don't, then it will be difficult for me to include your thoughts. If you or others wish to write something as well, I have already said that is OK too. If anybody's goal is to block development or to wait for perfection to exist at some unstated time in the future, than I disagree with those thoughts. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
* Robert Haas (robertmh...@gmail.com) wrote: > On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggswrote: > > Nothing I am proposing blocks later work. > > That's not really true. Nobody's going to be happy if MERGE has one > behavior in one set of cases and an astonishingly different behavior > in another set of cases. If you adopt a behavior for certain cases > that can't be extended to other cases, then you're blocking a > general-purpose MERGE. > > And, indeed, it seems that you're proposing an implementation that > adds no new functionality, just syntax compatibility. Do we really > want or need two syntaxes for the same thing in core? I kinda think > Peter might have the right idea here. Under his proposal, we'd be > getting something that is, in a way, new. +1. I don't think MERGE should be radically different from other database systems and just syntax sugar over a capability we have. The downthread comparison to partitioning isn't accurate either. There's a reason that we have INSERT .. ON CONFLICT and not MERGE and it's because they aren't the same thing, as Peter's already explained, both now and when he and I had exactly this same discussion years ago when he was working on implementing INSERT .. ON CONFLICT. Time changes many things, but I don't think anything's changed in this from the prior discussions about it. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] MERGE SQL Statement for PG11
On Mon, Oct 30, 2017 at 10:59:43AM -0700, Peter Geoghegan wrote: > On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggswrote: > > If a general purpose solution exists, please explain what it is. > > For the umpteenth time, a general purpose solution is one that more or > less works like an UPDATE FROM, with an outer join, whose ModifyTable > node is capable of insert, update, or delete (and accepts quals for > MATCHED and NOT matched cases, etc). You could still get duplicate > violations due to concurrent activity in READ COMMITTED mode, but not > at higher isolation levels thanks to Thomas Munro's work there. In > this world, ON CONFLICT and MERGE are fairly distinct things. FWIW, and as an outsider, having looked at MERGE docs from other RDBMSes, I have to agree that the PG UPSERT (ON CONFLICT .. DO) and MERGE are rather different beasts. In particular, I suspect all UPSERT statements can be mapped onto equivalent MERGE statements, but not all MERGE statements can be mapped onto UPSERTs. The reason is that UPSERT depends on UNIQUE constraints, whereas MERGE uses a generic join condition that need not even refer to any INDEXes, let alone UNIQUE ones. Perhaps PG's UPSERT can be generalized to create a temporary UNIQUE constraint on the specified in the conflict_target portion of the statement, increasing the number of MERGE statements that could be mapped onto UPSERT. But even then, that would still be a UNIQUE constraint, whereas MERGE does not even imply such a thing. Now, a subset of MERGE (those using equijoins in the ON condition) can be mapped onto UPSERT provided either a suitable UNIQUE index exists (or that PG notionally creates a temporary UNIQUE constraint for the purpose of evaluating the UPSERT). This approach would NOT preclude a more complete subsequent implementation of MERGE. But I wonder if that's worthwhile given that a proper and complete implementation of MERGE is probably very desirable. On a tangentially related note, I've long wanted to have an RDBMS- independent SQL parser for the purpose of implementing external query- rewriting (and external optimizers), syntax highlighting, and so on. Having an external / plug-in method for rewriting unsupported SQL as a way of bridging functionality gaps (like lack of MERGE support) would be very nice. PG does have a way to expose its AST... It might be a good idea to start by implementing unsupported SQL features in such a way that they parse and can produce an AST along with a syntax/unsupported error -- then one might rewrite the parsed AST, generate appropriate SQL, and execute that. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggswrote: > Please explain in detail the MERGE SQL statements that you think will > be problematic and why. Your proposal is totally incomplete, so I can only surmise its behavior in certain cases, to make a guess at what the problems might be (see my remarks on EPQ, live lock, etc). This makes it impossible to do what you ask right now. Besides, you haven't answered the question from my last e-mail ("What's wrong with that [set of MERGE semantics]?"), so why should I go to further trouble? You're just not constructively engaging with me at this point. We're going around in circles. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 30 October 2017 at 18:59, Peter Geogheganwrote: > It is most emphatically *not* just a "small matter of programming". Please explain in detail the MERGE SQL statements that you think will be problematic and why. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggswrote: >> That's not really true. Nobody's going to be happy if MERGE has one >> behavior in one set of cases and an astonishingly different behavior >> in another set of cases. If you adopt a behavior for certain cases >> that can't be extended to other cases, then you're blocking a >> general-purpose MERGE. > > If a general purpose solution exists, please explain what it is. For the umpteenth time, a general purpose solution is one that more or less works like an UPDATE FROM, with an outer join, whose ModifyTable node is capable of insert, update, or delete (and accepts quals for MATCHED and NOT matched cases, etc). You could still get duplicate violations due to concurrent activity in READ COMMITTED mode, but not at higher isolation levels thanks to Thomas Munro's work there. In this world, ON CONFLICT and MERGE are fairly distinct things. What's wrong with that? You haven't actually told us why you don't like that. > The problem is that nobody has ever done so, so its not like we are > discussing the difference between bad solution X and good solution Y, > we are comparing reasonable solution X with non-existent solution Y. Nobody knows what your proposal would be like when time came to remove the restrictions that you suggest could be removed later. You're the one with the information here. We need to know what those semantics would be up-front, since you're effectively committing us down that path. You keep making vague appeals to pragmatism, but, in all sincerity, I don't understand where you're coming from at all. I strongly believe that generalizing from ON CONFLICT doesn't even make the implementation easier in the short term. ISTM that you're making this difficult for yourself for reasons that are known only to you. >> And, indeed, it seems that you're proposing an implementation that >> adds no new functionality, just syntax compatibility. Do we really >> want or need two syntaxes for the same thing in core? I kinda think >> Peter might have the right idea here. Under his proposal, we'd be >> getting something that is, in a way, new. > > Partitioning looked like "just new syntax", but it has been a useful > new feature. False equivalency. Nobody, including you, ever argued that that work risked painting us into a corner. (IIRC you said something like the progress was too small to justify putting into a single release.) > MERGE provides new capabilities that we do not have and is much more > powerful than INSERT/UPDATE, in a syntax that follow what other > databases use today. Just like partitioning. But you haven't told us *how* it is more powerful. Again, the semantics of a MERGE that is a generalization of ON CONFLICT are not at all obvious, and seem like they might be very surprising and risky. There is no question that it's your job to (at a minimum) define those semantics ahead of time, since you're going to commit us to them in the long term if you continue down this path. It is most emphatically *not* just a "small matter of programming". -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 30 October 2017 at 09:44, Robert Haaswrote: > On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs wrote: >> Nothing I am proposing blocks later work. > > That's not really true. Nobody's going to be happy if MERGE has one > behavior in one set of cases and an astonishingly different behavior > in another set of cases. If you adopt a behavior for certain cases > that can't be extended to other cases, then you're blocking a > general-purpose MERGE. If a general purpose solution exists, please explain what it is. The problem is that nobody has ever done so, so its not like we are discussing the difference between bad solution X and good solution Y, we are comparing reasonable solution X with non-existent solution Y. > And, indeed, it seems that you're proposing an implementation that > adds no new functionality, just syntax compatibility. Do we really > want or need two syntaxes for the same thing in core? I kinda think > Peter might have the right idea here. Under his proposal, we'd be > getting something that is, in a way, new. Partitioning looked like "just new syntax", but it has been a useful new feature. MERGE provides new capabilities that we do not have and is much more powerful than INSERT/UPDATE, in a syntax that follow what other databases use today. Just like partitioning. Will what I propose do everything in the first release? No, just like partitioning. If other developers are able to do things in phases, then I claim that right also. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 29 October 2017 at 21:25, Peter Geogheganwrote: > The semantics that I suggest (the SQL standard's semantics) will > require less code, and will be far simpler. Right now, I simply don't > understand why you're insisting on using ON CONFLICT without even > saying why. I can only surmise that you think that doing so will > simplify the implementation, but I can guarantee you that it won't. If you see problems in my proposal, please show the specific MERGE SQL statements that you think will give problems and explain how and what the failures will be. We can then use those test cases to drive developments. If we end up with code for multiple approaches we will be able to evaluate the differences between proposals using the test cases produced. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggswrote: > Nothing I am proposing blocks later work. That's not really true. Nobody's going to be happy if MERGE has one behavior in one set of cases and an astonishingly different behavior in another set of cases. If you adopt a behavior for certain cases that can't be extended to other cases, then you're blocking a general-purpose MERGE. And, indeed, it seems that you're proposing an implementation that adds no new functionality, just syntax compatibility. Do we really want or need two syntaxes for the same thing in core? I kinda think Peter might have the right idea here. Under his proposal, we'd be getting something that is, in a way, new. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Sun, Oct 29, 2017 at 4:48 AM, Simon Riggswrote: > I have no objection to you writing a better version than me and if my > work inspires you to complete that in a reasonable timescale then we > all win. My whole point is that the way that you seem determined to go on this is a dead end. I don't think that *anyone* can go improve on what you come up with if that's based heavily on ON CONFLICT, for the simple reason that the final user visible design is totally unclear. There is an easy way to make me shut up - come up with a design for MERGE that more or less builds on how UPDATE FROM works, rather than building MERGE on ON CONFLICT. (You might base things like RLS handling on ON CONFLICT, but in the main MERGE should be like an UPDATE FROM with an outer join, that can do INSERTs and DELETEs, too.) The original effort to add MERGE didn't do anything upsert-like, which Heikki (the GSOC mentor of the project) was perfectly comfortable with. I'm too lazy to go search the archives right now, but it's there. Heikki cites the SQL standard. This is what MERGE *actually is*, which you can clearly see from the Oracle/SQL Server/DB2 docs. It says this in the first paragraph of their MERGE documentation. It's crystal clear from their docs -- "This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT, UPDATE, and DELETE DML statements." > I'm also very happy to work together on writing the version > described - you have already done much work in this area. You seem to want to preserve the ON CONFLICT guarantees at great cost. But you haven't even defended that based on a high level goal, or a use case, or something that makes sense to users (describing how it is possible is another matter). You haven't even tried to convince me. > I don't see any major problems in points you've raised so far, though > obviously there is much detail. Did you even read them? They are not mere details. They're fundamental to the semantics of the feature (if you base it on ON CONFLICT). It's not actually important that you understand them all; the important message is that generalizing ON CONFLICT has all kinds of terrible problems. > All of this needs to be written and then committed, so I'll get on and > write my proposal. We can then see whether that is an 80% solution or > something less. There are more obvious barriers to completion at this > point, like time and getting on with it. Getting on with *what*, exactly? In general, I have nothing against an 80% solution, or even a 50% solution, provided there is a plausible path to a 100% solution. I don't think that you have such a path, but only because you're tacitly inserting requirements that no other MERGE implementation has to live with, that I doubt any implementation *could* live with. Again, I'm not the one making this complicated, or adding requirements that will be difficult for you to get in to your v1 -- you're the one doing that. The semantics that I suggest (the SQL standard's semantics) will require less code, and will be far simpler. Right now, I simply don't understand why you're insisting on using ON CONFLICT without even saying why. I can only surmise that you think that doing so will simplify the implementation, but I can guarantee you that it won't. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 28 October 2017 at 22:04, Peter Geogheganwrote: > On Sat, Oct 28, 2017 at 12:49 PM, Simon Riggs wrote: >> Nothing I am proposing blocks later work. > > Actually, many things will block future work if you go down that road. > You didn't respond to the specific points I raised, but that doesn't > mean that they're not real. > >> Everything you say makes it clear that a fully generalized solution is >> going to be many years in the making, assuming we agree. > > I think that it's formally impossible as long as you preserve the ON > CONFLICT guarantees, unless you somehow define the problems out of > existence. Those are guarantees which no other MERGE implementation > has ever made, and which the SQL standard says nothing about. And for > good reasons. > >> "The extent to which an SQL-implementation may disallow independent >> changes that are not significant is implementation-defined”. >> >> So we get to choose. I recommend that we choose something practical. >> We're approaching the 10 year anniversary of my first serious attempt >> to do MERGE. I say that its time to move forwards with useful >> solutions, rather than wait another 10 years for the perfect one, even >> assuming it exists. > > As far as I'm concerned, you're the one arguing for an unobtainable > solution over a good one, not me. I *don't* think you should solve the > problems that I raise -- you should instead implement MERGE without > any of the ON CONFLICT guarantees, just like everyone else has. > Building MERGE on top of the ON CONFLICT guarantees, and ultimately > arriving at something that is comparable to other implementations over > many releases might be okay if anyone had the slightest idea of what > that would look like. You haven't even _described the semantics_, > which you could do by addressing the specific points that I raised. I have no objection to you writing a better version than me and if my work inspires you to complete that in a reasonable timescale then we all win. I'm also very happy to work together on writing the version described - you have already done much work in this area. I don't see any major problems in points you've raised so far, though obviously there is much detail. All of this needs to be written and then committed, so I'll get on and write my proposal. We can then see whether that is an 80% solution or something less. There are more obvious barriers to completion at this point, like time and getting on with it. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Sat, Oct 28, 2017 at 12:49 PM, Simon Riggswrote: > Nothing I am proposing blocks later work. Actually, many things will block future work if you go down that road. You didn't respond to the specific points I raised, but that doesn't mean that they're not real. > Everything you say makes it clear that a fully generalized solution is > going to be many years in the making, assuming we agree. I think that it's formally impossible as long as you preserve the ON CONFLICT guarantees, unless you somehow define the problems out of existence. Those are guarantees which no other MERGE implementation has ever made, and which the SQL standard says nothing about. And for good reasons. > "The extent to which an SQL-implementation may disallow independent > changes that are not significant is implementation-defined”. > > So we get to choose. I recommend that we choose something practical. > We're approaching the 10 year anniversary of my first serious attempt > to do MERGE. I say that its time to move forwards with useful > solutions, rather than wait another 10 years for the perfect one, even > assuming it exists. As far as I'm concerned, you're the one arguing for an unobtainable solution over a good one, not me. I *don't* think you should solve the problems that I raise -- you should instead implement MERGE without any of the ON CONFLICT guarantees, just like everyone else has. Building MERGE on top of the ON CONFLICT guarantees, and ultimately arriving at something that is comparable to other implementations over many releases might be okay if anyone had the slightest idea of what that would look like. You haven't even _described the semantics_, which you could do by addressing the specific points that I raised. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 3:00 PM, Serge Rielauwrote: >> What other systems *do* have this restriction? I've never seen one that did. > > Not clear what you are leading up to here. > When I did MERGE in DB2 there was also no limitation: > "Each row in the target can only be operated on once. A row in the target can > only be identified as MATCHED with one row in the result table of the > table-reference” > What there was however was a significant amount of code I had to write and > test to enforce the above second sentence. Then it seems that we were talking about two different things all along. > Maybe in PG there is a trivial way to detect an expanding join and block it > at runtime. There is for ON CONFLICT. See the cardinality violation logic within ExecOnConflictUpdate(). (There are esoteric cases where this error can be raised due to a wCTE that does an insert "from afar", which is theoretically undesirable but not actually a problem.) The MERGE implementation that I have in mind would probably do almost the same thing, and make the "HeapTupleSelfUpdated" case within ExecUpdate() raise an error when the caller happened to be a MERGE, rather than following the historic UPDATE behavior. (The behavior is to silently suppress a second or subsequent UPDATE attempt from the same command, a behavior that Simon's mock MERGE documentation references.) > So the whole point I’m trying to make is that I haven’t seen the need for the > extra work I had to do once the feature appeared in the wild. That seems pretty reasonable to me. My whole point is that I think it's a mistake to do things like lock rows ahead of evaluating any UPDATE predicate, in the style of ON CONFLICT, in order to replicate the ON CONFLICT guarantees [1]. I'm arguing for implementation simplicity, too. Trying to implement MERGE in a way that extends ON CONFLICT seems like a big mistake to me, because ON CONFLICT updates rows on the basis of a would-be duplicate violation, along with all the baggage that that carries. This is actually enormously different to an equi-join that is fed by a scan using an MVCC snapshot. The main difference is that there actually is no MVCC snapshot in play in most cases [2]. If *no* row with the PK value of 5 is visible to our MVCC snapshot, but an xact committed having inserted such a row, that still counts as a CONFLICT with READ COMMITTED. [1] https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation [2] https://www.postgresql.org/docs/devel/static/transaction-iso.html#xact-read-committed -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 28 October 2017 at 20:39, Peter Geogheganwrote: > On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggs wrote: >> SQL:2011 specifically states "The extent to which an >> SQL-implementation may disallow independent changes that are not >> significant is implementation-defined”, so in my reading the above >> behaviour would make us fully spec compliant. Thank you to Peter for >> providing the infrastructure on which this is now possible for PG11. >> >> Serge puts this very nicely by identifying two different use cases for MERGE. > > MERGE benefits from having a join that is more or less implemented in > the same way as any other join. It can be a merge join, hash join, or > nestloop join. ON CONFLICT doesn't work using a join. > > Should I to take it that you won't be supporting any of these > alternative join algorithms? If not, then you'll have something that > really isn't comparable to MERGE as implemented in Oracle, SQL Server, > or DB2. They *all* do this. > > Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case > with every existing MERGE implementation? If so, what actually happens > under the hood when WHEN NOT MATCHED is omitted? For example, would > you actually use a regular "UPDATE FROM" style join, as opposed to the > ON CONFLICT infrastructure? And, if that is so, can you justify the > semantic difference for rows that are updated in each scenario > (omitted vs. not omitted) in READ COMMITTED mode? Note that this could > be the difference between updating a row when *no* version is visible > to our MVCC snapshot, as opposed to doing the EPQ stuff and updating > the latest row version if possible. That's a huge, surprising > difference. On top of all this, you risk live-lock if INSERT isn't a > possible outcome (this is also why ON CONFLICT can never accept a > predicate on its INSERT portion -- again, quite unlike MERGE). > > Why not just follow what other systems do? It's actually easier to go > that way, and you get a better outcome. ON CONFLICT involves what you > could call a sleight of hand, and I fear that you don't appreciate > just how specialized the internal infrastructure is. > >> Now, I accept that you might also want a MERGE statement that >> continues to work even if there is no unique constraint, but it would >> need to have different properties to the above. I do not in any way >> argue against adding that. > > Maybe you *should* be arguing against it, though, and arguing against > ever supporting anything but equijoins, because these things will > *become* impossible if you go down that road. By starting with the ON > CONFLICT infrastructure, while framing no-unique-index-support as work > for some unspecified future release, you're leaving it up to someone > else to resolve the problems. Someone else must square the circle of > mixing ON CONFLICT semantics with fully generalized MERGE semantics. > But who? Nothing I am proposing blocks later work. Everything you say makes it clear that a fully generalized solution is going to be many years in the making, assuming we agree. "The extent to which an SQL-implementation may disallow independent changes that are not significant is implementation-defined”. So we get to choose. I recommend that we choose something practical. We're approaching the 10 year anniversary of my first serious attempt to do MERGE. I say that its time to move forwards with useful solutions, rather than wait another 10 years for the perfect one, even assuming it exists. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggswrote: > SQL:2011 specifically states "The extent to which an > SQL-implementation may disallow independent changes that are not > significant is implementation-defined”, so in my reading the above > behaviour would make us fully spec compliant. Thank you to Peter for > providing the infrastructure on which this is now possible for PG11. > > Serge puts this very nicely by identifying two different use cases for MERGE. MERGE benefits from having a join that is more or less implemented in the same way as any other join. It can be a merge join, hash join, or nestloop join. ON CONFLICT doesn't work using a join. Should I to take it that you won't be supporting any of these alternative join algorithms? If not, then you'll have something that really isn't comparable to MERGE as implemented in Oracle, SQL Server, or DB2. They *all* do this. Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case with every existing MERGE implementation? If so, what actually happens under the hood when WHEN NOT MATCHED is omitted? For example, would you actually use a regular "UPDATE FROM" style join, as opposed to the ON CONFLICT infrastructure? And, if that is so, can you justify the semantic difference for rows that are updated in each scenario (omitted vs. not omitted) in READ COMMITTED mode? Note that this could be the difference between updating a row when *no* version is visible to our MVCC snapshot, as opposed to doing the EPQ stuff and updating the latest row version if possible. That's a huge, surprising difference. On top of all this, you risk live-lock if INSERT isn't a possible outcome (this is also why ON CONFLICT can never accept a predicate on its INSERT portion -- again, quite unlike MERGE). Why not just follow what other systems do? It's actually easier to go that way, and you get a better outcome. ON CONFLICT involves what you could call a sleight of hand, and I fear that you don't appreciate just how specialized the internal infrastructure is. > Now, I accept that you might also want a MERGE statement that > continues to work even if there is no unique constraint, but it would > need to have different properties to the above. I do not in any way > argue against adding that. Maybe you *should* be arguing against it, though, and arguing against ever supporting anything but equijoins, because these things will *become* impossible if you go down that road. By starting with the ON CONFLICT infrastructure, while framing no-unique-index-support as work for some unspecified future release, you're leaving it up to someone else to resolve the problems. Someone else must square the circle of mixing ON CONFLICT semantics with fully generalized MERGE semantics. But who? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 28 October 2017 at 00:31, Michael Paquierwrote: > On Fri, Oct 27, 2017 at 9:00 AM, Robert Haas wrote: >> On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs wrote: >>> I didn't say it but my intention was to just throw an ERROR if no >>> single unique index can be identified. >>> >>> It could be possible to still run MERGE in that situaton but we would >>> need to take a full table lock at ShareRowExclusive. It's quite likely >>> that such statements would throw duplicate update errors, so I >>> wouldn't be aiming to do anything with that for PG11. >> >> Like Peter, I think taking such a strong lock for a DML statement >> doesn't sound like a very desirable way forward. It means, for >> example, that you can only have one MERGE in progress on a table at >> the same time, which is quite limiting. It could easily be the case >> that you have multiple MERGE statements running at once but they touch >> disjoint groups of rows and therefore everything works. I think the >> code should be able to cope with concurrent changes, if nothing else >> by throwing an ERROR, and then if the user wants to ensure that >> doesn't happen by taking ShareRowExclusiveLock they can do that via an >> explicit LOCK TABLE statement -- or else they can prevent concurrency >> by any other means they see fit. > > +1, I would suspect users to run this query in parallel of the same > table for multiple data sets. > > Peter has taken some time to explain me a bit his arguments today, and > I agree that it does not sound much appealing to have constraint > limitations for MERGE. Particularly using the existing ON CONFLICT > structure gets the feeling of having twice a grammar for what's > basically the same feature, with pretty much the same restrictions. > > By the way, this page sums up nicely the situation about many > implementations of UPSERT taken for all systems: > https://en.wikipedia.org/wiki/Merge_(SQL) That Wikipedia article is badly out of date and regrettably does NOT sum up the current situation nicely any more since MERGE has changed in definition in SQL:2011 since its introduction in SQL:2003. I'm proposing a MERGE statement for PG11 that i) takes a RowExclusiveLock on rows, so can be run concurrently ii) uses the ON CONFLICT infrastructure to do that and so requires a unique constraint. The above is useful behaviour that will be of great benefit to PostgreSQL users. There are no anomalies remaining. SQL:2011 specifically states "The extent to which an SQL-implementation may disallow independent changes that are not significant is implementation-defined”, so in my reading the above behaviour would make us fully spec compliant. Thank you to Peter for providing the infrastructure on which this is now possible for PG11. Serge puts this very nicely by identifying two different use cases for MERGE. Now, I accept that you might also want a MERGE statement that continues to work even if there is no unique constraint, but it would need to have different properties to the above. I do not in any way argue against adding that. I also agree that adding RETURNING at a later stage would be fine as well. I am proposing that those and any other additional properties people come up with can be added in later releases once we have the main functionality in core in PG11. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 02:13:27PM -0700, srielau wrote: > While the standard may not require a unique index for the ON clause I have > never seen a MERGE statement that did not have this property. So IMHO this > is a reasonable restrictions. I don't understand how one could have a conflict upon which to turn INSERT into UPDATE without having a UNIQUE constraint violated... The only question is whether one should have control over -or have to specify- which constraint violations lead to UPDATE vs. which ones lead to failure vs. which ones lead to doing nothing. The row to update is the one that the to-be-inserted row conflicted with -- there can only have been one if the constraint violated was a PRIMARY KEY constraint, or if there is a PRIMARY KEY at all, but if there's no PRIMARY KEY, then there can have been more conflicting rows because of NULL columns in the to-be-inserted row. If the to-be-inserted row conflicts with multiple rows, then just fail, or don't allow MERGE on tables that have no PK (as you know, many think it makes no sense to not have a PK on a table in SQL). In the common case one does not care about which UNIQUE constraint is violated because there's only one that could have been violated, or because if the UPDATE should itself cause some other UNIQUE constraint to be violated, then the whole statement should fail. PG's UPSERT is fantastic -- it allows very fine-grained control, but it isn't as pithy as it could be when the author doesn't care to specify all that detail. Also, something like SQLite3's INSERT OR REPLACE is very convenient: pithy, INSERT syntax, upsert-like semantics[*]. I'd like to have this in PG: INSERT INTO .. ON CONFLICT DO UPDATE; -- I.e., update all columns of the existing -- row to match the ones from the row that -- would have been inserted had there not been -- a conflict. -- -- If an INSERTed row conflicts and then the -- UPDATE it devolves to also conflicts, then -- fail. and INSERT INTO .. ON CONFLICT DO UPDATE -- I.e., update all columns of the existing -- row to match the ones from the row that -- would have been inserted had there not been -- a conflict. -- ON CONFLICT DO NOTHING; -- If an INSERTed row conflicts and then the -- UPDATE it devolves to also conflicts, then -- DO NOTHING. [*] SQLite3's INSERT OR REPLACE is NOT an insert-or-update, but an insert-or-delete-and-insert, and any deletions that occur in the process do fire triggers. INSERT OR UPDATE would be much more useful. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 9:00 AM, Robert Haaswrote: > On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs wrote: >> I didn't say it but my intention was to just throw an ERROR if no >> single unique index can be identified. >> >> It could be possible to still run MERGE in that situaton but we would >> need to take a full table lock at ShareRowExclusive. It's quite likely >> that such statements would throw duplicate update errors, so I >> wouldn't be aiming to do anything with that for PG11. > > Like Peter, I think taking such a strong lock for a DML statement > doesn't sound like a very desirable way forward. It means, for > example, that you can only have one MERGE in progress on a table at > the same time, which is quite limiting. It could easily be the case > that you have multiple MERGE statements running at once but they touch > disjoint groups of rows and therefore everything works. I think the > code should be able to cope with concurrent changes, if nothing else > by throwing an ERROR, and then if the user wants to ensure that > doesn't happen by taking ShareRowExclusiveLock they can do that via an > explicit LOCK TABLE statement -- or else they can prevent concurrency > by any other means they see fit. +1, I would suspect users to run this query in parallel of the same table for multiple data sets. Peter has taken some time to explain me a bit his arguments today, and I agree that it does not sound much appealing to have constraint limitations for MERGE. Particularly using the existing ON CONFLICT structure gets the feeling of having twice a grammar for what's basically the same feature, with pretty much the same restrictions. By the way, this page sums up nicely the situation about many implementations of UPSERT taken for all systems: https://en.wikipedia.org/wiki/Merge_(SQL) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx=9.8.79=10.12.6=email_footer_2] On Fri, Oct 27, 2017 at 2:42 PM, Peter Geogheganwrote: On Fri, Oct 27, 2017 at 2:13 PM, srielau wrote: > While the standard may not require a unique index for the ON clause I have > never seen a MERGE statement that did not have this property. So IMHO this > is a reasonable restrictions. The Oracle docs on MERGE say nothing about unique indexes or constraints. They don't even mention them in passing. They do say "This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT, UPDATE, and DELETE DML statements." SQL Server's MERGE docs do mention unique indexes, but only in passing, saying something about unique violations, and that unique violations *cannot* be suppressed in MERGE, even though that's possible with other DML statements (with something called IGNORE_DUP_KEY). What other systems *do* have this restriction? I've never seen one that did. Not clear what you are leading up to here. When I did MERGE in DB2 there was also no limitation: " Each row in the target can only be operated on once. A row in the target can only be identified as MATCHED with one row in the result table of the table-reference” What there was however was a significant amount of code I had to write and test to enforce the above second sentence. IIRC it involved, in the absence of a proof that the join could not expand, adding a row_number() over() AS rn over the target leg of the join and then a row_number() over(partition by rn) > 1 THEN RAISE_ERROR() to catch violators. Maybe in PG there is a trivial way to detect an expanding join and block it at runtime. So the whole point I’m trying to make is that I haven’t seen the need for the extra work I had to do once the feature appeared in the wild. Cheers Serge
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 2:13 PM, srielauwrote: > While the standard may not require a unique index for the ON clause I have > never seen a MERGE statement that did not have this property. So IMHO this > is a reasonable restrictions. The Oracle docs on MERGE say nothing about unique indexes or constraints. They don't even mention them in passing. They do say "This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT, UPDATE, and DELETE DML statements." SQL Server's MERGE docs do mention unique indexes, but only in passing, saying something about unique violations, and that unique violations *cannot* be suppressed in MERGE, even though that's possible with other DML statements (with something called IGNORE_DUP_KEY). What other systems *do* have this restriction? I've never seen one that did. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon, Nice writeup. While the standard may not require a unique index for the ON clause I have never seen a MERGE statement that did not have this property. So IMHO this is a reasonable restrictions. In fact I have only ever seen two flavors of usage: * Single row source (most often simply a VALUES clause) in OLTP In that case there was lots of concurrency * Massive source which affects a significant portion of the target table in DW. In this case there were no concurrent MERGEs I believe support for returning rows at a later stage would prove to be very powerful, especially in combination with chaining MERGE statements in CTEs. To do that would require language extensions to pass the coloring of the source row through, especially for rows that fell into "do nothing". -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggswrote: > I didn't say it but my intention was to just throw an ERROR if no > single unique index can be identified. > > It could be possible to still run MERGE in that situaton but we would > need to take a full table lock at ShareRowExclusive. It's quite likely > that such statements would throw duplicate update errors, so I > wouldn't be aiming to do anything with that for PG11. Like Peter, I think taking such a strong lock for a DML statement doesn't sound like a very desirable way forward. It means, for example, that you can only have one MERGE in progress on a table at the same time, which is quite limiting. It could easily be the case that you have multiple MERGE statements running at once but they touch disjoint groups of rows and therefore everything works. I think the code should be able to cope with concurrent changes, if nothing else by throwing an ERROR, and then if the user wants to ensure that doesn't happen by taking ShareRowExclusiveLock they can do that via an explicit LOCK TABLE statement -- or else they can prevent concurrency by any other means they see fit. Other problems with taking ShareRowExclusiveLock include (1) probable lock upgrade hazards and (2) do you really want MERGE to kick autovacuum off of your giant table? Probably not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 6:24 AM, Robert Haaswrote: > I think one of the reasons why Peter Geoghegan decided to pursue > INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL > syntax, he felt free to mandate a non-standard SQL requirement, namely > the presence of a unique index on the arbiter columns. That's true, but what I was really insistent on, more than anything else, was that the user would get a practical guarantee about an insert-or-update outcome under concurrency. There could be no "unprincipled deadlocks", nor could there be spurious unique violations. This is the kind of thing that the SQL standard doesn't really concern itself with, and yet it's of significant practical importance to users. Both Oracle and SQL Server allow these things that I specifically set out to avoid. I think that that's mostly a good thing, though; they do a really bad job of explaining what's what, and don't provide for a very real need ("upsert") in some other way, but their MERGE semantics do make sense to me. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 7:41 AM, Simon Riggswrote: > Good points. > > I didn't say it but my intention was to just throw an ERROR if no > single unique index can be identified. You'd also throw an error when there was no "upsert compatible" join quals, I take it? I don't see the point in that. That's just mapping one syntax on to another. > It could be possible to still run MERGE in that situaton but we would > need to take a full table lock at ShareRowExclusive. It's quite likely > that such statements would throw duplicate update errors, so I > wouldn't be aiming to do anything with that for PG11. I would avoid mixing up ON CONFLICT DO UPDATE and MERGE. The "anomalies" you describe in MERGE are not really anomalies IMV. They're simply how the feature is supposed to operate, and how it's possible to make MERGE use alternative join algorithms based only on the underlying cost. You might use a merge join for a bulk load use-case, for example. I think an SQL MERGE feature would be compelling, but I don't think that it should take much from ON CONFLICT. As I've said many times [1], they're really two different features (Teradata had features similar to each, for example). I suggest that you come up with something that has the semantics that the standard requires, and therefore makes none of the ON CONFLICT guarantees about an outcome under concurrency (INSERT or UPDATE). Those guarantees are basically incompatible with how MERGE needs to work. In case it matters, I think that the idea of varying relation heavyweight lock strength based on subtle semantics within a DML statement is a bad one. Frankly, I think that that's going to be a nonstarter. [1] https://www.postgresql.org/message-id/CAM3SWZRP0c3g6+aJ=yydgyactzg0xa8-1_fcvo5xm7hrel3...@mail.gmail.com -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 October 2017 at 15:24, Robert Haaswrote: > On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs wrote: >> Questions? > > I think one of the reasons why Peter Geoghegan decided to pursue > INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL > syntax, he felt free to mandate a non-standard SQL requirement, namely > the presence of a unique index on the arbiter columns. If MERGE's > join clause happens to involve equality conditions on precisely the > same set of columns as some unique index on the target table, then I > think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure > and I suspect there won't be too many problems. Agreed > However, if it > doesn't, then what? You could decree that such cases will fail, but > that might not meet your use case for developing the feature. Or you > could try to soldier on without the INSERT .. ON CONFLICT UPDATE > machinery, but that means, I think, that sometimes you will get > serialization anomalies - at least, I think, you will sometimes obtain > results that couldn't have been obtained under any serial order of > execution, and maybe it would end up being possible to fail with > serialization errors or unique index violations. > > In the past, there have been objections to implementations of MERGE > which would give rise to such serialization anomalies, but I'm not > sure we should feel bound by those discussions. One thing that's > different is that the common and actually-useful case can now be made > to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE; > if less useful cases are vulnerable to some weirdness, maybe it's OK > to just document the problems. Good points. I didn't say it but my intention was to just throw an ERROR if no single unique index can be identified. It could be possible to still run MERGE in that situaton but we would need to take a full table lock at ShareRowExclusive. It's quite likely that such statements would throw duplicate update errors, so I wouldn't be aiming to do anything with that for PG11. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggswrote: > Questions? I think one of the reasons why Peter Geoghegan decided to pursue INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL syntax, he felt free to mandate a non-standard SQL requirement, namely the presence of a unique index on the arbiter columns. If MERGE's join clause happens to involve equality conditions on precisely the same set of columns as some unique index on the target table, then I think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure and I suspect there won't be too many problems. However, if it doesn't, then what? You could decree that such cases will fail, but that might not meet your use case for developing the feature. Or you could try to soldier on without the INSERT .. ON CONFLICT UPDATE machinery, but that means, I think, that sometimes you will get serialization anomalies - at least, I think, you will sometimes obtain results that couldn't have been obtained under any serial order of execution, and maybe it would end up being possible to fail with serialization errors or unique index violations. In the past, there have been objections to implementations of MERGE which would give rise to such serialization anomalies, but I'm not sure we should feel bound by those discussions. One thing that's different is that the common and actually-useful case can now be made to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE; if less useful cases are vulnerable to some weirdness, maybe it's OK to just document the problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Simon Riggs wrote: > Earlier thoughts on how this could/could not be done were sometimes > imprecise or inaccurate, so I have gone through the command per > SQL:2011 spec and produced a definitive spec in the form of an SGML > ref page. This is what I intend to deliver for PG11. Nice work. I didn't verify the SQL spec, just read your HTML page; some very minor comments based on that: * use "and" not "where" as initial words in "when_clause" and "merge_update" clause definitions * missing word here: "the DELETE privilege on the if you specify" * I think the word "match." is leftover from some editing in the phrase " that specifies which rows in the data_source match rows in the target_table_name. match." In the same paragraph, it is not clear whether all columns must be matched or it can be a partial match. * In the when_clause note, it is not clear whether you can have multiple WHEN MATCHED and WHEN NOT MATCHED clauses. Obviously you can have one of each, but I think your doc says it is possible to have more than one of each, with different conditions (WHEN MATCHED AND foo THEN bar WHEN MATCHED AND baz THEN qux). No example shows more than one. On the same point: Is there short-circuiting of such conditions, i.e. will the execution will stop looking for further WHEN matches if some rule matches, or will it rather check all rules and raise an error if more than one WHEN rules match each given row? * Your last example uses ELSE but that appears nowhere in the synopsys. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
Hey, It looks quite nice. Personally I'd like to also have the returning statement, and have the number of deleted and inserted rows as separate numbers in the output message. regards Szymon Lipiński pt., 27.10.2017, 10:56 użytkownik Simon Riggsnapisał: > I'm working on re-submitting MERGE for PG11 > > Earlier thoughts on how this could/could not be done were sometimes > imprecise or inaccurate, so I have gone through the command per > SQL:2011 spec and produced a definitive spec in the form of an SGML > ref page. This is what I intend to deliver for PG11. > > MERGE will use the same mechanisms as INSERT ON CONFLICT, so > concurrent behavior does not require further infrastructure changes, > just detailed work on the statement itself. > > I'm building up the code from scratch based upon the spec, rather than > trying to thwack earlier attempts into shape. This looks more likely > to yield a commitable patch. > > Major spanners or objections, please throw them in now cos I don't see any. > > Questions? > > -- > Simon Riggshttp://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >