Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
Ah, there is one reason not to use a mapping to CTEs to implement MERGE:
it might be faster to use a single query that is a FULL OUTER JOIN of the
source and target to drive the update/insert/delete operations.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
On Tue, Nov 07, 2017 at 03:31:22PM -0800, Peter Geoghegan wrote:
> On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams  wrote:
> > On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> >> Nico Williams  wrote:
> >> >A MERGE mapped to a DML like this:
> >
> > I needed to spend more time reading MERGE docs from other RDBMSes.
> 
> Please don't hijack this thread. It's about the basic question of
> semantics, and is already hard enough for others to follow as-is.

I'm absolutely not.  If you'd like a pithy summary devoid of detail, it
is this:

  I'm making the argument that using ON CONFLICT to implement MERGE
  cannot produce a complete implementation [you seem to agree], but
  there is at least one light-weight way to implement MERGE with
  _existing_ machinery in PG: CTEs.

  It's perfectly fine to implement an executor for MERGE, but I think
  that's a bit silly and I explain why.

Further, I explored your question regarding order of events, which you
(and I) think is a very important semantics question.  You thought order
of execution / trigger firing should be defined, whereas I think it
should not because MERGE explicitly says, at least MSFT's!

MSFT's MERGE says:

| For every insert, update, or delete action specified in the MERGE
| statement, SQL Server fires any corresponding AFTER triggers defined
| on the target table, but does not guarantee on which action to fire
| triggers first or last. Triggers defined for the same action honor the
| order you specify.

Impliedly (though not stated explicitly), the actual updates, inserts,
and deletes, can happen in any order as well as the triggers firing in
any order.

As usual, in the world of programming language design, leaving order of
execution undefined as much as possible increases the level of available
opportunities to parallelize.  Presumably MSFT is leaving the door open
to parallizing MERGE, if they haven't already.

Impliedly, CTEs that have no dependencies on each other are also ripe
for parallelization.  This is important too!  For one of my goals is: to
improve CTE performance.  If implementing MERGE as a mapping to CTEs
leads to improvements in CTEs, so much the better.  But also this *is* a
simple implementation of MERGE, and simplicity seems like a good thing.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Peter Geoghegan
On Tue, Nov 7, 2017 at 3:29 PM, Nico Williams  wrote:
> On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
>> Nico Williams  wrote:
>> >A MERGE mapped to a DML like this:
>
> I needed to spend more time reading MERGE docs from other RDBMSes.

Please don't hijack this thread. It's about the basic question of
semantics, and is already hard enough for others to follow as-is.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Nico Williams
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> Nico Williams  wrote:
> >A MERGE mapped to a DML like this:

I needed to spend more time reading MERGE docs from other RDBMSes.

The best MERGE so far is MS SQL Server's, which looks like:

  MERGE INTO  
  USING  
  ON ()
  
  -- optional:
  WHEN MATCHED THEN UPDATE SET ...

  -- optional:
  WHEN NOT MATCHED [ BY TARGET ] THEN INSERT ...

  -- optional:
  WHEN NOT MATCHED BY SOURCE THEN DELETE

  -- optional:
  OUTPUT ...
  
  ;

(The other MERGEs are harder to use because they lack a WHEN NOT MATCHED
BY SOURCE THEN DELETE, instead having a DELETE clause on the UPDATE,
which is then difficult to use.)

This is *trivial* to map to a CTE, and, in fact, I and my colleagues
have resorted to hand-coded CTEs like this precisely because PG lacks
MERGE (though we ourselves didn't know about MERGE -- it's new to us).

If  is a query, then we start with a CTE for that, else if it's
a view or table, then we don't setup a CTE for it.  Each of the UPDATE,
INSERT, and/or DELETE can be it's own CTE.  If there's an OUTPUT clause,
that can be a final SELECT that queries from the CTEs that ran the DMLs
with RETURNING.  If there's no OUTPUT then none of the DMLs need to have
RETURNING, and one of them will be the main statement, rather than a
CTE.

The pattern is:

  WITH
-- IFF  is a query:
 AS (),

-- IFF there's a WHEN MATCHED THEN UPDATE
updates AS (
  UPDATE  AS  SET ...
  FROM 
  WHERE 
  -- IFF there's an OUTPUT clause, then:
  RETURNING 'update' as "@action", ...
),

inserts AS (
  INSERT INTO  ()
  SELECT ...
  FROM 
  LEFT JOIN  ON 
  WHERE  IS NOT DISTINCT FROM NULL
  -- IFF there's a CONCURRENTLY clause:
  ON CONFLICT DO NOTHING
  -- IFF there's an OUTPUT clause, then:
  RETURNING 'insert' as "@action", ...
),

deletes AS (
  DELETE FROM 
  WHERE NOT EXISTS (SELECT * FROM  WHERE )
  -- IFF there's an OUTPUT clause, then:
  RETURNING 'delete' as "@action", ...
),

  -- IFF there's an OUTPUT clause
  SELECT * FROM updates
  UNION
  SELECT * FROM inserts
  UNION
  SELECT * FROM deletes;

If there's not an output clause then one of the DMLs has to be the main
statement:

  WITH ...
  DELETE ...; -- or UPDATE, or INSERT

Note that if the source is a view or table and there's no OUTPUT clause,
then it's one DML with up to (but not referring to) two CTEs, and in all
cases the CTEs do not refer to each other.  This means that the executor
can parallelize all of the DMLs.

If the source is a query, then that could be made a temp view to avoid
having to run the query first.  The CTE executor needs to learn to
sometimes do this anyways, so this is good.

The  CTE can be equivalently written without a NOT EXISTS:

to_be_deleted AS (
  SELECT 
  FROM 
  LEFT JOIN  ON ()
  WHERE  IS NOT DISTINCT FROM NULL
),
deletes AS (
  DELETE FROM 
  USING to_be_deleted tbd
  WHERE  = 
)

if that were to run faster (probably not, since PG today would first run
the to_be_deleted CTE, then the deletes CTE).  I mention only because
it's nice to see the symmetry of LEFT JOINs for the two WHEN NOT MATCHED
cases.

(Here  is the alias for it if one was given.)

***

This mapping triggers triggers as one would expect (at least FOR EACH
ROW; I expect the DMLs in CTEs should also trigger FOR EACH STATEMENT
triggers, and if they don't I consider that a bug).

> This is a bad idea. An implementation like this is not at all
> maintainable.

I beg to differ.  First of all, not having to add an executor for MERGE
is a win: much, much less code to maintain.  The code to map MERGE to
CTEs can easily be contained entirely in src/backend/parser/gram.y,
which is a maintainability win: any changes to how CTEs are compiled
will fail to compile if they break the MERGE mapping to CTEs.

> >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.
> 
> That's not handling concurrency -- it's silently ignoring an error. Who
> is to say that the conflict that IGNORE ignored is associated with a row
> visible to the MVCC snapshot of the statement? IOW, why should the DELETE
> affect any row?

That was me misunderstanding MERGE.  The DELETE is independent of the
INSERT -- if an INSERT does nothing because of an ON CONFLICT DO
NOTHING clause, then that won't cause that row to be deleted -- the
inserts and deletes CTEs are independent in the latest mapping (see
above).

I believe adding ON CONFLICT DO NOTHING to the INSERT in this mapping is
all that's needed to support concurrency.

> There are probably a great many reasons why you need a ModifyTable
> executor node that keeps around state, and explicitly indicates that a
> MERGE is a MERGE. For example, we'll probably want statement level
> triggers to execute in a fixed order, regardless of the MERGE, RLS will
> probably require explicitly knowledge of MERGE 

Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-07 Thread Geoff Winkless
On 6 November 2017 at 17:35, Simon Riggs  wrote:
> I read that step 3 in Approach2 is some kind of problem in MVCC
> semantics. My understanding is that SQL Standard allows us to define
> what the semantics of the statement are in relation to concurrency, so
> any semantic issue can be handled by defining it to work the way we
> want. The semantics are:
> a) when a unique index is available we avoid errors by using semantics
> of INSERT .. ON CONFLICT UPDATE.
> b) when a unique index is not available we use other semantics.

I'm obviously being obtuse.

If a unique index is not available, then surely there won't _be_ a
failure? The INSERT (or indeed UPDATE) that results in two similar
records will simply happen, and you will end up with two records the
same. That's OK, based on the semantics of MERGE, no? At the
transaction-start INSERT was the correct thing to do.

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Peter Geoghegan

Simon Riggs  wrote:

In step 3 we discover that an entry exists in the index for a committed row.

Since we have a unique index we use it to locate the row we know
exists and UPDATE that.

We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already
violating MVCC for UPDATEs, so why does it matter if we do it for
INSERTs also?


Before I go on to say why I think that this approach is problematic, I
want to point out a few things that I think we actually agree on:

* EPQ is fairly arbitrary as a behavior for READ COMMITTED UPDATE
 conflict handling. It has more to do with how VACUUM works than about
 some platonic ideal that everyone agrees on.

* We can imagine other alternatives, such as the behavior in Oracle
 (statement level rollback + optimistic retry).

* Those alternatives are probably better in some ways but worse in other
 ways.

* EPQ violates snapshot consistency, even though that's not inherently
 necessary to avoid "READ COMMITTED serialization errors".

* ON CONFLICT also violates snapshot consistency, in rather a different
 way. (Whether or not this is necessary is more debatable.)

I actually think that other MVCC systems don't actually copy Oracle here,
either, and for similar pragmatic reasons. It's a mixed bag.


Where hides the problem?


The problem is violating MVCC is something that can be done in different
ways, and by meaningful degrees:

* EPQ semantics are believed to be fine because we don't get complaints
 about it. I think that that's because it's specialized to UPDATEs and
 UPDATE-like operations, where we walk an UPDATE chain specifically,
 and only use a dirty snapshot for the chain's newer tuples.

* ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no
 distinction between a concurrent UPDATE, and a concurrent DELETE + fresh
 INSERT. It's specialized to CONFLICTs.

This might seem abstract, but it has real, practical implications.
Certain contradictions exist when you start with MVCC semantics, then
fall back to EPQ semantics, then finally fall back to ON CONFLICT
semantics.

Questions about mixing these two things:

* What do we do if someone concurrently UPDATEs in a way that makes the
 qual not pass during EPQ traversal? Should we INSERT when that
 happens?

* If so, what about the case when the MERGE join qual/unique index
 values didn't change (just some other attributes that do not pass the
 additional WHEN MATCHED qual)?

* What about when there was a concurrent DELETE -- should we INSERT then?

ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's
hugely different to doing it the other way around: starting from your
own MVCC snapshot qual, and going to a CONFLICT. This is because
evaluating the DO UPDATE's WHERE clause is just one little extra step
after the one and only latest row for that value has been locked.  You
could theoretically go this way with 2PL, I think, because that's a bit
like locking every row that the predicate touches, but of course that
isn't at all practical.

I should stop trying to make a watertight case against this, even though
I still think that's possible. For now, instead, I'll just say that this
is *extremely* complicated, and still has unresolved questions about
semantics.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Simon Riggs
On 6 November 2017 at 18:35, Peter Geoghegan  wrote:

>> APPROACH2 (modified from my original proposal slightly)
>
>
> This write-up actually begins to confront the issues that I've raised.
> I'm glad to see this.
>
>> 1. Join...
>> 2. Apply results for UPDATE, if present not visible via the snapshot
>> taken at 1, do EPQ to ensure we locate current live tuple
>> 3. If still not visible, do speculative insertion if we have a unique
>> index available, otherwise ERROR. If spec insertion fails, go to 2
>>
>> The loop created above can live-lock, meaning that an infinite loop
>> could be created.
>
>
> The loop is *guaranteed* to live-lock once you "goto 2". So you might as
> well just throw an error at that point, which is the behavior that I've
> been arguing for all along!
>
> If this isn't guaranteed to live-lock at "goto 2", then it's not clear
> why. The outcome of step 2 is clearly going to be identical if you don't
> acquire a new MVCC snapshot, but you don't address that.
>
> You might have meant "apply an equivalent ON CONFLICT DO UPDATE", or
> something like that, despite the fact that the use of ON CONFLICT DO
> NOTHING was clearly implied by the "goto 2". I also see problems with
> that, but I'll wait for you to clarify what you meant before going into
> what they are.

In step 3 we discover that an entry exists in the index for a committed row.

Since we have a unique index we use it to locate the row we know
exists and UPDATE that.

We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already
violating MVCC for UPDATEs, so why does it matter if we do it for
INSERTs also?


Where hides the problem?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Peter Geoghegan

Simon Riggs  wrote:

APPROACH1
1. Join to produce results based upon snapshot at start of query
2. Apply results for INSERT, UPDATE or DELETE



Such failures are of great concern in practice because the time
between 1 and 2 could be very long for large statements, or for
smaller statements we might have sufficiently high concurrency to
allow us to see regular failures.


I'm not sure that they're a *great* concern in a world with something
that targets UPSERT use cases, which is a situation that does not exist
in DBMSs with MERGE (with the notable exception of Teradata). But it's
clearly a concern that users may expect to avoid duplicate violations in
READ COMMITTED, since this caused confusion among users of other
database systems with MERGE.


APPROACH2 (modified from my original proposal slightly)


This write-up actually begins to confront the issues that I've raised.
I'm glad to see this.


1. Join...
2. Apply results for UPDATE, if present not visible via the snapshot
taken at 1, do EPQ to ensure we locate current live tuple
3. If still not visible, do speculative insertion if we have a unique
index available, otherwise ERROR. If spec insertion fails, go to 2

The loop created above can live-lock, meaning that an infinite loop
could be created.


The loop is *guaranteed* to live-lock once you "goto 2". So you might as
well just throw an error at that point, which is the behavior that I've
been arguing for all along!

If this isn't guaranteed to live-lock at "goto 2", then it's not clear
why. The outcome of step 2 is clearly going to be identical if you don't
acquire a new MVCC snapshot, but you don't address that.

You might have meant "apply an equivalent ON CONFLICT DO UPDATE", or
something like that, despite the fact that the use of ON CONFLICT DO
NOTHING was clearly implied by the "goto 2". I also see problems with
that, but I'll wait for you to clarify what you meant before going into
what they are.


In practice, such live-locks are rare and we could detect them by
falling out of the loop after a few tries. Approach2's purpose is to
alleviate errors in Approach1, so falling out of the loop merely takes
us back to the error we would have got if we didn't try, so Approach2
has considerable benefit over Approach1.


I don't hate the idea of retrying a fixed number of times for things
like this, but I don't like it either. I'm going to assume that it's
fine for now.


I read that step 3 in Approach2 is some kind of problem in MVCC
semantics. My understanding is that SQL Standard allows us to define
what the semantics of the statement are in relation to concurrency, so
any semantic issue can be handled by defining it to work the way we
want.


My only concern is that our choices here should be good ones, based on
practical considerations. We both more or less agree on how this should
be assessed, I think; we just reach different conclusions.


As you point out, whichever we choose, we will be bound by those
semantics. So if we take Approach1, as has been indicated currently,
what is the written explanation for that, so we can show that to the
people who ask in the future about our decisions?


Well, Approach1 is what other systems implement. I think that it would
be important to point out that MERGE with Approach1 isn't special, but
ON CONFLICT DO UPDATE is special. We'd also say that higher isolation
levels will not have duplicate violations.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-06 Thread Simon Riggs
On 3 November 2017 at 16:35, Peter Geoghegan  wrote:
> Simon Riggs  wrote:

 The *only* behavioural difference I have proposed would be the *lack*
 of an ERROR in (some) concurrent cases.
>>>
>>>
>>> I think that's a big difference.  Error vs. non-error is a big deal by
>>> itself;
>>
>>
>> Are you saying avoiding an ERROR is a bad or good thing?
>
>
> Are you really asking Robert to repeat what has already been said about
> a dozen different ways?

I'm asking for clarity of explanation rather than assertions.


> That's *not* the only difference. You need to see a couple of steps
> ahead to see further differences, as the real dilemma comes when you
> have to reconcile having provided the UPSERT-guarantees with cases that
> that doesn't map on to (which can happen in a number of different ways).
>
> I don't understand why you'll talk about just about anything but that.
> This is a high-level concern about the overarching design. Do you really
> not understand the concern at this point?

You're either referring to what is in the docs, which is INSERT ... ON
CONFLICT violates MVCC in a particular way, or something as yet
unstated. If it is the former, then I still don't see the problem (see
later). If it is the latter, I need more. So either way I need more.

> Robert Haas said
>In the past, there have been objections to implementations of MERGE
>which would give rise to such serialization anomalies, but I'm not
>sure we should feel bound by those discussions.  One thing that's
>different is that the common and actually-useful case can now be made
>to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
>if less useful cases are vulnerable to some weirdness, maybe it's OK
>to just document the problems.

I agreed with that, and still do.


We need a clear, explicit description of this situation so I will
attempt that in detail here.


The basic concurrency problem we have is this

APPROACH1
1. Join to produce results based upon snapshot at start of query
2. Apply results for INSERT, UPDATE or DELETE

Given there is a time delay between 1 and 2 there is a race condition
so that if another user concurrently inserts the same value into a
unique index then an INSERT will fail with a uniqueness violation.

Such failures are of great concern in practice because the time
between 1 and 2 could be very long for large statements, or for
smaller statements we might have sufficiently high concurrency to
allow us to see regular failures.

APPROACH2 (modified from my original proposal slightly)
1. Join...
2. Apply results for UPDATE, if present not visible via the snapshot
taken at 1, do EPQ to ensure we locate current live tuple
3. If still not visible, do speculative insertion if we have a unique
index available, otherwise ERROR. If spec insertion fails, go to 2

The loop created above can live-lock, meaning that an infinite loop
could be created.

In practice, such live-locks are rare and we could detect them by
falling out of the loop after a few tries. Approach2's purpose is to
alleviate errors in Approach1, so falling out of the loop merely takes
us back to the error we would have got if we didn't try, so Approach2
has considerable benefit over Approach1. This only applies if we do an
INSERT, so if there is a WHEN NOT MATCHED ... AND clause with
probability W, that makes the INSERT rare then we simply have the
probablility of error in Approach2 approach the probability of error
in Approach1 as the W drops to zero, but with W high we may avoid many
errors. Approach2 never generates more errors than Approach1.

I read that step 3 in Approach2 is some kind of problem in MVCC
semantics. My understanding is that SQL Standard allows us to define
what the semantics of the statement are in relation to concurrency, so
any semantic issue can be handled by defining it to work the way we
want. The semantics are:
a) when a unique index is available we avoid errors by using semantics
of INSERT .. ON CONFLICT UPDATE.
b) when a unique index is not available we use other semantics.
To me this is the same as INSERTs failing in the presence of unique
indexes, but not failing when no index is present. The presence of a
unique constraint alters the semantics of the query.
We can choose Approach2 - as Robert says "[we should not] feel bound
by those [earlier] discussions"

Please explain what is wrong with the above without merely asserting
there is a problem.

As you point out, whichever we choose, we will be bound by those
semantics. So if we take Approach1, as has been indicated currently,
what is the written explanation for that, so we can show that to the
people who ask in the future about our decisions?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Peter Geoghegan

Simon Riggs  wrote:

The *only* behavioural difference I have proposed would be the *lack*
of an ERROR in (some) concurrent cases.


I think that's a big difference.  Error vs. non-error is a big deal by
itself;


Are you saying avoiding an ERROR is a bad or good thing?


Are you really asking Robert to repeat what has already been said about
a dozen different ways?

That's *not* the only difference. You need to see a couple of steps
ahead to see further differences, as the real dilemma comes when you
have to reconcile having provided the UPSERT-guarantees with cases that
that doesn't map on to (which can happen in a number of different ways).

I don't understand why you'll talk about just about anything but that.
This is a high-level concern about the overarching design. Do you really
not understand the concern at this point?


also, the non-error case involves departing from MVCC
semantics just as INSERT .. ON CONFLICT UPDATE does.


Meaning what exactly? What situation occurs that a user would be concerned with?

Please describe exactly what you mean so we get it clear.

The concurrent behaviour for MERGE is allowed to be
implementation-specific, so we can define it any way we want.


Agreed -- we can. It isn't controversial at all to say that the SQL
standard has nothing to say on this question. The problem is that the
semantics you argue for are ill-defined, and seem to create more
problems than they solve. Why keep bringing up the SQL standard?

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Simon Riggs
On 3 November 2017 at 08:26, Robert Haas  wrote:
> On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggs  wrote:
>>> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT
>>> UPDATE when a relevant unique index exists and does something else,
>>> such as your proposal of taking a strong lock, or Peter's proposal of
>>> doing this in a concurrency-oblivious manner, in other cases, then
>>> those two cases will behave very differently.
>>
>> The *only* behavioural difference I have proposed would be the *lack*
>> of an ERROR in (some) concurrent cases.
>
> I think that's a big difference.  Error vs. non-error is a big deal by
> itself;

Are you saying avoiding an ERROR is a bad or good thing?

> also, the non-error case involves departing from MVCC
> semantics just as INSERT .. ON CONFLICT UPDATE does.

Meaning what exactly? What situation occurs that a user would be concerned with?

Please describe exactly what you mean so we get it clear.

The concurrent behaviour for MERGE is allowed to be
implementation-specific, so we can define it any way we want.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggs  wrote:
> > We seem to have a few options for PG11
> >
> > 1. Do nothing, we reject MERGE
> >
> > 2. Implement MERGE for unique index situations only, attempting to
> > avoid errors (Simon OP)
> >
> > 3. Implement MERGE, but without attempting to avoid concurrent ERRORs 
> > (Peter)
> >
> > 4. Implement MERGE, while attempting to avoid concurrent ERRORs in
> > cases where that is possible.
> >
> > Stephen, Robert, please say which option you now believe we should pick.
> 
> I think Peter has made a good case for #3, so I lean toward that
> option.  I think #4 is too much of a non-obvious behavior difference
> between the cases where we can avoid those errors and the cases where
> we can't, and I don't see where #2 can go in the future other than #4.

Agreed.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Simon Riggs
On 3 November 2017 at 07:46, Thomas Kellerer  wrote:
> PMFJI
>
>> We seem to have a few options for PG11
>>
>> 1. Do nothing, we reject MERGE
>>
>> 2. Implement MERGE for unique index situations only, attempting to
>> avoid errors (Simon OP)
>>
>> 3. Implement MERGE, but without attempting to avoid concurrent ERRORs
>> (Peter)
>>
>> 4. Implement MERGE, while attempting to avoid concurrent ERRORs in
>> cases where that is possible.
>
> From an end-users point of view I would prefer 3 (or 4 if that won't prevent
> this from going into 11)

Sounds reasonable approach.

> INSERT ... ON CONFLICT is great, but there are situations where the
> restrictions can get in the way and it would be nice to have an alternative
> - albeit with some (documented) drawbacks. As far as I know Oracle also
> doesn't guarantee that MERGE is safe for concurrent use - you can still wind
> up with a unique key violation.

Yes, Oracle allows some unique key violations. It's clear that we
would need to allow some also. So we clearly can't infer whether they
avoid some errors just because they allow some.

My approach will be to reduce the errors in the best way, not to try
to copy errors Oracle makes, if any. But that error avoidance can
easily be a later add-on if we prefer it that way.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Robert Haas
On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggs  wrote:
>> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT
>> UPDATE when a relevant unique index exists and does something else,
>> such as your proposal of taking a strong lock, or Peter's proposal of
>> doing this in a concurrency-oblivious manner, in other cases, then
>> those two cases will behave very differently.
>
> The *only* behavioural difference I have proposed would be the *lack*
> of an ERROR in (some) concurrent cases.

I think that's a big difference.  Error vs. non-error is a big deal by
itself; also, the non-error case involves departing from MVCC
semantics just as INSERT .. ON CONFLICT UPDATE does.

> All I have at the moment is that a few people disagree, but that
> doesn't help determine the next action.
>
> We seem to have a few options for PG11
>
> 1. Do nothing, we reject MERGE
>
> 2. Implement MERGE for unique index situations only, attempting to
> avoid errors (Simon OP)
>
> 3. Implement MERGE, but without attempting to avoid concurrent ERRORs (Peter)
>
> 4. Implement MERGE, while attempting to avoid concurrent ERRORs in
> cases where that is possible.
>
> Stephen, Robert, please say which option you now believe we should pick.

I think Peter has made a good case for #3, so I lean toward that
option.  I think #4 is too much of a non-obvious behavior difference
between the cases where we can avoid those errors and the cases where
we can't, and I don't see where #2 can go in the future other than #4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Thomas Kellerer
PMFJI

> We seem to have a few options for PG11
> 
> 1. Do nothing, we reject MERGE
> 
> 2. Implement MERGE for unique index situations only, attempting to
> avoid errors (Simon OP)
> 
> 3. Implement MERGE, but without attempting to avoid concurrent ERRORs
> (Peter)
> 
> 4. Implement MERGE, while attempting to avoid concurrent ERRORs in
> cases where that is possible.

>From an end-users point of view I would prefer 3 (or 4 if that won't prevent
this from going into 11) 

INSERT ... ON CONFLICT is great, but there are situations where the
restrictions can get in the way and it would be nice to have an alternative
- albeit with some (documented) drawbacks. As far as I know Oracle also
doesn't guarantee that MERGE is safe for concurrent use - you can still wind
up with a unique key violation. 





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Simon Riggs
On 2 November 2017 at 22:59, Nico Williams  wrote:
> On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
>> Nico Williams  wrote:
>> >A MERGE mapped to a DML like this:
>>
>> This is a bad idea. An implementation like this is not at all
>> maintainable.
>
> Assuming the DELETE issue can be addressed, why would this not be
> maintainable?

It would only take one change to make this approach infeasible and
when that happened we would need to revert to the full-executor
version.

One difference that comes to mind is that MERGE doesn't behave the
same way as an UPDATE-join, according to SQL:2011 in that it must
throw an error if duplicate changes are requested. That would be hard
to emulate using a parser only version.

I would call it impressively clever but likely fragile, in this case,
though I encourage more ideas like that in the future.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-03 Thread Simon Riggs
On 2 November 2017 at 17:06, Robert Haas  wrote:
> On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggs  wrote:
>> I've proposed a SQL Standard compliant implementation that would do
>> much more than be new syntax over what we already have.
>>
>> So these two claims aren't accurate: "radical difference" and "syntax
>> sugar over a capability we have".
>
> I think those claims are pretty accurate.

No, because there is misunderstanding on some technical points.

They key point is at the end - what do we do next?

> The design of INSERT .. ON CONFLICT UPDATE and the supporting
> machinery only work if there is a unique index.  It provides radically
> different behavior than what you get with any other DML statement,
> with completely novel concurrency behavior, and there is no reasonable
> way to emulate that behavior in cases where no relevant unique index
> exists.

Agreed

> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT
> UPDATE when a relevant unique index exists and does something else,
> such as your proposal of taking a strong lock, or Peter's proposal of
> doing this in a concurrency-oblivious manner, in other cases, then
> those two cases will behave very differently.

The *only* behavioural difference I have proposed would be the *lack*
of an ERROR in (some) concurrent cases.

We have a way to avoid some concurrency errors during MERGE, while
still maintaining exact SQL Standard behaviour. Peter has pointed out
that we could not catch errors in certain cases; that does nothing to
the cases where it will work well. It would be fallacious to imagine
it is an all or nothing situation.

> And if, in the meantime, MERGE can only handle the cases where there
> is a unique index,

I haven't proposed that MERGE would be forever limited to cases with a
unique index, only that MERGE-for-unique-indexes would be the version
in PG11, for good reason.

> then it can only handle the cases INSERT .. ON
> CONFLICT UPDATE can cover, which makes it, as far as I can see,
> syntactic sugar over what we already have.

SQL:2011 is greatly enhanced and this is no longer true; it was in
earlier versions but is not now.

MERGE allows you do DELETE, as well as conditional UPDATE and INSERT.
It is very clearly much more than UPSERT.

> Maybe it's not entirely -
> you might be planning to make some minor functional enhancements - but
> it's not clear what those are, and I feel like whatever it is could be
> done with less work and more elegance by just extending the INSERT ..
> ON CONFLICT UPDATE syntax.

The differences are clearly apparent in the Standard and in my submitted docs.

But I am interested in submitting a SQL Standard compliant feature.

> And it does seem to be your intention to only handle the cases which
> the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because
> upthread you wrote this: "I didn't say it but my intention was to just
> throw an ERROR if no single unique index can be identified."  I don't
> think anybody's putting words into your mouth here.  We're just
> reading what you wrote.

Happy to have written it because that covers the common use case I was
hoping to deliver in PG11. Incremental development.

My proposal was to implement the common case, while avoiding
concurrency errors. Peter proposes that I cover both the common case
and more general cases, without avoiding concurrency errors.


All I have at the moment is that a few people disagree, but that
doesn't help determine the next action.

We seem to have a few options for PG11

1. Do nothing, we reject MERGE

2. Implement MERGE for unique index situations only, attempting to
avoid errors (Simon OP)

3. Implement MERGE, but without attempting to avoid concurrent ERRORs (Peter)

4. Implement MERGE, while attempting to avoid concurrent ERRORs in
cases where that is possible.

Stephen, Robert, please say which option you now believe we should pick.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 03:25:48PM -0700, Peter Geoghegan wrote:
> Nico Williams  wrote:
> >A MERGE mapped to a DML like this:
> 
> This is a bad idea. An implementation like this is not at all
> maintainable.

Assuming the DELETE issue can be addressed, why would this not be
maintainable?

> >can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.
> 
> That's not handling concurrency -- it's silently ignoring an error. Who
> is to say that the conflict that IGNORE ignored is associated with a row
> visible to the MVCC snapshot of the statement? IOW, why should the DELETE
> affect any row?

Ah, yes, we'd have to make sure the DELETE does not delete rows that
could not be inserted.  There's... no way to find out what those would
have been -- RETURNING won't mention them, though it'd be a nice
addition to UPSERT to have a way to do that, and it'd make this mapping
feasible.

> There are probably a great many reasons why you need a ModifyTable
> executor node that keeps around state, and explicitly indicates that a
> MERGE is a MERGE. For example, we'll probably want statement level
> triggers to execute in a fixed order, regardless of the MERGE, RLS will
> probably require explicitly knowledge of MERGE semantics, and so on.

Wouldn't those fire anyways in a statement like the one I mentioned?

> FWIW, your example doesn't actually have a source (just a target), so it
> isn't actually like MERGE.

That can be added.  I was trying to keep it pithy.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Peter Geoghegan

Nico Williams  wrote:

A MERGE mapped to a DML like this:

 WITH
 updated AS (
   UPDATE 
   SET ...
   WHERE 
   RETURNING 
   )
   , inserted AS (
   INSERT INTO 
   SELECT ...
   WHERE  NOT IN (SELECT  FROM updated) AND ..
   ON CONFLICT DO NOTHING -- see below!
   RETURNING 
   )
 DELETE FROM 
 WHERE  NOT IN (SELECT  FROM updated) AND
NOT IN (SELECT  FROM inserted) AND ...;



This is a bad idea. An implementation like this is not at all
maintainable.


can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.


That's not handling concurrency -- it's silently ignoring an error. Who
is to say that the conflict that IGNORE ignored is associated with a row
visible to the MVCC snapshot of the statement? IOW, why should the DELETE
affect any row?

There are probably a great many reasons why you need a ModifyTable
executor node that keeps around state, and explicitly indicates that a
MERGE is a MERGE. For example, we'll probably want statement level
triggers to execute in a fixed order, regardless of the MERGE, RLS will
probably require explicitly knowledge of MERGE semantics, and so on.

FWIW, your example doesn't actually have a source (just a target), so it
isn't actually like MERGE.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 02:05:19PM -0700, Peter Geoghegan wrote:
> Simon Riggs  wrote:
> >So in your view we should make no attempt to avoid concurrent errors,
> >even when we have the capability to do so (in some cases) and doing so
> >would be perfectly compliant with the SQLStandard.
> 
> Yes. That's what I believe. I believe this because I can't see a way to
> do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and
> works well for the cases where we can do better in READ COMMITTED mode.

A MERGE mapped to a DML like this:

  WITH
  updated AS (
UPDATE 
SET ...
WHERE 
RETURNING 
)
, inserted AS (
INSERT INTO 
SELECT ...
WHERE  NOT IN (SELECT  FROM updated) AND ..
ON CONFLICT DO NOTHING -- see below!
RETURNING 
)
  DELETE FROM 
  WHERE  NOT IN (SELECT  FROM updated) AND
 NOT IN (SELECT  FROM inserted) AND ...;

can handle concurrency via ON CONFLICT DO NOTHING in the INSERT CTE.

Now, one could write a MERGE that produces conflicts even without
concurrency, so adding ON CONFLICT DO NOTHING by default as above...
seems not-quite-correct.  But presumably one wouldn't write MERGE
statements that produce conflicts in the absence of concurrency, so this
seems close enough to me.

Another thing is that MERGE itself could get an ON CONFLICT clause for
the INSERT portion of the MERGE, allowing one to ignore some conflicts
and not others, though there would be no need for DO UPDATE, only DO
NOTHING for conflict resolution :)  This seems better.

I do believe this mapping is correct, and could be implemented entirely
in src/backend/parser/gram.y!  Am I wrong about this?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Peter Geoghegan

Simon Riggs  wrote:

I think people imagined you had worked out how to make MERGE run
concurrently, I certainly did, but in fact you're just saying you
don't believe it ever should.


I'm certain that they didn't think that at all. But I'll let them speak
for themselves.


That is strange since the SQL Standard specifically allows the
implementation to decide upon concurrent behaviour.


And yet nobody else decided to do what you propose with this apparent
leeway.  (See the UPSERT wiki page for many references that confirm
this.)


So in your view we should make no attempt to avoid concurrent errors,
even when we have the capability to do so (in some cases) and doing so
would be perfectly compliant with the SQLStandard.


Yes. That's what I believe. I believe this because I can't see a way to
do this that isn't a mess, and because ON CONFLICT DO UPDATE exists and
works well for the cases where we can do better in READ COMMITTED mode.

Did you know that Oracle doesn't have an EPQ style mechanism at all?
Instead, it rolls back the entire statement and retries it from scratch.
That is not really any further from or closer to the standard than the
EPQ stuff, because the standard doesn't say anything about what should
happen as far as READ COMMITTED conflict handling goes.

My point here is that all of the stuff I'm talking about is only
relevant in READ COMMITTED mode, in areas where the standard never
provides us with guidance. If you can rely on SSI, then there is no
difference between what you propose and what I propose anyway, except
that what I propose is more general and will have better performance,
especially for batch MERGEs. If READ COMMITTED didn't exist,
implementing ON CONFLICT would have been more or less free of
controversy.


Yes, that certainly will make an easier patch for MERGE.


Indeed, it will.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Simon Riggs
On 2 November 2017 at 19:16, Peter Geoghegan  wrote:
> Simon Riggs  wrote:
>>
>> So if I understand you correctly, in your view MERGE should just fail
>> with an ERROR if it runs concurrently with other DML?
>
>
> That's certainly my opinion on the matter. It seems like that might be
> the consensus, too.

Given that I only just found out what you've been talking about, I
don't believe that anybody else did either.

I think people imagined you had worked out how to make MERGE run
concurrently, I certainly did, but in fact you're just saying you
don't believe it ever should.

That is strange since the SQL Standard specifically allows the
implementation to decide upon concurrent behaviour.


> Without meaning to sound glib: we already did make it work for a
> special, restricted case that is important enough to justify introducing
> a couple of kludges -- ON CONFLICT DO UPDATE/upsert.
>
> I do agree that what I propose for MERGE will probably cause confusion;
> just look into Oracle's MERGE implementation for examples of this.  We
> ought to go out of our way to make it clear that MERGE doesn't provide
> these guarantees.

So in your view we should make no attempt to avoid concurrent errors,
even when we have the capability to do so (in some cases) and doing so
would be perfectly compliant with the SQLStandard.

Yes, that certainly will make an easier patch for MERGE.

Or are you arguing against allowing any patch for MERGE?

Now we have more clarity, who else agrees with this?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 12:51:45PM -0700, Peter Geoghegan wrote:
> Nico Williams  wrote:
> >If you want to ignore conflicts arising from concurrency you could
> >always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
> >proposed earlier.  Thus a MERGE CONCURRENTLY could just do that.
> >
> >Is there any reason not to map MERGE as I proposed?
> 
> Performance, for one. MERGE generally has a join that can be optimized
> like an UPDATE FROM join.

Ah, right, I think my mapping was pessimal.  How about this mapping
instead then:

WITH
updated AS (
  UPDATE 
  SET ...
  WHERE 
  RETURNING 
  )
  , inserted AS (
  INSERT INTO 
  SELECT ...
  WHERE  NOT IN (SELECT  FROM updated) AND ..
  /*
   * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face
   * of concurrency.
   */
  RETURNING 
  )
DELETE FROM 
WHERE  NOT IN (SELECT  FROM updated) AND
   NOT IN (SELECT  FROM inserted) AND ...;

?

If a MERGE has no delete clause, then the mapping would be:

WITH
updated AS (
  UPDATE 
  SET ...
  WHERE 
  RETURNING 
  )
INSERT INTO 
SELECT ...
WHERE  NOT IN (SELECT  FROM updated) AND ..
/*
 * Add ON CONFLICT DO NOTHING here to avoid conflicts in the face
 * of concurrency.
 */
;

> I haven't studied this question in any detail, but FWIW I think that
> using CTEs for merging is morally equivalent to a traditional MERGE
> implementation. [...]

I agree.  So why not do that initially?  Optimize later.

Such a MERGE mapping could be implemented entirely within
src/backend/parser/gram.y ...

Talk about cheap to implement, review, and maintain!

Also, this would be notionally very simple.

Any optimizations to CTE query/DML execution would be generic and
applicable to MERGE and other things besides.  If mapping MERGE to
CTE-using DMLs motivates such optimizations, all the better.

>  [...]. It may actually be possible to map from CTEs to a MERGE
> statement, but I don't think that that's a good approach to implementing
> MERGE.

Surely not every DML with CTEs can map to MERGE.  Maybe I misunderstood
your comment?

> Most of the implementation time will probably be spent doing things like
> making sure MERGE behaves appropriately with triggers, RLS, updatable
> views, and so on. That will take quite a while, but isn't particularly
> technically challenging IMV.

Note that mapping to a DML with CTEs as above gets triggers, RLS, and
updateable views right from the get-go, because DMLs with CTEs, and DMLs
as CTEs, surely do as well.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Peter Geoghegan

Simon Riggs  wrote:

So if I understand you correctly, in your view MERGE should just fail
with an ERROR if it runs concurrently with other DML?


That's certainly my opinion on the matter. It seems like that might be
the consensus, too.

Obviously there are things that you as a user can do about this on your
own, like opt to use a higher isolation level, or manually LOCK TABLE.
For some use cases, including bulk loading for OLAP, users might just
know that there isn't going to be concurrent activity because it's not
an OLTP system.

If this still seems odd to you, then consider that exactly the same
situation exists with UPDATE. A user could want their UPDATE to affect a
row where no row version is actually visible to their MVCC snapshot,
because they have an idea about reliably updating the latest row. UPDATE
doesn't work like that, of course. Is this unacceptable because the user
expects that it should work that way?

Bear in mind that ON CONFLICT DO UPDATE *can* actually update a row when
there is no version of it visible to the snapshot. It can also update a
row where there is a concurrent DELETE + INSERT, and the tuples with the
relevant unique index values end up not even being part of the same
update chain in each case (MVCC-snapshot-visible vs. latest). IOW, you
may end up updating a completely different logical row to the row with
the conflicting value that is visible to your MVCC snapshot!


i.e. if a race condition between the query and an INSERT runs
concurrently with another INSERT

We have no interest in making that work?


Without meaning to sound glib: we already did make it work for a
special, restricted case that is important enough to justify introducing
a couple of kludges -- ON CONFLICT DO UPDATE/upsert.

I do agree that what I propose for MERGE will probably cause confusion;
just look into Oracle's MERGE implementation for examples of this.  We
ought to go out of our way to make it clear that MERGE doesn't provide
these guarantees.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Peter Geoghegan

Nico Williams  wrote:

If you want to ignore conflicts arising from concurrency you could
always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
proposed earlier.  Thus a MERGE CONCURRENTLY could just do that.

Is there any reason not to map MERGE as I proposed?


Performance, for one. MERGE generally has a join that can be optimized
like an UPDATE FROM join.

I haven't studied this question in any detail, but FWIW I think that
using CTEs for merging is morally equivalent to a traditional MERGE
implementation. It may actually be possible to map from CTEs to a MERGE
statement, but I don't think that that's a good approach to implementing
MERGE.

Most of the implementation time will probably be spent doing things like
making sure MERGE behaves appropriately with triggers, RLS, updatable
views, and so on. That will take quite a while, but isn't particularly
technically challenging IMV.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
On Thu, Nov 02, 2017 at 06:49:18PM +, Simon Riggs wrote:
> On 1 November 2017 at 18:20, Peter Geoghegan  wrote:
> > In Postgres, you can avoid duplicate violations with MERGE by using a
> > higher isolation level (these days, those are turned into a
> > serialization error at higher isolation levels when no duplicate is
> > visible to the xact's snapshot).
> 
> So if I understand you correctly, in your view MERGE should just fail
> with an ERROR if it runs concurrently with other DML?
> 
> i.e. if a race condition between the query and an INSERT runs
> concurrently with another INSERT
> 
> We have no interest in making that work?

If you map MERGE to a DML with RETURNING-DML CTEs as I suggested before,
how would that interact with concurrent DMLs?  The INSERT DML of the
mapped statement could produce conflicts that abort the whole MERGE,
correct?

If you want to ignore conflicts arising from concurrency you could
always add an ON CONFLICT DO NOTHING to the INSERT DML in the mapping I
proposed earlier.  Thus a MERGE CONCURRENTLY could just do that.

Is there any reason not to map MERGE as I proposed?

Such an implementation of MERGE wouldn't be online because CTEs are
always implemented sequentially currently.  That's probably reason
enough to eventually produce a native implementation of MERGE, ... or to
revamp the CTE machinery to allow such a mapping to be online.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Nico Williams
If nothing else, anyone needing MERGE can port their MERGE statements to
a DML with DML-containing CTEs...

The generic mapping would be something like this, I think:

WITH
rows AS (SELECT  FROM  WHERE )
  , updated AS (
UPDATE 
SET ...
WHERE  IN (SELECT  FROM rows) /* matched */
RETURNING 
  )
  , inserted AS (
INSERT INTO 
SELECT ...
WHERE  NOT IN (SELECT  FROM rows) /* not matched */
RETURNING 
  )
DELETE FROM 
WHERE (...) AND
   NOT IN (SELECT  FROM updated UNION
SELECT  FROM inserted);

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Simon Riggs
On 1 November 2017 at 18:20, Peter Geoghegan  wrote:

> In Postgres, you can avoid duplicate violations with MERGE by using a
> higher isolation level (these days, those are turned into a
> serialization error at higher isolation levels when no duplicate is
> visible to the xact's snapshot).

So if I understand you correctly, in your view MERGE should just fail
with an ERROR if it runs concurrently with other DML?

i.e. if a race condition between the query and an INSERT runs
concurrently with another INSERT

We have no interest in making that work?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Peter Geoghegan

Robert Haas  wrote:

And if, in the meantime, MERGE can only handle the cases where there
is a unique index, then it can only handle the cases INSERT .. ON
CONFLICT UPDATE can cover, which makes it, as far as I can see,
syntactic sugar over what we already have.  Maybe it's not entirely -
you might be planning to make some minor functional enhancements - but
it's not clear what those are, and I feel like whatever it is could be
done with less work and more elegance by just extending the INSERT ..
ON CONFLICT UPDATE syntax.


+1

Marko Tiikkaja's INSERT ... ON CONFLICT SELECT patch, which is in the
current CF [1], moves things in this direction. I myself have
occasionally wondered if it was worth adding an alternative DO DELETE
conflict_action. This could appear alongside DO UPDATE, and be applied
using MERGE-style conditions.

All of these things seem like small adjuncts to ON CONFLICT because
they're all just an alternative way of modifying or projecting the tuple
that is locked by ON CONFLICT. Everything new would have to happen after
the novel ON CONFLICT handling has already completed.

The only reason that I haven't pursued this is because it doesn't seem
that compelling. I mention it now because It's worth acknowledging that
ON CONFLICT could be pushed a bit further in this direction. Of course,
this still falls far short of making ON CONFLICT entirely like MERGE.

[1] https://commitfest.postgresql.org/15/1241/
--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Robert Haas
On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggs  wrote:
> I've proposed a SQL Standard compliant implementation that would do
> much more than be new syntax over what we already have.
>
> So these two claims aren't accurate: "radical difference" and "syntax
> sugar over a capability we have".

I think those claims are pretty accurate.

The design of INSERT .. ON CONFLICT UPDATE and the supporting
machinery only work if there is a unique index.  It provides radically
different behavior than what you get with any other DML statement,
with completely novel concurrency behavior, and there is no reasonable
way to emulate that behavior in cases where no relevant unique index
exists.  Therefore, if MERGE eventually uses INSERT .. ON CONFLICT
UPDATE when a relevant unique index exists and does something else,
such as your proposal of taking a strong lock, or Peter's proposal of
doing this in a concurrency-oblivious manner, in other cases, then
those two cases will behave very differently.

And if, in the meantime, MERGE can only handle the cases where there
is a unique index, then it can only handle the cases INSERT .. ON
CONFLICT UPDATE can cover, which makes it, as far as I can see,
syntactic sugar over what we already have.  Maybe it's not entirely -
you might be planning to make some minor functional enhancements - but
it's not clear what those are, and I feel like whatever it is could be
done with less work and more elegance by just extending the INSERT ..
ON CONFLICT UPDATE syntax.

And it does seem to be your intention to only handle the cases which
the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because
upthread you wrote this: "I didn't say it but my intention was to just
throw an ERROR if no single unique index can be identified."  I don't
think anybody's putting words into your mouth here.  We're just
reading what you wrote.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 8:28 AM, Craig Ringer  wrote:
> On 1 November 2017 at 01:55, Peter Geoghegan  wrote:
>> The problem here is: Iff the first statement uses ON CONFLICT
>> infrastructure, doesn't the absence of WHEN NOT MATCHED imply
>> different semantics for the remaining updates and deletes in the
>> second version of the query? You've removed what seems like a neat
>> adjunct to the MERGE, but it actually changes everything else too when
>> using READ COMMITTED.
>
> Would these concerns be alleviated by adding some kind of Pg-specific
> decoration that constrained concurrency-safe MERGEs?
>
> So your first statement would be
>
>  MERGE CONCURRENTLY ...
>
> and when you removed the WHEN NOT MATCHED clause it'd ERROR because
> that's no longer able to be done with the same concurrency-safe
> semantics?
>
> I don't know if this would be helpful TBH, or if it would negate
> Simon's compatibility goals. Just another idea.

Yes, that fixes the problem.  Of course, it also turns MERGE
CONCURRENTLY into syntactic sugar for INSERT ON CONFLICT UPDATE, which
brings one back to the question of exactly what we're trying to
achieve here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-01 Thread Craig Ringer
On 1 November 2017 at 01:55, Peter Geoghegan  wrote:

> The problem here is: Iff the first statement uses ON CONFLICT
> infrastructure, doesn't the absence of WHEN NOT MATCHED imply
> different semantics for the remaining updates and deletes in the
> second version of the query? You've removed what seems like a neat
> adjunct to the MERGE, but it actually changes everything else too when
> using READ COMMITTED.

Would these concerns be alleviated by adding some kind of Pg-specific
decoration that constrained concurrency-safe MERGEs?

So your first statement would be

 MERGE CONCURRENTLY ...

and when you removed the WHEN NOT MATCHED clause it'd ERROR because
that's no longer able to be done with the same concurrency-safe
semantics?

I don't know if this would be helpful TBH, or if it would negate
Simon's compatibility goals. Just another idea.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-01 Thread Peter Geoghegan
On Wed, Nov 1, 2017 at 10:19 AM, Simon Riggs  wrote:
>> The problem here is: Iff the first statement uses ON CONFLICT
>> infrastructure, doesn't the absence of WHEN NOT MATCHED imply
>> different semantics for the remaining updates and deletes in the
>> second version of the query?
>
> Not according to the SQL Standard, no. I have no plans for such
> differences to exist.
>
> Spec says: If we hit HeapTupleSelfUpdated then we throw an ERROR.

Your documentation said that the MERGE was driven by a speculative
insertion (BTW, I don't think that this internal implementation detail
should be referenced in user-facing docs). I inferred that that could
not always be true, since there won't always be an INSERT/WHEN NOT
MATCHED case, assuming that you allow that at all (I now gather that you
will).

>> You've removed what seems like a neat
>> adjunct to the MERGE, but it actually changes everything else too when
>> using READ COMMITTED. Isn't that pretty surprising?
>
> I think you're presuming things I haven't said and don't mean, so
> we're both surprised.

You're right -- I'm surmising what I think might be true, because I
don't have the information available to know one way or the other. As
far as this issue with using speculative insertions in one context but
not in another goes, I still don't really know where you stand. I can
still only surmise that you must want both implementations, and will use
one or the other as circumstances dictate (to avoid dup violations in
the style of ON CONFLICT where that's possible).

This seems true because you now say that it will be possible to omit
WHEN NOT MATCHED, and yet there is no such thing as a speculative
insertion without the insertion. You haven't said that that conclusion
is true yourself, but it's the only conclusion that I can draw based on
what you have said.

> I think we need some way of expressing the problems clearly.

It's certainly hard to talk about these problems. I know this from
experience.

> "a general purpose solution is one that more or
> less works like an UPDATE FROM, with an outer join, whose ModifyTable
> node is capable of insert, update, or delete (and accepts quals for
> MATCHED and NOT matched cases, etc). You could still get duplicate
> violations due to concurrent activity in READ COMMITTED mode".
>
> Surely the whole point of this is to avoid duplicate violations due to
> concurrent activity?

Now we're getting somewhere.

I *don't* think that that's the whole point of MERGE. No other MERGE
implementation does that, or claims to do that. The SQL standard says
nothing about this. Heikki found this to be acceptable when working on
the GSoC MERGE implementation that went nowhere.

My position is that we ought to let MERGE be MERGE, and let ON CONFLICT
be ON CONFLICT.

In Postgres, you can avoid duplicate violations with MERGE by using a
higher isolation level (these days, those are turned into a
serialization error at higher isolation levels when no duplicate is
visible to the xact's snapshot).  MERGE isn't and shouldn't be special
when it comes to concurrency.

> I'm not seeing how either design sketch rigorously avoids live locks,
> but those are fairly unlikely and easy to detect and abort.

My MERGE semantics (which really are not mine at all) avoid live
lock/lock starvation by simply never retrying anything without making
forward progress. MERGE doesn't take any special interest in
concurrency, just like any other DML statement that isn't INSERT with ON
CONFLICT.

ON CONFLICT would have live locks if it didn't always have the choice of
inserting [1]. In many ways, the syntax of INSERT ON CONFLICT DO UPDATE
is restricted in exactly the way it needs to be in order to function
correctly. It wasn't an accident that it didn't end up being UPDATE ...
ON NOUPDATE DO INSERT, or something like that, which Robert proposed at
one point.

ON CONFLICT plays by its own rules to a certain extent, because that's
what you need in order to get the desired guarantees in READ COMMITTED
mode [2]. This is the main reason why it was as painful a project as it
was. Further generalizing that seems fraught with difficulties. It seems
logically impossible to generalize it in a way where you don't end up
with two behaviors masquerading as one.

[1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards
[2] https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation
--
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-11-01 Thread Simon Riggs
On 31 October 2017 at 18:55, Peter Geoghegan  wrote:
> On Tue, Oct 31, 2017 at 2:25 AM, Simon Riggs  wrote:
>> If there are challenges ahead, its reasonable to ask for test cases
>> for that now especially if you think you know what they already are.
>> Imagine we go forwards 2 months - if you dislike my patch when it
>> exists you will submit a test case showing the fault. Why not save us
>> all the trouble and describe that now? Test Driven Development.
>
> I already have, on several occasions now. But if you're absolutely
> insistent on my constructing the test case in terms of a real SQL
> statement, then that's what I'll do.
>
> Consider this MERGE statement, from your mock documentation:
>
> MERGE INTO wines w
> USING wine_stock_changes s
> ON s.winename = w.winename
> WHEN NOT MATCHED AND s.stock_delta > 0 THEN
>   INSERT VALUES(s.winename, s.stock_delta)
> WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
>   UPDATE SET stock = w.stock + s.stock_delta
> ELSE
>   DELETE;
>
> Suppose we remove the WHEN NOT MATCHED case, leaving us with:
>
> MERGE INTO wines w
> USING wine_stock_changes s
> ON s.winename = w.winename
> WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
>   UPDATE SET stock = w.stock + s.stock_delta
> ELSE
>   DELETE;
>
> We now have a MERGE that will not INSERT, but will continue to UPDATE
> and DELETE.

Agreed

> (It's implied that your syntax cannot do this at all,
> because you propose use the ON CONFLICT infrastructure, but I think we
> have to imagine a world in which that restriction either never existed
> or has subsequently been lifted.)
>
> The problem here is: Iff the first statement uses ON CONFLICT
> infrastructure, doesn't the absence of WHEN NOT MATCHED imply
> different semantics for the remaining updates and deletes in the
> second version of the query?

Not according to the SQL Standard, no. I have no plans for such
differences to exist.

Spec says: If we hit HeapTupleSelfUpdated then we throw an ERROR.

> You've removed what seems like a neat
> adjunct to the MERGE, but it actually changes everything else too when
> using READ COMMITTED. Isn't that pretty surprising?

I think you're presuming things I haven't said and don't mean, so
we're both surprised.

> If you're not
> clear on what I mean, see my previous remarks on EPQ, live lock, and
> what a CONFLICT could be in READ COMMITTED mode. Concurrent activity
> at READ COMMITTED mode can be expected to significantly alter the
> outcome here.

And I still have questions about what exactly you mean, but at least
this post is going in the right direction and I'm encouraged. Thank
you,

I think we need some way of expressing the problems clearly.

> That is rather frustrating.

Guess so.

>> You've said its possible another way. Show that assertion is actually
>> true. We're all listening, me especially, for the technical details.
>
> My proposal, if you want to call it that, has the merit of actually
> being how MERGE works in every other system. Both Robert and Stephen
> seem to be almost sold on what I suggest, so I think that I've
> probably already explained my position quite well.

The only info I have is

"a general purpose solution is one that more or
less works like an UPDATE FROM, with an outer join, whose ModifyTable
node is capable of insert, update, or delete (and accepts quals for
MATCHED and NOT matched cases, etc). You could still get duplicate
violations due to concurrent activity in READ COMMITTED mode".

Surely the whole point of this is to avoid duplicate violations due to
concurrent activity?

I'm not seeing how either design sketch rigorously avoids live locks,
but those are fairly unlikely and easy to detect and abort.

Thank you for a constructive email, we are on the way to somewhere good.

I have more to add, but wanted to get back to you soonish.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Peter Geoghegan
On Tue, Oct 31, 2017 at 2:25 AM, Simon Riggs  wrote:
> If there are challenges ahead, its reasonable to ask for test cases
> for that now especially if you think you know what they already are.
> Imagine we go forwards 2 months - if you dislike my patch when it
> exists you will submit a test case showing the fault. Why not save us
> all the trouble and describe that now? Test Driven Development.

I already have, on several occasions now. But if you're absolutely
insistent on my constructing the test case in terms of a real SQL
statement, then that's what I'll do.

Consider this MERGE statement, from your mock documentation:

MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN NOT MATCHED AND s.stock_delta > 0 THEN
  INSERT VALUES(s.winename, s.stock_delta)
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
  UPDATE SET stock = w.stock + s.stock_delta
ELSE
  DELETE;

Suppose we remove the WHEN NOT MATCHED case, leaving us with:

MERGE INTO wines w
USING wine_stock_changes s
ON s.winename = w.winename
WHEN MATCHED AND w.stock + s.stock_delta > 0 THEN
  UPDATE SET stock = w.stock + s.stock_delta
ELSE
  DELETE;

We now have a MERGE that will not INSERT, but will continue to UPDATE
and DELETE. (It's implied that your syntax cannot do this at all,
because you propose use the ON CONFLICT infrastructure, but I think we
have to imagine a world in which that restriction either never existed
or has subsequently been lifted.)

The problem here is: Iff the first statement uses ON CONFLICT
infrastructure, doesn't the absence of WHEN NOT MATCHED imply
different semantics for the remaining updates and deletes in the
second version of the query? You've removed what seems like a neat
adjunct to the MERGE, but it actually changes everything else too when
using READ COMMITTED. Isn't that pretty surprising? If you're not
clear on what I mean, see my previous remarks on EPQ, live lock, and
what a CONFLICT could be in READ COMMITTED mode. Concurrent activity
at READ COMMITTED mode can be expected to significantly alter the
outcome here.

Why not just always use the behavior that the second query requires,
which is very much like an UPDATE FROM with an outer join that can
sometimes do deletes (and inserts)? We should always use an MVCC
snapshot, and never play ON CONFLICT style games with visibility/dirty
snapshots.

> It's difficult to discuss anything with someone that refuses to
> believe that there are acceptable ways around things. I believe there
> are.

Isn't that blind faith? Again, it seems like you haven't really tried
to convince me.

> If you can calm down the rhetoric we can work together, but if you
> continue to grandstand it makes it more difficult.

I'm trying to break the deadlock, by getting you to actually consider
what I'm saying. I don't enjoy confrontation. Currently, it seems like
you're just ignoring my objections, which you actually could fairly
easily work through. That is rather frustrating.

> You've said its possible another way. Show that assertion is actually
> true. We're all listening, me especially, for the technical details.

My proposal, if you want to call it that, has the merit of actually
being how MERGE works in every other system. Both Robert and Stephen
seem to be almost sold on what I suggest, so I think that I've
probably already explained my position quite well.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Simon Riggs
On 31 October 2017 at 12:56, Stephen Frost  wrote:
> Simon,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> On 30 October 2017 at 19:55, Stephen Frost  wrote:
>> > I don't think MERGE should be radically different from other database
>> > systems and just syntax sugar over a capability we have.
>>
>> I've proposed a SQL Standard compliant implementation that would do
>> much more than be new syntax over what we already have.
>>
>> So these two claims aren't accurate: "radical difference" and "syntax
>> sugar over a capability we have".
>
> Based on the discussion so far, those are the conclusions I've come to.
> Saying they're isn't accurate without providing anything further isn't
> likely to be helpful.

I'm trying to be clear, accurate and non-confrontational.

I appreciate those are your conclusions, which is why I need to be
clear that the proposal has been misunderstood on those stated points.

What else would you like me to add to be helpful? I will try...

I've spent weeks looking at other implementations and combing the
SQLStandard with a fine tooth comb to see what is possible and what is
not. My proposal shows a new way forwards and yet follows the standard
in horrible detail. It has taken me nearly 2 months to write the doc
page submitted here. It is not simply new syntax stretched over
existing capability. The full syntax of MERGE is quite powerful and
will take some months to make work, even with my proposal.

I'm not sure what else to say, but I am happy to answer specific
technical questions that have full context to allow a rational
discussion.

>> > Time changes
>> > many things, but I don't think anything's changed in this from the prior
>> > discussions about it.
>>
>> My proposal is new, that is what has changed.
>
> I'm happy to admit that I've missed something in the discussion, but
> from what I read the difference appeared to be primairly that you're
> proposing it, and doing so a couple years later.
>
>> At this stage, general opinions can be misleading.
>
> I'd certainly love to see a MERGE capability that meets the standard and
> works in much the same way from a user's perspective as the other RDBMS'
> which already implement it.

The standard explains how it should work, with the proviso that the
standard allows us to define concurrent behavior. Serge has explained
that he sees that my proposal covers the main use cases. I don't yet
see how to cover all, but I'm looking to do the most important use
cases first and other things later.

> From prior discussions with Peter on
> exactly that subject, I'm also convinced that having that would be a
> largely independent piece of work from the INSERT .. ON CONFLICT
> capability that we have (just as MERGE is distinct from similar UPSERT
> capabilities in other RDBMSs).
>
> The goal here is really just to avoid time wasted developing MERGE based
> on top of the INSERT .. ON CONFLICT system only to have it be rejected
> later because multiple other committers and major contributors have said
> that they don't agree with that approach.  If, given all of this
> discussion, you don't feel that's a high risk with your approach then by
> all means continue on with what you're thinking and we can review the
> patch once it's posted.

It is certainly a risk and I don't want to waste time either. I am
happy to consider any complete proposal for how to proceed, including
details and/or code, but I will be prioritising things to ensure that
we have a candidate feature for PG11 from at least one author, rather
than a research project for PG12+.

The key point here is concurrency. I have said that there is an
intermediate step that is useful and achievable, but does not cover
every case. Serge has also stated this - who it should be noted is the
only person in this discussion that has already written the MERGE
statement in SQL, for DB2, so I give more weight to his technical
opinion than I do to others, at this time.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 30 October 2017 at 19:55, Stephen Frost  wrote:
> > I don't think MERGE should be radically different from other database
> > systems and just syntax sugar over a capability we have.
> 
> I've proposed a SQL Standard compliant implementation that would do
> much more than be new syntax over what we already have.
> 
> So these two claims aren't accurate: "radical difference" and "syntax
> sugar over a capability we have".

Based on the discussion so far, those are the conclusions I've come to.
Saying they're isn't accurate without providing anything further isn't
likely to be helpful.

> > Time changes
> > many things, but I don't think anything's changed in this from the prior
> > discussions about it.
> 
> My proposal is new, that is what has changed.

I'm happy to admit that I've missed something in the discussion, but
from what I read the difference appeared to be primairly that you're
proposing it, and doing so a couple years later.

> At this stage, general opinions can be misleading.

I'd certainly love to see a MERGE capability that meets the standard and
works in much the same way from a user's perspective as the other RDBMS'
which already implement it.  From prior discussions with Peter on
exactly that subject, I'm also convinced that having that would be a
largely independent piece of work from the INSERT .. ON CONFLICT
capability that we have (just as MERGE is distinct from similar UPSERT
capabilities in other RDBMSs).

The goal here is really just to avoid time wasted developing MERGE based
on top of the INSERT .. ON CONFLICT system only to have it be rejected
later because multiple other committers and major contributors have said
that they don't agree with that approach.  If, given all of this
discussion, you don't feel that's a high risk with your approach then by
all means continue on with what you're thinking and we can review the
patch once it's posted.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Simon Riggs
On 30 October 2017 at 19:55, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
>> > Nothing I am proposing blocks later work.
>>
>> That's not really true.  Nobody's going to be happy if MERGE has one
>> behavior in one set of cases and an astonishingly different behavior
>> in another set of cases.  If you adopt a behavior for certain cases
>> that can't be extended to other cases, then you're blocking a
>> general-purpose MERGE.
>>
>> And, indeed, it seems that you're proposing an implementation that
>> adds no new functionality, just syntax compatibility.  Do we really
>> want or need two syntaxes  for the same thing in core?  I kinda think
>> Peter might have the right idea here.  Under his proposal, we'd be
>> getting something that is, in a way, new.
>
> +1.
>
> I don't think MERGE should be radically different from other database
> systems and just syntax sugar over a capability we have.

I've proposed a SQL Standard compliant implementation that would do
much more than be new syntax over what we already have.

So these two claims aren't accurate: "radical difference" and "syntax
sugar over a capability we have".

> Time changes
> many things, but I don't think anything's changed in this from the prior
> discussions about it.

My proposal is new, that is what has changed.

At this stage, general opinions can be misleading.

Hi ho, hi ho.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Geoff Winkless
Can I add my 2c worth, as someone without a horse in the race, as it
were, in the hope that telling me how I've got this wrong might
clarify the argument a bit (or at least you can all start shouting at
me rather than each other :) )

The point of merge is to allow you to choose to either INSERT or
UPDATE (or indeed DELETE) records based on existing state, yes? That
state is whatever the state of the system at the start of the
transaction?

If I understand correctly, the only time when this would be
problematic is if you try to insert a record into a table which would
not allow that INSERT because another transaction has performed an
INSERT by the time the COMMIT happens, and where that new record would
have changed the state of the MERGE clause, yes?

Isn't the only reason this would fail if there is a unique constraint
on that table?

Yes, you could end up INSERTing values from the merge when another
transaction has INSERTed another, but (again, unless I've
misunderstood) there's nothing in the spec that says that shouldn't
happen; meanwhile for those tables that do require unique values you
can use the UPSERT mechanism, no?

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-31 Thread Simon Riggs
On 30 October 2017 at 19:17, Peter Geoghegan  wrote:
> On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggs  wrote:
>> Please explain in detail the MERGE SQL statements that you think will
>> be problematic and why.
>
> Your proposal is totally incomplete, so I can only surmise its
> behavior in certain cases, to make a guess at what the problems might
> be (see my remarks on EPQ, live lock, etc). This makes it impossible
> to do what you ask right now.

Impossible, huh. Henry Ford was right.

If there are challenges ahead, its reasonable to ask for test cases
for that now especially if you think you know what they already are.
Imagine we go forwards 2 months - if you dislike my patch when it
exists you will submit a test case showing the fault. Why not save us
all the trouble and describe that now? Test Driven Development.

> Besides, you haven't answered the question from my last e-mail
> ("What's wrong with that [set of MERGE semantics]?"), so why should I
> go to further trouble? You're just not constructively engaging with me
> at this point.

It's difficult to discuss anything with someone that refuses to
believe that there are acceptable ways around things. I believe there
are.

If you can calm down the rhetoric we can work together, but if you
continue to grandstand it makes it more difficult.


> We're going around in circles.

Not really. You've said some things and I'm waiting for further
details about the problems you've raised.

You've said its possible another way. Show that assertion is actually
true. We're all listening, me especially, for the technical details.

I've got a fair amount of work to do to get the rest of the patch in
shape, so take your time and make it a complete explanation.

My only goal is the MERGE feature in PG11. For me this is a
collaborative engineering challenge not a debate and definitely not an
argument. If you describe your plan of how to do this, I may be able
to follow it and include that design. If you don't, then it will be
difficult for me to include your thoughts. If you or others wish to
write something as well, I have already said that is OK too.

If anybody's goal is to block development or to wait for perfection to
exist at some unstated time in the future, than I disagree with those
thoughts.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
> > Nothing I am proposing blocks later work.
> 
> That's not really true.  Nobody's going to be happy if MERGE has one
> behavior in one set of cases and an astonishingly different behavior
> in another set of cases.  If you adopt a behavior for certain cases
> that can't be extended to other cases, then you're blocking a
> general-purpose MERGE.
> 
> And, indeed, it seems that you're proposing an implementation that
> adds no new functionality, just syntax compatibility.  Do we really
> want or need two syntaxes  for the same thing in core?  I kinda think
> Peter might have the right idea here.  Under his proposal, we'd be
> getting something that is, in a way, new.

+1.

I don't think MERGE should be radically different from other database
systems and just syntax sugar over a capability we have.  The
downthread comparison to partitioning isn't accurate either.

There's a reason that we have INSERT .. ON CONFLICT and not MERGE and
it's because they aren't the same thing, as Peter's already explained,
both now and when he and I had exactly this same discussion years ago
when he was working on implementing INSERT .. ON CONFLICT.  Time changes
many things, but I don't think anything's changed in this from the prior
discussions about it.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Nico Williams
On Mon, Oct 30, 2017 at 10:59:43AM -0700, Peter Geoghegan wrote:
> On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggs  wrote:
> > If a general purpose solution exists, please explain what it is.
> 
> For the umpteenth time, a general purpose solution is one that more or
> less works like an UPDATE FROM, with an outer join, whose ModifyTable
> node is capable of insert, update, or delete (and accepts quals for
> MATCHED and NOT matched cases, etc). You could still get duplicate
> violations due to concurrent activity in READ COMMITTED mode, but not
> at higher isolation levels thanks to Thomas Munro's work there. In
> this world, ON CONFLICT and MERGE are fairly distinct things.

FWIW, and as an outsider, having looked at MERGE docs from other
RDBMSes, I have to agree that the PG UPSERT (ON CONFLICT .. DO) and
MERGE are rather different beasts.

In particular, I suspect all UPSERT statements can be mapped onto
equivalent MERGE statements, but not all MERGE statements can be mapped
onto UPSERTs.

The reason is that UPSERT depends on UNIQUE constraints, whereas MERGE
uses a generic join condition that need not even refer to any INDEXes,
let alone UNIQUE ones.

Perhaps PG's UPSERT can be generalized to create a temporary UNIQUE
constraint on the  specified in the conflict_target portion of the
statement, increasing the number of MERGE statements that could be
mapped onto UPSERT.  But even then, that would still be a UNIQUE
constraint, whereas MERGE does not even imply such a thing.

Now, a subset of MERGE (those using equijoins in the ON condition) can
be mapped onto UPSERT provided either a suitable UNIQUE index exists (or
that PG notionally creates a temporary UNIQUE constraint for the purpose
of evaluating the UPSERT).  This approach would NOT preclude a more
complete subsequent implementation of MERGE.  But I wonder if that's
worthwhile given that a proper and complete implementation of MERGE is
probably very desirable.

On a tangentially related note, I've long wanted to have an RDBMS-
independent SQL parser for the purpose of implementing external query-
rewriting (and external optimizers), syntax highlighting, and so on.
Having an external / plug-in method for rewriting unsupported SQL as a
way of bridging functionality gaps (like lack of MERGE support) would be
very nice.  PG does have a way to expose its AST...  It might be a good
idea to start by implementing unsupported SQL features in such a way
that they parse and can produce an AST along with a syntax/unsupported
error -- then one might rewrite the parsed AST, generate appropriate
SQL, and execute that.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Peter Geoghegan
On Mon, Oct 30, 2017 at 11:07 AM, Simon Riggs  wrote:
> Please explain in detail the MERGE SQL statements that you think will
> be problematic and why.

Your proposal is totally incomplete, so I can only surmise its
behavior in certain cases, to make a guess at what the problems might
be (see my remarks on EPQ, live lock, etc). This makes it impossible
to do what you ask right now.

Besides, you haven't answered the question from my last e-mail
("What's wrong with that [set of MERGE semantics]?"), so why should I
go to further trouble? You're just not constructively engaging with me
at this point. We're going around in circles.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Simon Riggs
On 30 October 2017 at 18:59, Peter Geoghegan  wrote:

> It is most emphatically *not* just a "small matter of programming".

Please explain in detail the MERGE SQL statements that you think will
be problematic and why.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Peter Geoghegan
On Mon, Oct 30, 2017 at 6:21 AM, Simon Riggs  wrote:
>> That's not really true.  Nobody's going to be happy if MERGE has one
>> behavior in one set of cases and an astonishingly different behavior
>> in another set of cases.  If you adopt a behavior for certain cases
>> that can't be extended to other cases, then you're blocking a
>> general-purpose MERGE.
>
> If a general purpose solution exists, please explain what it is.

For the umpteenth time, a general purpose solution is one that more or
less works like an UPDATE FROM, with an outer join, whose ModifyTable
node is capable of insert, update, or delete (and accepts quals for
MATCHED and NOT matched cases, etc). You could still get duplicate
violations due to concurrent activity in READ COMMITTED mode, but not
at higher isolation levels thanks to Thomas Munro's work there. In
this world, ON CONFLICT and MERGE are fairly distinct things.

What's wrong with that? You haven't actually told us why you don't like that.

> The problem is that nobody has ever done so, so its not like we are
> discussing the difference between bad solution X and good solution Y,
> we are comparing reasonable solution X with non-existent solution Y.

Nobody knows what your proposal would be like when time came to remove
the restrictions that you suggest could be removed later. You're the
one with the information here. We need to know what those semantics
would be up-front, since you're effectively committing us down that
path.

You keep making vague appeals to pragmatism, but, in all sincerity, I
don't understand where you're coming from at all. I strongly believe
that generalizing from ON CONFLICT doesn't even make the
implementation easier in the short term. ISTM that you're making this
difficult for yourself for reasons that are known only to you.

>> And, indeed, it seems that you're proposing an implementation that
>> adds no new functionality, just syntax compatibility.  Do we really
>> want or need two syntaxes  for the same thing in core?  I kinda think
>> Peter might have the right idea here.  Under his proposal, we'd be
>> getting something that is, in a way, new.
>
> Partitioning looked like "just new syntax", but it has been a useful
> new feature.

False equivalency. Nobody, including you, ever argued that that work
risked painting us into a corner. (IIRC you said something like the
progress was too small to justify putting into a single release.)

> MERGE provides new capabilities that we do not have and is much more
> powerful than INSERT/UPDATE, in a syntax that follow what other
> databases use today. Just like partitioning.

But you haven't told us *how* it is more powerful. Again, the
semantics of a MERGE that is a generalization of ON CONFLICT are not
at all obvious, and seem like they might be very surprising and risky.
There is no question that it's your job to (at a minimum) define those
semantics ahead of time, since you're going to commit us to them in
the long term if you continue down this path. It is most emphatically
*not* just a "small matter of programming".

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Simon Riggs
On 30 October 2017 at 09:44, Robert Haas  wrote:
> On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
>> Nothing I am proposing blocks later work.
>
> That's not really true.  Nobody's going to be happy if MERGE has one
> behavior in one set of cases and an astonishingly different behavior
> in another set of cases.  If you adopt a behavior for certain cases
> that can't be extended to other cases, then you're blocking a
> general-purpose MERGE.

If a general purpose solution exists, please explain what it is. The
problem is that nobody has ever done so, so its not like we are
discussing the difference between bad solution X and good solution Y,
we are comparing reasonable solution X with non-existent solution Y.

> And, indeed, it seems that you're proposing an implementation that
> adds no new functionality, just syntax compatibility.  Do we really
> want or need two syntaxes  for the same thing in core?  I kinda think
> Peter might have the right idea here.  Under his proposal, we'd be
> getting something that is, in a way, new.

Partitioning looked like "just new syntax", but it has been a useful
new feature.

MERGE provides new capabilities that we do not have and is much more
powerful than INSERT/UPDATE, in a syntax that follow what other
databases use today. Just like partitioning.

Will what I propose do everything in the first release? No, just like
partitioning.

If other developers are able to do things in phases, then I claim that
right also.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Simon Riggs
On 29 October 2017 at 21:25, Peter Geoghegan  wrote:

> The semantics that I suggest (the SQL standard's semantics) will
> require less code, and will be far simpler. Right now, I simply don't
> understand why you're insisting on using ON CONFLICT without even
> saying why. I can only surmise that you think that doing so will
> simplify the implementation, but I can guarantee you that it won't.

If you see problems in my proposal, please show the specific MERGE SQL
statements that you think will give problems and explain how and what
the failures will be.

We can then use those test cases to drive developments. If we end up
with code for multiple approaches we will be able to evaluate the
differences between proposals using the test cases produced.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-30 Thread Robert Haas
On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs  wrote:
> Nothing I am proposing blocks later work.

That's not really true.  Nobody's going to be happy if MERGE has one
behavior in one set of cases and an astonishingly different behavior
in another set of cases.  If you adopt a behavior for certain cases
that can't be extended to other cases, then you're blocking a
general-purpose MERGE.

And, indeed, it seems that you're proposing an implementation that
adds no new functionality, just syntax compatibility.  Do we really
want or need two syntaxes  for the same thing in core?  I kinda think
Peter might have the right idea here.  Under his proposal, we'd be
getting something that is, in a way, new.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-29 Thread Peter Geoghegan
On Sun, Oct 29, 2017 at 4:48 AM, Simon Riggs  wrote:
> I have no objection to you writing a better version than me and if my
> work inspires you to complete that in a reasonable timescale then we
> all win.

My whole point is that the way that you seem determined to go on this
is a dead end. I don't think that *anyone* can go improve on what you
come up with if that's based heavily on ON CONFLICT, for the simple
reason that the final user visible design is totally unclear. There is
an easy way to make me shut up - come up with a design for MERGE that
more or less builds on how UPDATE FROM works, rather than building
MERGE on ON CONFLICT. (You might base things like RLS handling on ON
CONFLICT, but in the main MERGE should be like an UPDATE FROM with an
outer join, that can do INSERTs and DELETEs, too.)

The original effort to add MERGE didn't do anything upsert-like, which
Heikki (the GSOC mentor of the project) was perfectly comfortable
with. I'm too lazy to go search the archives right now, but it's
there. Heikki cites the SQL standard.

This is what MERGE *actually is*, which you can clearly see from the
Oracle/SQL Server/DB2 docs. It says this in the first paragraph of
their MERGE documentation. It's crystal clear from their docs -- "This
statement is a convenient way to combine multiple operations. It lets
you avoid multiple INSERT, UPDATE, and DELETE DML statements."

> I'm also very happy to work together on writing the version
> described - you have already done much work in this area.

You seem to want to preserve the ON CONFLICT guarantees at great cost.
But you haven't even defended that based on a high level goal, or a
use case, or something that makes sense to users (describing how it is
possible is another matter). You haven't even tried to convince me.

> I don't see any major problems in points you've raised so far, though
> obviously there is much detail.

Did you even read them? They are not mere details. They're fundamental
to the semantics of the feature (if you base it on ON CONFLICT). It's
not actually important that you understand them all; the important
message is that generalizing ON CONFLICT has all kinds of terrible
problems.

> All of this needs to be written and then committed, so I'll get on and
> write my proposal. We can then see whether that is an 80% solution or
> something less. There are more obvious barriers to completion at this
> point, like time and getting on with it.

Getting on with *what*, exactly?

In general, I have nothing against an 80% solution, or even a 50%
solution, provided there is a plausible path to a 100% solution. I
don't think that you have such a path, but only because you're tacitly
inserting requirements that no other MERGE implementation has to live
with, that I doubt any implementation *could* live with. Again, I'm
not the one making this complicated, or adding requirements that will
be difficult for you to get in to your v1 -- you're the one doing
that.

The semantics that I suggest (the SQL standard's semantics) will
require less code, and will be far simpler. Right now, I simply don't
understand why you're insisting on using ON CONFLICT without even
saying why. I can only surmise that you think that doing so will
simplify the implementation, but I can guarantee you that it won't.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-29 Thread Simon Riggs
On 28 October 2017 at 22:04, Peter Geoghegan  wrote:
> On Sat, Oct 28, 2017 at 12:49 PM, Simon Riggs  wrote:
>> Nothing I am proposing blocks later work.
>
> Actually, many things will block future work if you go down that road.
> You didn't respond to the specific points I raised, but that doesn't
> mean that they're not real.
>
>> Everything you say makes it clear that a fully generalized solution is
>> going to be many years in the making, assuming we agree.
>
> I think that it's formally impossible as long as you preserve the ON
> CONFLICT guarantees, unless you somehow define the problems out of
> existence. Those are guarantees which no other MERGE implementation
> has ever made, and which the SQL standard says nothing about. And for
> good reasons.
>
>> "The extent to which an SQL-implementation may disallow independent
>> changes that are not significant is implementation-defined”.
>>
>> So we get to choose. I recommend that we choose something practical.
>> We're approaching the 10 year anniversary of my first serious attempt
>> to do MERGE. I say that its time to move forwards with useful
>> solutions, rather than wait another 10 years for the perfect one, even
>> assuming it exists.
>
> As far as I'm concerned, you're the one arguing for an unobtainable
> solution over a good one, not me. I *don't* think you should solve the
> problems that I raise -- you should instead implement MERGE without
> any of the ON CONFLICT guarantees, just like everyone else has.
> Building MERGE on top of the ON CONFLICT guarantees, and ultimately
> arriving at something that is comparable to other implementations over
> many releases might be okay if anyone had the slightest idea of what
> that would look like. You haven't even _described the semantics_,
> which you could do by addressing the specific points that I raised.

I have no objection to you writing a better version than me and if my
work inspires you to complete that in a reasonable timescale then we
all win. I'm also very happy to work together on writing the version
described - you have already done much work in this area.

I don't see any major problems in points you've raised so far, though
obviously there is much detail.

All of this needs to be written and then committed, so I'll get on and
write my proposal. We can then see whether that is an 80% solution or
something less. There are more obvious barriers to completion at this
point, like time and getting on with it.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-28 Thread Peter Geoghegan
On Sat, Oct 28, 2017 at 12:49 PM, Simon Riggs  wrote:
> Nothing I am proposing blocks later work.

Actually, many things will block future work if you go down that road.
You didn't respond to the specific points I raised, but that doesn't
mean that they're not real.

> Everything you say makes it clear that a fully generalized solution is
> going to be many years in the making, assuming we agree.

I think that it's formally impossible as long as you preserve the ON
CONFLICT guarantees, unless you somehow define the problems out of
existence. Those are guarantees which no other MERGE implementation
has ever made, and which the SQL standard says nothing about. And for
good reasons.

> "The extent to which an SQL-implementation may disallow independent
> changes that are not significant is implementation-defined”.
>
> So we get to choose. I recommend that we choose something practical.
> We're approaching the 10 year anniversary of my first serious attempt
> to do MERGE. I say that its time to move forwards with useful
> solutions, rather than wait another 10 years for the perfect one, even
> assuming it exists.

As far as I'm concerned, you're the one arguing for an unobtainable
solution over a good one, not me. I *don't* think you should solve the
problems that I raise -- you should instead implement MERGE without
any of the ON CONFLICT guarantees, just like everyone else has.
Building MERGE on top of the ON CONFLICT guarantees, and ultimately
arriving at something that is comparable to other implementations over
many releases might be okay if anyone had the slightest idea of what
that would look like. You haven't even _described the semantics_,
which you could do by addressing the specific points that I raised.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-28 Thread Peter Geoghegan
On Fri, Oct 27, 2017 at 3:00 PM, Serge Rielau  wrote:
>> What other systems *do* have this restriction? I've never seen one that did.
>
> Not clear what you are leading up to here.
> When I did MERGE in DB2 there was also no limitation:
> "Each row in the target can only be operated on once. A row in the target can 
> only be identified as MATCHED with one row in the result table of the 
> table-reference”
> What there was however was a significant amount of code I had to write and 
> test to enforce the above second sentence.

Then it seems that we were talking about two different things all along.

> Maybe in PG there is a trivial way to detect an expanding join and block it 
> at runtime.

There is for ON CONFLICT. See the cardinality violation logic within
ExecOnConflictUpdate(). (There are esoteric cases where this error can
be raised due to a wCTE that does an insert "from afar", which is
theoretically undesirable but not actually a problem.)

The MERGE implementation that I have in mind would probably do almost
the same thing, and make the "HeapTupleSelfUpdated" case within
ExecUpdate() raise an error when the caller happened to be a MERGE,
rather than following the historic UPDATE behavior. (The behavior is
to silently suppress a second or subsequent UPDATE attempt from the
same command, a behavior that Simon's mock MERGE documentation
references.)

> So the whole point I’m trying to make is that I haven’t seen the need for the 
> extra work I had to do once the feature appeared in the wild.

That seems pretty reasonable to me.

My whole point is that I think it's a mistake to do things like lock
rows ahead of evaluating any UPDATE predicate, in the style of ON
CONFLICT, in order to replicate the ON CONFLICT guarantees [1].

I'm arguing for implementation simplicity, too. Trying to implement
MERGE in a way that extends ON CONFLICT seems like a big mistake to
me, because ON CONFLICT updates rows on the basis of a would-be
duplicate violation, along with all the baggage that that carries.
This is actually enormously different to an equi-join that is fed by a
scan using an MVCC snapshot. The main difference is that there
actually is no MVCC snapshot in play in most cases [2]. If *no* row
with the PK value of 5 is visible to our MVCC snapshot, but an xact
committed having inserted such a row, that still counts as a CONFLICT
with READ COMMITTED.

[1] https://wiki.postgresql.org/wiki/UPSERT#Goals_for_implementation
[2] 
https://www.postgresql.org/docs/devel/static/transaction-iso.html#xact-read-committed
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-28 Thread Simon Riggs
On 28 October 2017 at 20:39, Peter Geoghegan  wrote:
> On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggs  wrote:
>> SQL:2011 specifically states "The extent to which an
>> SQL-implementation may disallow independent changes that are not
>> significant is implementation-defined”, so in my reading the above
>> behaviour would make us fully spec compliant. Thank you to Peter for
>> providing the infrastructure on which this is now possible for PG11.
>>
>> Serge puts this very nicely by identifying two different use cases for MERGE.
>
> MERGE benefits from having a join that is more or less implemented in
> the same way as any other join. It can be a merge join, hash join, or
> nestloop join. ON CONFLICT doesn't work using a join.
>
> Should I to take it that you won't be supporting any of these
> alternative join algorithms? If not, then you'll have something that
> really isn't comparable to MERGE as implemented in Oracle, SQL Server,
> or DB2. They *all* do this.
>
> Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case
> with every existing MERGE implementation? If so, what actually happens
> under the hood when WHEN NOT MATCHED is omitted? For example, would
> you actually use a regular "UPDATE FROM" style join, as opposed to the
> ON CONFLICT infrastructure? And, if that is so, can you justify the
> semantic difference for rows that are updated in each scenario
> (omitted vs. not omitted) in READ COMMITTED mode? Note that this could
> be the difference between updating a row when *no* version is visible
> to our MVCC snapshot, as opposed to doing the EPQ stuff and updating
> the latest row version if possible. That's a huge, surprising
> difference. On top of all this, you risk live-lock if INSERT isn't a
> possible outcome (this is also why ON CONFLICT can never accept a
> predicate on its INSERT portion -- again, quite unlike MERGE).
>
> Why not just follow what other systems do? It's actually easier to go
> that way, and you get a better outcome. ON CONFLICT involves what you
> could call a sleight of hand, and I fear that you don't appreciate
> just how specialized the internal infrastructure is.
>
>> Now, I accept that you might also want a MERGE statement that
>> continues to work even if there is no unique constraint, but it would
>> need to have different properties to the above. I do not in any way
>> argue against adding that.
>
> Maybe you *should* be arguing against it, though, and arguing against
> ever supporting anything but equijoins, because these things will
> *become* impossible if you go down that road. By starting with the ON
> CONFLICT infrastructure, while framing no-unique-index-support as work
> for some unspecified future release, you're leaving it up to someone
> else to resolve the problems. Someone else must square the circle of
> mixing ON CONFLICT semantics with fully generalized MERGE semantics.
> But who?

Nothing I am proposing blocks later work.

Everything you say makes it clear that a fully generalized solution is
going to be many years in the making, assuming we agree.

"The extent to which an SQL-implementation may disallow independent
changes that are not significant is implementation-defined”.

So we get to choose. I recommend that we choose something practical.
We're approaching the 10 year anniversary of my first serious attempt
to do MERGE. I say that its time to move forwards with useful
solutions, rather than wait another 10 years for the perfect one, even
assuming it exists.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-28 Thread Peter Geoghegan
On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggs  wrote:
> SQL:2011 specifically states "The extent to which an
> SQL-implementation may disallow independent changes that are not
> significant is implementation-defined”, so in my reading the above
> behaviour would make us fully spec compliant. Thank you to Peter for
> providing the infrastructure on which this is now possible for PG11.
>
> Serge puts this very nicely by identifying two different use cases for MERGE.

MERGE benefits from having a join that is more or less implemented in
the same way as any other join. It can be a merge join, hash join, or
nestloop join. ON CONFLICT doesn't work using a join.

Should I to take it that you won't be supporting any of these
alternative join algorithms? If not, then you'll have something that
really isn't comparable to MERGE as implemented in Oracle, SQL Server,
or DB2. They *all* do this.

Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case
with every existing MERGE implementation? If so, what actually happens
under the hood when WHEN NOT MATCHED is omitted? For example, would
you actually use a regular "UPDATE FROM" style join, as opposed to the
ON CONFLICT infrastructure? And, if that is so, can you justify the
semantic difference for rows that are updated in each scenario
(omitted vs. not omitted) in READ COMMITTED mode? Note that this could
be the difference between updating a row when *no* version is visible
to our MVCC snapshot, as opposed to doing the EPQ stuff and updating
the latest row version if possible. That's a huge, surprising
difference. On top of all this, you risk live-lock if INSERT isn't a
possible outcome (this is also why ON CONFLICT can never accept a
predicate on its INSERT portion -- again, quite unlike MERGE).

Why not just follow what other systems do? It's actually easier to go
that way, and you get a better outcome. ON CONFLICT involves what you
could call a sleight of hand, and I fear that you don't appreciate
just how specialized the internal infrastructure is.

> Now, I accept that you might also want a MERGE statement that
> continues to work even if there is no unique constraint, but it would
> need to have different properties to the above. I do not in any way
> argue against adding that.

Maybe you *should* be arguing against it, though, and arguing against
ever supporting anything but equijoins, because these things will
*become* impossible if you go down that road. By starting with the ON
CONFLICT infrastructure, while framing no-unique-index-support as work
for some unspecified future release, you're leaving it up to someone
else to resolve the problems. Someone else must square the circle of
mixing ON CONFLICT semantics with fully generalized MERGE semantics.
But who?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-28 Thread Simon Riggs
On 28 October 2017 at 00:31, Michael Paquier  wrote:
> On Fri, Oct 27, 2017 at 9:00 AM, Robert Haas  wrote:
>> On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs  wrote:
>>> I didn't say it but my intention was to just throw an ERROR if no
>>> single unique index can be identified.
>>>
>>> It could be possible to still run MERGE in that situaton but we would
>>> need to take a full table lock at ShareRowExclusive. It's quite likely
>>> that such statements would throw duplicate update errors, so I
>>> wouldn't be aiming to do anything with that for PG11.
>>
>> Like Peter, I think taking such a strong lock for a DML statement
>> doesn't sound like a very desirable way forward.  It means, for
>> example, that you can only have one MERGE in progress on a table at
>> the same time, which is quite limiting.  It could easily be the case
>> that you have multiple MERGE statements running at once but they touch
>> disjoint groups of rows and therefore everything works.  I think the
>> code should be able to cope with concurrent changes, if nothing else
>> by throwing an ERROR, and then if the user wants to ensure that
>> doesn't happen by taking ShareRowExclusiveLock they can do that via an
>> explicit LOCK TABLE statement -- or else they can prevent concurrency
>> by any other means they see fit.
>
> +1, I would suspect users to run this query in parallel of the same
> table for multiple data sets.
>
> Peter has taken some time to explain me a bit his arguments today, and
> I agree that it does not sound much appealing to have constraint
> limitations for MERGE. Particularly using the existing ON CONFLICT
> structure gets the feeling of having twice a grammar for what's
> basically the same feature, with pretty much the same restrictions.
>
> By the way, this page sums up nicely the situation about many
> implementations of UPSERT taken for all systems:
> https://en.wikipedia.org/wiki/Merge_(SQL)

That Wikipedia article is badly out of date and regrettably does NOT
sum up the current situation nicely any more since MERGE has changed
in definition in SQL:2011 since its introduction in SQL:2003.

I'm proposing a MERGE statement for PG11 that
i) takes a RowExclusiveLock on rows, so can be run concurrently
ii) uses the ON CONFLICT infrastructure to do that
and so requires a unique constraint.

The above is useful behaviour that will be of great benefit to
PostgreSQL users. There are no anomalies remaining.

SQL:2011 specifically states "The extent to which an
SQL-implementation may disallow independent changes that are not
significant is implementation-defined”, so in my reading the above
behaviour would make us fully spec compliant. Thank you to Peter for
providing the infrastructure on which this is now possible for PG11.

Serge puts this very nicely by identifying two different use cases for MERGE.

Now, I accept that you might also want a MERGE statement that
continues to work even if there is no unique constraint, but it would
need to have different properties to the above. I do not in any way
argue against adding that. I also agree that adding RETURNING at a
later stage would be fine as well. I am proposing that those and any
other additional properties people come up with can be added in later
releases once we have the main functionality in core in PG11.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Nico Williams
On Fri, Oct 27, 2017 at 02:13:27PM -0700, srielau wrote:
> While the standard may not require a unique index for the ON clause I have
> never seen a MERGE statement that did not have this property.  So IMHO this
> is a reasonable restrictions.

I don't understand how one could have a conflict upon which to turn
INSERT into UPDATE without having a UNIQUE constraint violated...

The only question is whether one should have control over -or have to
specify- which constraint violations lead to UPDATE vs. which ones lead
to failure vs. which ones lead to doing nothing.

The row to update is the one that the to-be-inserted row conflicted with
-- there can only have been one if the constraint violated was a PRIMARY
KEY constraint, or if there is a PRIMARY KEY at all, but if there's no
PRIMARY KEY, then there can have been more conflicting rows because of
NULL columns in the to-be-inserted row.  If the to-be-inserted row
conflicts with multiple rows, then just fail, or don't allow MERGE on
tables that have no PK (as you know, many think it makes no sense to not
have a PK on a table in SQL).

In the common case one does not care about which UNIQUE constraint is
violated because there's only one that could have been violated, or
because if the UPDATE should itself cause some other UNIQUE constraint
to be violated, then the whole statement should fail.

PG's UPSERT is fantastic -- it allows very fine-grained control, but it
isn't as pithy as it could be when the author doesn't care to specify
all that detail.

Also, something like SQLite3's INSERT OR REPLACE is very convenient:
pithy, INSERT syntax, upsert-like semantics[*].

I'd like to have this in PG:

  INSERT INTO ..
  ON CONFLICT DO UPDATE;  -- I.e., update all columns of the existing
  -- row to match the ones from the row that
  -- would have been inserted had there not been
  -- a conflict.
  --
  -- If an INSERTed row conflicts and then the
  -- UPDATE it devolves to also conflicts, then
  -- fail.

and

  INSERT INTO ..
  ON CONFLICT DO UPDATE   -- I.e., update all columns of the existing
  -- row to match the ones from the row that
  -- would have been inserted had there not been
  -- a conflict.
  --
  ON CONFLICT DO NOTHING; -- If an INSERTed row conflicts and then the
  -- UPDATE it devolves to also conflicts, then
  -- DO NOTHING.


[*] SQLite3's INSERT OR REPLACE is NOT an insert-or-update, but an
insert-or-delete-and-insert, and any deletions that occur in the
process do fire triggers.  INSERT OR UPDATE would be much more
useful.


Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Michael Paquier
On Fri, Oct 27, 2017 at 9:00 AM, Robert Haas  wrote:
> On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs  wrote:
>> I didn't say it but my intention was to just throw an ERROR if no
>> single unique index can be identified.
>>
>> It could be possible to still run MERGE in that situaton but we would
>> need to take a full table lock at ShareRowExclusive. It's quite likely
>> that such statements would throw duplicate update errors, so I
>> wouldn't be aiming to do anything with that for PG11.
>
> Like Peter, I think taking such a strong lock for a DML statement
> doesn't sound like a very desirable way forward.  It means, for
> example, that you can only have one MERGE in progress on a table at
> the same time, which is quite limiting.  It could easily be the case
> that you have multiple MERGE statements running at once but they touch
> disjoint groups of rows and therefore everything works.  I think the
> code should be able to cope with concurrent changes, if nothing else
> by throwing an ERROR, and then if the user wants to ensure that
> doesn't happen by taking ShareRowExclusiveLock they can do that via an
> explicit LOCK TABLE statement -- or else they can prevent concurrency
> by any other means they see fit.

+1, I would suspect users to run this query in parallel of the same
table for multiple data sets.

Peter has taken some time to explain me a bit his arguments today, and
I agree that it does not sound much appealing to have constraint
limitations for MERGE. Particularly using the existing ON CONFLICT
structure gets the feeling of having twice a grammar for what's
basically the same feature, with pretty much the same restrictions.

By the way, this page sums up nicely the situation about many
implementations of UPSERT taken for all systems:
https://en.wikipedia.org/wiki/Merge_(SQL)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Serge Rielau
via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=dx=9.8.79=10.12.6=email_footer_2]
On Fri, Oct 27, 2017 at 2:42 PM, Peter Geoghegan  wrote:
On Fri, Oct 27, 2017 at 2:13 PM, srielau  wrote:
> While the standard may not require a unique index for the ON clause I have
> never seen a MERGE statement that did not have this property. So IMHO this
> is a reasonable restrictions.

The Oracle docs on MERGE say nothing about unique indexes or
constraints. They don't even mention them in passing. They do say
"This statement is a convenient way to combine multiple operations. It
lets you avoid multiple INSERT, UPDATE, and DELETE DML statements."

SQL Server's MERGE docs do mention unique indexes, but only in
passing, saying something about unique violations, and that unique
violations *cannot* be suppressed in MERGE, even though that's
possible with other DML statements (with something called
IGNORE_DUP_KEY).

What other systems *do* have this restriction? I've never seen one that did. 
Not clear what you are leading up to here. When I did MERGE in DB2 there was 
also no limitation: " Each row in the target can only be operated on once. A 
row in the target can only be identified as MATCHED with one row in the result 
table of the table-reference” What there was however was a significant amount 
of code I had to write and test to enforce the above second sentence. IIRC it 
involved, in the absence of a proof that the join could not expand, adding a 
row_number() over() AS rn over the target leg of the join and then a 
row_number() over(partition by rn) > 1 THEN RAISE_ERROR() to catch violators. 
Maybe in PG there is a trivial way to detect an expanding join and block it at 
runtime.
So the whole point I’m trying to make is that I haven’t seen the need for the 
extra work I had to do once the feature appeared in the wild.
Cheers Serge

Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Peter Geoghegan
On Fri, Oct 27, 2017 at 2:13 PM, srielau  wrote:
> While the standard may not require a unique index for the ON clause I have
> never seen a MERGE statement that did not have this property.  So IMHO this
> is a reasonable restrictions.

The Oracle docs on MERGE say nothing about unique indexes or
constraints. They don't even mention them in passing. They do say
"This statement is a convenient way to combine multiple operations. It
lets you avoid multiple INSERT, UPDATE, and DELETE DML statements."

SQL Server's MERGE docs do mention unique indexes, but only in
passing, saying something about unique violations, and that unique
violations *cannot* be suppressed in MERGE, even though that's
possible with other DML statements (with something called
IGNORE_DUP_KEY).

What other systems *do* have this restriction? I've never seen one that did.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread srielau
Simon,

Nice writeup.

While the standard may not require a unique index for the ON clause I have
never seen a MERGE statement that did not have this property.  So IMHO this
is a reasonable restrictions.

In fact I have only ever seen two flavors of usage:
* Single row source (most often simply a VALUES clause) in OLTP
  In that case there was lots of concurrency
* Massive source which affects a significant portion of the target table in
DW.
 In this case there were no concurrent MERGEs

I believe support for returning rows at a later stage would prove to be very
powerful, especially in combination with chaining MERGE statements in CTEs.
To do that would require language extensions to pass the coloring of the
source row through, especially for rows that fell into "do nothing".




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Robert Haas
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs  wrote:
> I didn't say it but my intention was to just throw an ERROR if no
> single unique index can be identified.
>
> It could be possible to still run MERGE in that situaton but we would
> need to take a full table lock at ShareRowExclusive. It's quite likely
> that such statements would throw duplicate update errors, so I
> wouldn't be aiming to do anything with that for PG11.

Like Peter, I think taking such a strong lock for a DML statement
doesn't sound like a very desirable way forward.  It means, for
example, that you can only have one MERGE in progress on a table at
the same time, which is quite limiting.  It could easily be the case
that you have multiple MERGE statements running at once but they touch
disjoint groups of rows and therefore everything works.  I think the
code should be able to cope with concurrent changes, if nothing else
by throwing an ERROR, and then if the user wants to ensure that
doesn't happen by taking ShareRowExclusiveLock they can do that via an
explicit LOCK TABLE statement -- or else they can prevent concurrency
by any other means they see fit.

Other problems with taking ShareRowExclusiveLock include (1) probable
lock upgrade hazards and (2) do you really want MERGE to kick
autovacuum off of your giant table?  Probably not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Peter Geoghegan
On Fri, Oct 27, 2017 at 6:24 AM, Robert Haas  wrote:
> I think one of the reasons why Peter Geoghegan decided to pursue
> INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL
> syntax, he felt free to mandate a non-standard SQL requirement, namely
> the presence of a unique index on the arbiter columns.

That's true, but what I was really insistent on, more than anything
else, was that the user would get a practical guarantee about an
insert-or-update outcome under concurrency. There could be no
"unprincipled deadlocks", nor could there be spurious unique
violations.

This is the kind of thing that the SQL standard doesn't really concern
itself with, and yet it's of significant practical importance to
users. Both Oracle and SQL Server allow these things that I
specifically set out to avoid. I think that that's mostly a good
thing, though; they do a really bad job of explaining what's what, and
don't provide for a very real need ("upsert") in some other way, but
their MERGE semantics do make sense to me.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Peter Geoghegan
On Fri, Oct 27, 2017 at 7:41 AM, Simon Riggs  wrote:
> Good points.
>
> I didn't say it but my intention was to just throw an ERROR if no
> single unique index can be identified.

You'd also throw an error when there was no "upsert compatible" join
quals, I take it?

I don't see the point in that. That's just mapping one syntax on to another.

> It could be possible to still run MERGE in that situaton but we would
> need to take a full table lock at ShareRowExclusive. It's quite likely
> that such statements would throw duplicate update errors, so I
> wouldn't be aiming to do anything with that for PG11.

I would avoid mixing up ON CONFLICT DO UPDATE and MERGE. The
"anomalies" you describe in MERGE are not really anomalies IMV.
They're simply how the feature is supposed to operate, and how it's
possible to make MERGE use alternative join algorithms based only on
the underlying cost. You might use a merge join for a bulk load
use-case, for example.

I think an SQL MERGE feature would be compelling, but I don't think
that it should take much from ON CONFLICT. As I've said many times
[1], they're really two different features (Teradata had features
similar to each, for example). I suggest that you come up with
something that has the semantics that the standard requires, and
therefore makes none of the ON CONFLICT guarantees about an outcome
under concurrency (INSERT or UPDATE). Those guarantees are basically
incompatible with how MERGE needs to work.

In case it matters, I think that the idea of varying relation
heavyweight lock strength based on subtle semantics within a DML
statement is a bad one. Frankly, I think that that's going to be a
nonstarter.

[1] 
https://www.postgresql.org/message-id/CAM3SWZRP0c3g6+aJ=yydgyactzg0xa8-1_fcvo5xm7hrel3...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Simon Riggs
On 27 October 2017 at 15:24, Robert Haas  wrote:
> On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs  wrote:
>> Questions?
>
> I think one of the reasons why Peter Geoghegan decided to pursue
> INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL
> syntax, he felt free to mandate a non-standard SQL requirement, namely
> the presence of a unique index on the arbiter columns.  If MERGE's
> join clause happens to involve equality conditions on precisely the
> same set of columns as some unique index on the target table, then I
> think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure
> and I suspect there won't be too many problems.

Agreed

> However, if it
> doesn't, then what?  You could decree that such cases will fail, but
> that might not meet your use case for developing the feature.  Or you
> could try to soldier on without the INSERT .. ON CONFLICT UPDATE
> machinery, but that means, I think, that sometimes you will get
> serialization anomalies - at least, I think, you will sometimes obtain
> results that couldn't have been obtained under any serial order of
> execution, and maybe it would end up being possible to fail with
> serialization errors or unique index violations.
>
> In the past, there have been objections to implementations of MERGE
> which would give rise to such serialization anomalies, but I'm not
> sure we should feel bound by those discussions.  One thing that's
> different is that the common and actually-useful case can now be made
> to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
> if less useful cases are vulnerable to some weirdness, maybe it's OK
> to just document the problems.

Good points.

I didn't say it but my intention was to just throw an ERROR if no
single unique index can be identified.

It could be possible to still run MERGE in that situaton but we would
need to take a full table lock at ShareRowExclusive. It's quite likely
that such statements would throw duplicate update errors, so I
wouldn't be aiming to do anything with that for PG11.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Robert Haas
On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs  wrote:
> Questions?

I think one of the reasons why Peter Geoghegan decided to pursue
INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL
syntax, he felt free to mandate a non-standard SQL requirement, namely
the presence of a unique index on the arbiter columns.  If MERGE's
join clause happens to involve equality conditions on precisely the
same set of columns as some unique index on the target table, then I
think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure
and I suspect there won't be too many problems.  However, if it
doesn't, then what?  You could decree that such cases will fail, but
that might not meet your use case for developing the feature.  Or you
could try to soldier on without the INSERT .. ON CONFLICT UPDATE
machinery, but that means, I think, that sometimes you will get
serialization anomalies - at least, I think, you will sometimes obtain
results that couldn't have been obtained under any serial order of
execution, and maybe it would end up being possible to fail with
serialization errors or unique index violations.

In the past, there have been objections to implementations of MERGE
which would give rise to such serialization anomalies, but I'm not
sure we should feel bound by those discussions.  One thing that's
different is that the common and actually-useful case can now be made
to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
if less useful cases are vulnerable to some weirdness, maybe it's OK
to just document the problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Alvaro Herrera
Simon Riggs wrote:

> Earlier thoughts on how this could/could not be done were sometimes
> imprecise or inaccurate, so I have gone through the command per
> SQL:2011 spec and produced a definitive spec in the form of an SGML
> ref page. This is what I intend to deliver for PG11.

Nice work.  I didn't verify the SQL spec, just read your HTML page;
some very minor comments based on that:

* use "and" not "where" as initial words in "when_clause" and
  "merge_update" clause definitions

* missing word here: "the DELETE privilege on the if you specify"

* I think the word "match." is leftover from some editing in the phrase
  " that specifies which rows in the data_source match rows in the
  target_table_name. match."  In the same paragraph, it is not clear
  whether all columns must be matched or it can be a partial match.

* In the when_clause note, it is not clear whether you can have multiple
  WHEN MATCHED and WHEN NOT MATCHED clauses.  Obviously you can have one
  of each, but I think your doc says it is possible to have more than one of
  each, with different conditions (WHEN MATCHED AND foo THEN bar WHEN
  MATCHED AND baz THEN qux).  No example shows more than one.

  On the same point: Is there short-circuiting of such conditions, i.e.
  will the execution will stop looking for further WHEN matches if some
  rule matches, or will it rather check all rules and raise an error if
  more than one WHEN rules match each given row?

* Your last example uses ELSE but that appears nowhere in the synopsys.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE SQL Statement for PG11

2017-10-27 Thread Szymon Lipiński
Hey,
It looks quite nice. Personally I'd like to also have the returning
statement, and have the number of deleted and inserted rows as separate
numbers in the output message.

regards
Szymon Lipiński

pt., 27.10.2017, 10:56 użytkownik Simon Riggs 
napisał:

> I'm working on re-submitting MERGE for PG11
>
> Earlier thoughts on how this could/could not be done were sometimes
> imprecise or inaccurate, so I have gone through the command per
> SQL:2011 spec and produced a definitive spec in the form of an SGML
> ref page. This is what I intend to deliver for PG11.
>
> MERGE will use the same mechanisms as INSERT ON CONFLICT, so
> concurrent behavior does not require further infrastructure changes,
> just detailed work on the statement itself.
>
> I'm building up the code from scratch based upon the spec, rather than
> trying to thwack earlier attempts into shape. This looks more likely
> to yield a commitable patch.
>
> Major spanners or objections, please throw them in now cos I don't see any.
>
> Questions?
>
> --
> Simon Riggshttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>