Re: [HACKERS] Parallel Seq Scan

2015-09-17 Thread Amit Kapila
On Thu, Sep 17, 2015 at 1:40 AM, Robert Haas wrote:
>
> On Thu, Sep 10, 2015 at 12:12 AM, Amit Kapila wrote:
> >> 2. I think it's probably a good idea - at least for now, and maybe
> >> forever - to avoid nesting parallel plans inside of other parallel
> >> plans.  It's hard to imagine that being a win in a case like this, and
> >> it certainly adds a lot more cases to think about.
> >
> > I also think that avoiding nested parallel plans is a good step forward.
>
> Doing that as a part of the assess parallel safety patch was trivial, so I did.
>

As per my understanding, what you have done there will not prohibit such
cases.

+* For now, we don't try to use parallel mode if we're running inside
+* a parallel worker.  We might eventually be able to relax this
+* restriction, but for now it seems best not to have parallel workers
+* trying to create their own parallel workers.
+*/
+   glob->parallelModeOK = (cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
+   IsUnderPostmaster && dynamic_shared_memory_type != DSM_IMPL_NONE &&
+   parse->commandType == CMD_SELECT && !parse->hasModifyingCTE &&
+   parse->utilityStmt == NULL && !IsParallelWorker() &&
+   !contain_parallel_unsafe((Node *) parse);


IIUC, you are referring to the !IsParallelWorker() check in the above code.
If so, then I think it won't work, because we generate the plan in the master
backend; a parallel worker will never exercise this code.  I have tested it
as well with the example below, and it still generates the SubPlan as a
Funnel.

CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM generate_series(1, 1000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM generate_series(1, 100) g;

set parallel_seqscan_degree=2;
set cpu_tuple_comm_cost=0.01;

explain select * from t1 where c1 not in (select c1 from t2 where c2 = '');
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Funnel on t1  (cost=11536.88..126809.17 rows=3432492 width=36)
   Filter: (NOT (hashed SubPlan 1))
   Number of Workers: 2
   ->  Partial Seq Scan on t1  (cost=11536.88..58159.32 rows=3432492 width=36)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Funnel on t2  (cost=0.00..11528.30 rows=3433 width=4)
                 Filter: (c2 = ''::text)
                 Number of Workers: 2
                 ->  Partial Seq Scan on t2  (cost=0.00..4662.68 rows=3433 width=4)
                       Filter: (c2 = ''::text)
   SubPlan 1
     ->  Funnel on t2  (cost=0.00..11528.30 rows=3433 width=4)
           Filter: (c2 = ''::text)
           Number of Workers: 2
           ->  Partial Seq Scan on t2  (cost=0.00..4662.68 rows=3433 width=4)
                 Filter: (c2 = ''::text)
(17 rows)


Here the subplan is generated before the top-level plan, and while generating
the subplan we can't predict whether it is okay to generate it as a Funnel,
because the top-level plan might turn out to be a non-Funnel.  Also, if such
a subplan is actually an InitPlan, then we are safe (as we execute InitPlans
in the master backend and then pass the result to the parallel workers) even
if the top-level plan is a Funnel.  I think the place where we can catch this
is during the generation of the Funnel path: basically, if any node beneath
the Funnel node has another Funnel node in its 'filter' or 'targetlist', then
we have two options to proceed:
a. Mark such a filter or target list as non-pushable, indicating that it
needs to be executed only in the master backend.  If we go with this option,
then we have to make the Funnel node capable of evaluating the Filter and
Targetlist, which is not a big thing.
b. Don't choose the current path as a Funnel path.

I prefer the second option, as it seems simpler than the first, and there
doesn't seem to be much benefit in going with the first.  A sketch of the
check follows.
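For illustration, option (b) could be implemented with a small walker over
the quals and target expressions - a sketch only, where the Funnel node tag
follows this patch series and the helper name is invented:

/*
 * Sketch: return true if any SubPlan referenced from the given
 * expression tree is planned as a Funnel.
 */
static bool
contains_funnel_subplan(Node *node, List *subplans)
{
	if (node == NULL)
		return false;
	if (IsA(node, SubPlan))
	{
		SubPlan *sp = (SubPlan *) node;
		Plan    *plan = (Plan *) list_nth(subplans, sp->plan_id - 1);

		if (IsA(plan, Funnel))
			return true;
	}
	return expression_tree_walker(node, contains_funnel_subplan,
								  (void *) subplans);
}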

Any better ideas?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Wed, Sep 16, 2015 at 8:07 PM, Pavel Stehule wrote:

>
> 2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <oleksandr.shul...@zalando.de>:
>
>>
>> I've added the timeout parameter to the pg_cmdstatus call, and more
>> importantly to the sending side of the queue, so that one can limit the
>> potential effect of handling the interrupt in case something goes really
>> wrong.
>>
>
> I don't think introducing a new user-visible timer is a good idea. This
> timer should be invisible.
>
> My idea - send a signal and wait 1 sec, then check if the target process is
> still alive. Stop if not. Wait another 5 sec, then check the target process.
> Stop if the process is gone, or raise the warning "requested process doesn't
> communicate, waiting..". The final cancel should be done by
> statement_timeout.
>
> what do you think about it?
>

That won't work really well with something like what I usually do when
testing this patch, namely:

postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1

while also running pgbench with the -C option (to create a new connection for
every transaction).  When a targeted backend exits before it can handle the
signal, the receiving process keeps waiting forever.

The statement_timeout in this case will stop the whole select, not just the
individual function call.  Unless you wrap it to set statement_timeout and
catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
select.  The ability to set a (short) timeout for the function itself proves
to be a really useful feature to me.

We can still communicate some warnings to the client when no timeout is
specified (and make 0 the default value actually).

>> What I'm now thinking about is whether we can extend this backend
>> communication mechanism in order to query GUC values effective in a
>> different backend or even setting the values.  The obvious candidate to
>> check when you see some query misbehaving would be work_mem, for example.
>> Or we could provide a report of all settings that were overridden in the
>> backend's session, to the effect of running something like this:
>>
>> select * from pg_settings where context = 'user' and setting != reset_val;
>>
>
> this is a good idea. Many times I have been interested in the current
> settings of a query. We cannot use a simple query, because it is not
> possible to push a PID into pg_settings simply, but you could write a
> function pg_settings_pid() so usage could look like
>
> select * from pg_settings_pid(, possible other params) where ...
>

I would rather have a more general way to run *readonly* queries in the
other backend than invent a new function for every occurrence.

>> The obvious candidates to be set externally are the
>> cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
>> you'd rather do that carefully for a single backend than globally
>> per-cluster.  One can still modify the postgresql.conf and then send SIGHUP
>> to just a single backend, but I think a more direct way to alter the
>> settings would be better.
>>
>
> I am 100% for the possibility to read the settings. But reconfiguration from
> another process is too hot a coffee - it could be available via an extension,
> but not via commonly used tools.
>

It can be reserved to superuser, and nobody is forcing one to use it
anyway. :-)

>> In this light, should we rename the API to something like "backend control"
>> instead of "command status"?  The SHOW/SET syntax could be extended to
>> support the remote setting retrieval/update.
>>
>
> prepare the API, and this functionality can be part of a reference
> implementation in contrib.
>
> This patch should not have too large a scope, so it can be finished in this
> release cycle.
>

These are just the thoughts on what could be achieved using this
cross-backend communication mechanism and ideas for generalization of the
API.

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 11:55 GMT+02:00 Shulgin, Oleksandr:

> On Wed, Sep 16, 2015 at 8:07 PM, Pavel Stehule wrote:
>
>>
>> 2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <oleksandr.shul...@zalando.de>:
>>
>>>
>>> I've added the timeout parameter to the pg_cmdstatus call, and more
>>> importantly to the sending side of the queue, so that one can limit the
>>> potential effect of handling the interrupt in case something goes really
>>> wrong.
>>>
>>
>> I don't think introducing a new user-visible timer is a good idea. This
>> timer should be invisible.
>>
>> My idea - send a signal and wait 1 sec, then check if the target process is
>> still alive. Stop if not. Wait another 5 sec, then check the target process.
>> Stop if the process is gone, or raise the warning "requested process doesn't
>> communicate, waiting..". The final cancel should be done by
>> statement_timeout.
>>
>> what do you think about it?
>>
>
> That won't work really well with something like what I usually do when
> testing this patch, namely:
>
> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1
>
> while also running pgbench with the -C option (to create a new connection
> for every transaction).  When a targeted backend exits before it can handle
> the signal, the receiving process keeps waiting forever.
>

no - at every timeout you have to check whether the targeted backend is still
alive; if not, you cancel. It is 100% safe.


>
> The statement_timeout in this case will stop the whole select, not just the
> individual function call.  Unless you wrap it to set statement_timeout and
> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
> select.  The ability to set a (short) timeout for the function itself proves
> to be a really useful feature to me.
>

you cannot handle QUERY_CANCELED in plpgsql. Some internal timeout is needed -
but this timeout should not be visible - any new GUC increases PostgreSQL
complexity, and there is no reason to do it.


>
> We can still communicate some warnings to the client when no timeout is
> specified (and make 0 the default value actually).
>
>>> What I'm now thinking about is whether we can extend this backend
>>> communication mechanism in order to query GUC values effective in a
>>> different backend or even setting the values.  The obvious candidate to
>>> check when you see some query misbehaving would be work_mem, for example.
>>> Or we could provide a report of all settings that were overridden in the
>>> backend's session, to the effect of running something like this:
>>>
>>> select * from pg_settings where context = 'user' and setting !=
>>> reset_val;
>>>
>>
>> this is a good idea. Many times I have been interested in the current
>> settings of a query. We cannot use a simple query, because it is not
>> possible to push a PID into pg_settings simply, but you could write a
>> function pg_settings_pid() so usage could look like
>>
>> select * from pg_settings_pid(, possible other params) where ...
>>
>
> I would rather have a more general way to run *readonly* queries in the
> other backend than invent a new function for every occurrence.
>
>>> The obvious candidates to be set externally are the
>>> cmdstatus_analyze/instrument_*: when you decided you want to turn them on,
>>> you'd rather do that carefully for a single backend than globally
>>> per-cluster.  One can still modify the postgresql.conf and then send SIGHUP
>>> to just a single backend, but I think a more direct way to alter the
>>> settings would be better.
>>>
>>
>> I am 100% for the possibility to read the settings. But reconfiguration from
>> another process is too hot a coffee - it could be available via an extension,
>> but not via commonly used tools.
>>
>
> It can be reserved to superuser, and nobody is forcing one to use it
> anyway. :-)
>
>>> In this light, should we rename the API to something like "backend control"
>>> instead of "command status"?  The SHOW/SET syntax could be extended to
>>> support the remote setting retrieval/update.
>>>
>>
>> prepare the API, and this functionality can be part of a reference
>> implementation in contrib.
>>
>> This patch should not have too large a scope, so it can be finished in this
>> release cycle.
>>
>
> These are just the thoughts on what could be achieved using this
> cross-backend communication mechanism and ideas for generalization of the
> API.
>

ok


>
> --
> Alex
>
>


Re: [HACKERS] a funnel by any other name

2015-09-17 Thread Nicolas Barbier
2015-09-17 Robert Haas:

> 1. Exchange Bushy
> 2. Exchange Inter-Operator (this is what's currently implemented)
> 3. Exchange Replicate
> 4. Exchange Merge
> 5. Interchange

> 1. ?
> 2. Gather
> 3. Broadcast (sorta)
> 4. Gather Merge
> 5. Redistribute

> 1. Parallel Child
> 2. Parallel Gather
> 3. Parallel Replicate
> 4. Parallel Merge
> 5. Parallel Redistribute

FYI, SQL Server has these in its execution plans:

* Distribute Streams: read from one thread, write to multiple threads
* Repartition Streams: both read and write from/to multiple threads
* Gather Streams: read from multiple threads, write to one thread

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




[HACKERS] LW_SHARED_MASK macro

2015-09-17 Thread Alexander Korotkov
Hackers,

while exploring lwlock.c I found the following macro to be strange.

#define LW_SHARED_MASK ((uint32)(1 << 23))

This macro is used to extract the number of shared locks from the state.

ereport(LOG,
(errhidestmt(true),
errhidecontext(true),
errmsg("%d: %s(%s): excl %u shared %u haswaiters %u waiters %u rOK %d",
MyProcPid,
where, MainLWLockNames[id],
!!(state & LW_VAL_EXCLUSIVE),
state & LW_SHARED_MASK,
!!(state & LW_FLAG_HAS_WAITERS),
pg_atomic_read_u32(&lock->nwaiters),
!!(state & LW_FLAG_RELEASE_OK))));


Should it be ((uint32) ((1 << 24)-1)) instead?
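A quick illustration of the difference (my own, not from the patch): the
shared-lock count lives in the low-order bits of the state word, so the mask
has to cover all of them rather than select a single bit.

	uint32 state = 5;                                   /* five shared holders */
	uint32 wrong = state & ((uint32) (1 << 23));        /* == 0, count lost */
	uint32 right = state & ((uint32) ((1 << 24) - 1));  /* == 5 */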

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


lw_shared_mask.patch
Description: Binary data



Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Tue, Sep 15, 2015 at 5:00 AM, Shulgin, Oleksandr wrote:
> Please see attached for implementation of this approach.  The most
> surprising thing is that it actually works :)

It's cool to see these interesting ideas for using some of the code
I've been working on for the last couple of years.  However, it seems
to me that a better design would avoid the use of shm_mq.  Instead of
having the guy who asks the question create a DSM, have the guy who
sends back the answer create it.  Then, he can just make the DSM big
enough to store the entire string.  I think that's going to be a lot
more robust than what you have here.
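Concretely, the responder side of that design might look roughly like this
sketch (using the 9.5-era dsm API; plan_text and the handle hand-off are
placeholders):

	Size		len = strlen(plan_text) + 1;
	dsm_segment *seg = dsm_create(len);		/* sized to the whole answer */

	memcpy(dsm_segment_address(seg), plan_text, len);
	/* ... pass dsm_segment_handle(seg) back to the requester, which
	 * maps it with dsm_attach() and reads the string in one go ... */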

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr wrote:
> I've also decided we really ought to suppress any possible ERROR-level
> messages generated during the course of processing the status request, in
> order not to prevent the originally running transaction from completing.  The
> errors so caught are just logged using LOG level and ignored in this new
> version of the patch.

This plan has almost zero chance of surviving committer-level
scrutiny, and if it somehow does, some other committer will scream
bloody murder as soon as they notice it.

It's not safe to just catch an error and continue on executing.  You
have to abort the (sub)transaction once an error happens.

Of course, this gets at a pretty crucial design question for this
patch, which is whether it's OK for one process to interrogate another
process's state, and what the consequences of that might be.  What
permissions are needed?  Can you DOS the guy running a query by
pinging him for status at a very high rate?

The other question here is whether it's really safe to do this.
ProcessInterrupts() gets called at lots of places in the code, and we
may freely add more in the future.  How are you going to prove that
ProcessCmdStatusInfoRequest() is safe to call in all of those places?
How do you know that some data structure you find by chasing down from
ActivePortal or current_query_stack doesn't have a NULL pointer
someplace that you don't expect, because the code that initializes
that pointer hasn't run yet?

I'd like to see this made to work and be safe, but I think proving
that it's truly robust in all cases is going to be hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-09-17 Thread Pavel Stehule
2015-09-08 22:55 GMT+02:00 Daniel Verite:

> Pavel Stehule wrote:
>
> > rotate ~ sounds like transpose matrix, what is not true in this case.
>
> The various definitions that I can see, such as
> http://dictionary.reference.com/browse/rotate
> make no mention of matrices. It applies to anything that
> moves around a pivot or axis.
>
> OTOH, the established term for the matrix operation you're
> referring to appears to be "transpose", as you mention.
> https://en.wikipedia.org/wiki/Transpose
>
> I notice that according to
> http://www.thesaurus.com/browse/transpose
> "rotate" is not present in the 25+ synonyms they suggest for
> "transpose".
>
> In the above wikipedia article about matrix transposition,
> "rotate" is also never used anywhere.
>
> "rotate matrix" does not exist for google ngrams, whereas
> "transpose matrix" does.
> https://books.google.com/ngrams
>
> Overall I don't see the evidence that "rotate" alone would
> suggest transposing a matrix.
>
> Now it appears that there is a concept in linear algebra named
> "rotation matrix", defined as:
> https://en.wikipedia.org/wiki/Rotation_matrix
> that seems quite relevant for 3D software.
>
> But as psql is not a tool for linear algebra or 3D in the first place,
> who could realistically be deceived?
>

In the help inside your last patch, you are using "crosstab". Cannot
"crosstab" be the name for this feature?

Regards

Pavel


Re: [HACKERS] Parallel Seq Scan

2015-09-17 Thread Amit Kapila
On Tue, Sep 15, 2015 at 8:34 AM, Haribabu Kommi wrote:
>
> >
>
> I reviewed the parallel_seqscan_funnel_v17.patch and following are my
> comments.
> I will continue my review with the parallel_seqscan_partialseqscan_v17.patch.
>

Thanks for the review.

> + if (inst_options)
> + {
> + instoptions = shm_toc_lookup(toc, PARALLEL_KEY_INST_OPTIONS);
> + *inst_options = *instoptions;
> + if (inst_options)
>
> Same pointer variable check, it should be if (*inst_options) as per the
> estimate and store functions.
>

Makes sense, will change in the next version of the patch.

>
> + if (funnelstate->ss.ps.ps_ProjInfo)
> + slot = funnelstate->ss.ps.ps_ProjInfo->pi_slot;
> + else
> + slot = funnelstate->ss.ss_ScanTupleSlot;
>
> Currently, there will not be a projinfo for funnel node.

No, that's not true, it has projinfo for the cases where it is required.

> So always it uses
> the scan tuple slot. In case it is different, we need to add the
> ExecProject call in the ExecFunnel function.
>

It will not use the scan tuple slot when the column list contains an
expression, or in other cases where projection is required.  Currently we
don't need a separate ExecProject, as there is no case where the workers
won't do the projection for us.  However, in the future we might need it for
something like restrictive functions.  I think it is better to add a comment
explaining this, which I will do in the next version.  Does that make sense?


>
>
> + if (!((*dest->receiveSlot) (slot, dest)))
> + break;
>
> and
>
> +void
> +TupleQueueFunnelShutdown(TupleQueueFunnel *funnel)
> +{
> + if (funnel)
> + {
> + int i;
> + shm_mq_handle *mqh;
> + shm_mq   *mq;
> + for (i = 0; i < funnel->nqueues; i++)
> + {
> + mqh = funnel->queue[i];
> + mq = shm_mq_get_queue(mqh);
> + shm_mq_detach(mq);
> + }
> + }
> +}
>
>
> Using this function, the backend detaches from the message queue, so
> that the workers which are trying to put results into the queues get an
> error message as SHM_MQ_DETACHED.  Then the worker finishes the execution
> of the plan.  For this reason all the printtup return types are changed
> from void to bool.
>
> But this way the worker doesn't get exited until it tries to put a
> tuple in the queue.  If there are no valid tuples that satisfy the
> condition, then it may take time for the workers to exit.  Am I correct?
> I am not sure how frequent such scenarios can occur.
>

Yes, you are right.  The main reason to keep it like this is that we want to
finish execution without going into the error path, so that the collected
stats can be communicated back.  I am not sure trying to do better in this
case is worth it at this point.

>
> + if (parallel_seqscan_degree >= MaxConnections)
> + {
> + write_stderr("%s: parallel_scan_degree must be less than
> max_connections\n", progname);
> + ExitPostmaster(1);
> + }
>
> The error condition works only during server start.  A user can still set
> parallel seqscan degree higher than max_connections at the superuser
> session level, etc.
>

I think we can remove this check as pointed out by Robert as well.

>
> + if (!parallelstmt->inst_options)
> + (*receiver->rDestroy) (receiver);
>
> Why does the receiver need to be destroyed only when there is no
> instrumentation?
>

No, the receiver should be destroyed unconditionally.  This is a remnant of
previous versions, when the receiver was created only for the
no-instrumentation case.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-09-17 Thread Kyotaro HORIGUCHI
Hi, sorry in advance for the long, hardly-readable description.

At Mon, 14 Sep 2015 13:27:47 +0200, Tomas Vondra wrote in <55f6af33.8000...@2ndquadrant.com>
> I don't think this is particularly related to the patch, because some
> of the anomalies can be observed even on master. For example, let's
> force the index scans to be non-IOS by requesting another column from
> the heap:
...
> I claim that either both or none of the indexes should use "Index
> Cond".

Perhaps I understand your point now, and I think I can explain this.

The planner decides whether to use a partial index long before creating
specific paths, when measuring baserel sizes.  If some partial index is
implied by the restriction list, check_partial_indexes() marks the index
as predOK, which means that the index is usable for the baserel scan even
if the restriction clauses don't match the index columns.

Then create_index_paths() allows generating paths for partial indexes
with predOK set.  The index clauses of the paths created for such partial
indexes don't contain clauses that fail to match the index columns: the
clause list is empty for your first example, and the same as the
restrictinfo for the second.

Finally, create_indexscan_plan() strips baserestrictinfo clauses implied
by the index predicate, in addition to the index conditions.  So both of
your examples have no filter conditions.

The following example shows the result of the above steps.

explain select c from t where b between 30 + 1 and 60 and c = 3;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using idx1 on t  (cost=0.42..11665.77 rows=1 width=4)
   Filter: ((b >= 31) AND (c = 3))
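The table and index definitions are not quoted in this mail; a setup
consistent with the plan above - my assumption, for illustration only -
would be something like:

create table t (a int, b int, c int);
create index idx1 on t (a) where b >= 30 and b <= 60;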

The index predicate (b >= 30 AND b <= 60) is implied by the restriction
(b >= 31 AND b <= 60), so idx1 is allowed to be used.  Conversely, the
restriction b <= 60 is implied by the index predicate, so it is stripped
from the plan; the other two clauses are not implied, so they are shown
and executed as Filter.

But regardless of whether clauses are stripped as implied conditions at
planning time, the cost of all clauses that don't match the index columns
is added when creating index paths.  That will be one cause of the cost
error.

> This is exactly the same reason that lead to the strange behavior
> after applying the patch, but in this case the heap access actually
> introduces some additional cost so the issue is not that obvious.

So you're right that check_index_only is doing the wrong thing.  It should
properly ignore restrictions implied by index predicates, as your patch
does.  But cost_index doesn't know that some non-index conditions in
baserestrictinfo are ultimately useless, and it assumes that non-index
conditions are always applied to heap tuples.


After all, what should be done to properly ignore useless
conditions would be:

 1. Make create_index_paths() build the list of restriction
clauses which are implied by the index predicate of the index
in focus. The list would be stored as a member of
IndexOptInfo. Then create index clauses excluding the listed
clauses and call get_index_paths() using them. Modify
check_index_only() to ignore the listed clauses when pulling
varattnos. This is similar to, but a bit different from, what
I said in the previous mail.

 2. Pass the listed clauses to the generated IndexPath.

 3. Modify extract_nonindex_conditions() to be called with the
exclusion list, so those clauses are excluded from the result
of the function.

 4. Make create_indexscan_plan() extract qpqual excluding the
exclusion list.

The same result could be obtained in a smarter way, but the steps
above will achieve the right decision on whether to do an index-only
scan on a partial index, and a correct cost estimate that properly
ignores unused restrictions.

Does this make sense for you?

> But in reality the costs should be pretty much exactly the same - the
> indexes have exactly the same size, statistics, selectivity etc.
> 
> Also, the plan difference suggests this is not merely a costing issue,
> because while with idx1 (first plan) it was correctly detected we
> don't need to evaluate the condition on the partial index, on idx2
> that's not true and we'll waste time doing that. So we probably can't
> just tweak the costing a bit - this probably needs to be addressed
> when actually building the index path.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Parallel Seq Scan

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 2:54 AM, Amit Kapila  wrote:
> As per my understanding, what you have done there will not prohibit such
> cases.
>
> +* For now, we don't try to use parallel mode if we're running inside
> +* a parallel worker.  We might eventually be able to relax this
> +* restriction, but for now it seems best not to have parallel workers
> +* trying to create their own parallel workers.
> +*/
> +   glob->parallelModeOK = (cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> +   IsUnderPostmaster && dynamic_shared_memory_type != DSM_IMPL_NONE &&
> +   parse->commandType == CMD_SELECT && !parse->hasModifyingCTE &&
> +   parse->utilityStmt == NULL && !IsParallelWorker() &&
> +   !contain_parallel_unsafe((Node *) parse);
>
>
> IIUC, you are referring to the !IsParallelWorker() check in the above code.
> If so, then I think it won't work, because we generate the plan in the
> master backend; a parallel worker will never exercise this code.  I have
> tested it as well with the example below, and it still generates the
> SubPlan as a Funnel.

You're right.  That's still a good check, because some function called
in the worker might try to execute a query all of its own, but it
doesn't prevent the case you are talking about.

> Here the subplan is generated before the top-level plan, and while
> generating the subplan we can't predict whether it is okay to generate it
> as a Funnel, because the top-level plan might turn out to be a non-Funnel.
> Also, if such a subplan is actually an InitPlan, then we are safe (as we
> execute InitPlans in the master backend and then pass the result to the
> parallel workers) even if the top-level plan is a Funnel.  I think the
> place where we can catch this is during the generation of the Funnel path:
> basically, if any node beneath the Funnel node has another Funnel node in
> its 'filter' or 'targetlist', then we have two options to proceed:
> a. Mark such a filter or target list as non-pushable, indicating that it
> needs to be executed only in the master backend.  If we go with this
> option, then we have to make the Funnel node capable of evaluating the
> Filter and Targetlist, which is not a big thing.
> b. Don't choose the current path as a Funnel path.
>
> I prefer the second option, as it seems simpler than the first, and there
> doesn't seem to be much benefit in going with the first.
>
> Any better ideas?

I haven't studied the planner logic in enough detail yet to have a
clear opinion on this.  But what I do think is that this is a very
good reason why we should bite the bullet and add outfuncs/readfuncs
support for all Plan nodes.  Otherwise, we're going to have to scan
subplans for nodes we're not expecting to see there, which seems
silly.  We eventually want to allow all of those nodes in the worker
anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] planstate_tree_walker

2015-09-17 Thread Robert Haas
In my reviews of Amit's parallel sequential scan patches yesterday, I
complained that he was using planstate_tree_walker incorrectly, but I
failed to notice that this was because he'd defined
planstate_tree_walker incorrectly.  This morning I figured that out.
:-)

Here is a patch that *just* introduces planstate_tree_walker and which
is hopefully correct.  I stole the logic from ExplainPreScanNode,
which I also refactored to use the new walker instead of duplicating
the logic.
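The usage idiom mirrors expression_tree_walker: a callback handles one node
and recurses by calling planstate_tree_walker on itself.  A minimal sketch
(the callback here is mine, purely illustrative):

static bool
count_nodes_walker(PlanState *planstate, void *context)
{
	int *count = (int *) context;

	(*count)++;		/* visit this node, then recurse into its children */
	return planstate_tree_walker(planstate, count_nodes_walker, context);
}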

I'd like to go ahead and commit this if nobody sees a problem with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


planstate-tree-walker.ptch
Description: Binary data



Re: [HACKERS] a funnel by any other name

2015-09-17 Thread Amit Kapila
On Thu, Sep 17, 2015 at 8:09 AM, Robert Haas  wrote:
>
> Or, yet another option, we could combine the similar operators under
> one umbrella while keeping the things that are more different as
> separate nodes:
>
> 1, 2. Exchange (or Gather or Funnel)
> 3, 5. Distribute (or Redistribute or Interchange or Exchange)
> 4. Exchange Merge (or Gather Merge or Funnel Merge)
>

+1 for combining, but it seems better to call 1 and 2 "Parallel Gather",
and similarly for the others.  Adding "Parallel" to "Gather" makes it
self-explanatory.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] tsvector work with citext

2015-09-17 Thread Teodor Sigaev

However, it does seem like this function is not implementing its
specification.  Why isn't it just "IsBinaryCoercible(typid, TEXTOID)"?

Oversight, I suppose. is_text_type() was introduced by

commit 635aaab278afc1af972a4b6a55ff632ab763505d
Author: Tom Lane 
Date:   Tue Apr 8 18:20:29 2008 +

Fix tsvector_update_trigger() to be domain-friendly: it needs to allow all
the columns it works with to be domains over the expected type, not just
exactly the expected type.  In passing, fix ts_stat() the same way.
Per report from Markus Wollny.

Will fix. I suppose is_expected_type() could be replaced with
IsBinaryCoercible() too.
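Presumably the fix amounts to a sketch like this (Tom's suggested
expression, wrapped in the existing function):

static bool
is_text_type(Oid typid)
{
	/* accept any type binary-coercible to text, including domains */
	return IsBinaryCoercible(typid, TEXTOID);
}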



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] honour TEMP_CONFIG in pg_upgrade tests

2015-09-17 Thread Andrew Dunstan



On 09/17/2015 09:42 AM, Tom Lane wrote:

Andrew Dunstan  writes:

I propose to have the pg_upgrade test honour TEMP_CONFIG as pg_regress
(or its Makefile) does, by applying the patch below. There might be
other places this should also be done, but this would be a start.

No objection to the concept, but I wonder if the -f test should
instead be a -r test (in the original place too, I guess).




In the original place permissions are in effect tested in pg_regress.c.

I'll go ahead with -r.
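For reference, the standard_initdb hunk with -r would then presumably read:

standard_initdb() {
	"$1" -N
	if [ -n "$TEMP_CONFIG" -a -r "$TEMP_CONFIG" ]
	then
		cat "$TEMP_CONFIG" >> "$PGDATA/postgresql.conf"
	fi
	../../test/regress/pg_regress --config-auth "$PGDATA"
}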

cheers

andrew




Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-17 Thread Teodor Sigaev

But I don't understand this comment:

+   /*
+* While we delete only one tuple at once we could mix calls
+* PageIndexTupleDelete() here and PageIndexMultiDelete() in
+* gistRedoPageUpdateRecord()
+*/

Does this mean:

Since we delete only one tuple per WAL record here, we can call
PageIndexTupleDelete() here and re-play it with PageIndexMultiDelete() in
gistRedoPageUpdateRecord()


yes. The problem was: when we delete tuples at offsets (2,4,6) with
PageIndexMultiDelete(), we delete exactly the pointed-to tuples. But if we
delete the tuple at offset 2 with PageIndexTupleDelete(), then the 4th tuple
is moved to 3 and the 6th becomes 5. So the next tuple to delete is now 3,
and we should call PageIndexTupleDelete(3), and so on. And the bug was: we
deleted tuples in ginpagevacuum with the help of PageIndexMultiDelete() and
wrote its argument to WAL, but the recovery process used
PageIndexTupleDelete() without correcting the offsets.
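In other words, replaying a multi-delete with single-tuple deletes only
works if the offsets are compensated.  A minimal sketch of the compensation
(my illustration, not the patch's code):

	/*
	 * Deleting line pointers 2, 4 and 6 one at a time: each call to
	 * PageIndexTupleDelete() shifts later line pointers down by one,
	 * so subtract the number of deletions already performed.
	 */
	OffsetNumber todelete[] = {2, 4, 6};
	int			i;

	for (i = 0; i < 3; i++)
		PageIndexTupleDelete(page, todelete[i] - i);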


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 2:14 PM, Robert Haas  wrote:

> On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr wrote:
> > I've also decided we really ought to suppress any possible ERROR-level
> > messages generated during the course of processing the status request, in
> > order not to prevent the originally running transaction from completing.
> > The errors so caught are just logged using LOG level and ignored in this
> > new version of the patch.
>
> This plan has almost zero chance of surviving committer-level
> scrutiny, and if it somehow does, some other committer will scream
> bloody murder as soon as they notice it.
>
> It's not safe to just catch an error and continue on executing.  You
> have to abort the (sub)transaction once an error happens.
>

Hm... is this really different from the EXCEPTION WHEN clause in plpgsql?

> Of course, this gets at a pretty crucial design question for this
> patch, which is whether it's OK for one process to interrogate another
> process's state, and what the consequences of that might be.  What
> permissions are needed?


I think superuser-or-same-user is a pretty reasonable requirement.  It can
be limited to superuser only if there are specific concerns.

> Can you DOS the guy running a query by
> pinging him for status at a very high rate?
>

Probably, but if you've got enough permissions to do that,
pg_cancel/terminate_backend() would be easier.

> The other question here is whether it's really safe to do this.
> ProcessInterrupts() gets called at lots of places in the code, and we
> may freely add more in the future.  How are you going to prove that
> ProcessCmdStatusInfoRequest() is safe to call in all of those places?
> How do you know that some data structure you find by chasing down from
> ActivePortal or current_query_stack doesn't have a NULL pointer
> someplace that you don't expect, because the code that initializes
> that pointer hasn't run yet?
>
> I'd like to see this made to work and be safe, but I think proving
> that it's truly robust in all cases is going to be hard.
>

Yes, this is a perfectly good point.  For the purpose of obtaining the
explain plans, I believe anything we capture in ExecutorRun ought to be
safe to run Explain on: the plannedstmt is there and isn't going to change.

In a more general case, if we seek to extend this approach, then yes one
should be very careful in what is allowed to run during ProcessInterrupts().

What we could do instead, to be extra safe, is make the running backend
prepare the information for other backends to obtain from shared memory.
Explain plans can be easily captured at the ExecutorRun hook.  The obvious
downside is that you can no longer choose the explain format, or obtain
partially collected instrumentation data.
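A hook-based capture might look roughly like this - a sketch only, with the
shared-memory publishing step elided:

static void
capture_plan_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction,
						 long count)
{
	/* render the plan text once, up front */
	ExplainState *es = NewExplainState();

	ExplainBeginOutput(es);
	ExplainPrintPlan(es, queryDesc);
	ExplainEndOutput(es);
	/* ... publish es->str->data in shared memory here ... */

	standard_ExecutorRun(queryDesc, direction, count);
}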

--
Alex


Re: [HACKERS] creating extension including dependencies

2015-09-17 Thread Jeff Janes
On Tue, Sep 15, 2015 at 8:44 PM, Petr Jelinek  wrote:

> On 2015-09-08 04:06, Michael Paquier wrote:
>
>> On Tue, Sep 8, 2015 at 10:44 AM, Michael Paquier
>>  wrote:
>>
>>>
>>> Attached are as well changes for the documentation that I spotted in
>>> earlier reviews but were not included in the last version sent by Petr
>>> yesterday. Feel free to discard them if you think they are not
>>> adapted, the patch attached applies on top of Petr's patch.
>>>
>>
>> And /log/ is missing in src/test/modules/extensions/.gitignore.
>>
>>
> Ah sorry, I based it on my branch which didn't contain your changes.
> Merged.


If I fail to specify CASCADE and get an ERROR, I think there should be a
HINT which suggests the use of CASCADE.


create extension earthdistance ;
ERROR:  required extension "cube" is not installed

(no hint)

There is a HINT on the reverse operation:
drop extension cube;
ERROR:  cannot drop extension cube because other objects depend on it
DETAIL:  extension earthdistance depends on extension cube
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
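Something mirroring the DROP message, presumably - a sketch, with the
variable name and exact wording being illustrative only:

	ereport(ERROR,
			(errcode(ERRCODE_UNDEFINED_OBJECT),
			 errmsg("required extension \"%s\" is not installed",
					reqname),
			 errhint("Use CREATE EXTENSION ... CASCADE to install required extensions too.")));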

Also, it would be nice to have psql tab-complete the word CASCADE.

Cheers,

Jeff


Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-17 Thread Jeff Janes
On Wed, Sep 16, 2015 at 8:36 AM, Teodor Sigaev  wrote:

>> But it seems to me that it would be better to rewrite all mentions of
>> TupleDelete to MultiDelete in gist code.
>>
>
> Sure. Patch is attached, and it changes the WAL format, so be careful with
> testing.
> Please, have a look.
>
> Also attached are scripts that reproduce the bug from Jeff's report:
> g.pl - creates and fills the test table
> w.pl - worker, can run in several sessions
>
> Usage:
> perl g.pl | psql contrib_regression
> perl w.pl | psql contrib_regression | grep 'UPDATE 0'
>
> and killall -9 postgres while w.pl is running. Recovery will fail with
> high probability.
>
> Thank you, Jeff, for report.


Thanks, that seems to have fixed it.

But I don't understand this comment:

+   /*
+* While we delete only one tuple at once we could mix calls
+* PageIndexTupleDelete() here and PageIndexMultiDelete() in
+* gistRedoPageUpdateRecord()
+*/

Does this mean:

Since we delete only one tuple per WAL record here, we can call
PageIndexTupleDelete() here and re-play it with PageIndexMultiDelete() in
gistRedoPageUpdateRecord()

Thanks,

Jeff


Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-09-17 Thread Tomas Vondra

Hello Horiguchi-san,

On 09/17/2015 12:45 PM, Kyotaro HORIGUCHI wrote:


After all, what should be done to properly ignore useless
conditions would be:

  1. Make create_index_paths() build the list of restriction
 clauses which are implied by the index predicate of the index
 in focus. The list would be stored as a member of
 IndexOptInfo. Then create index clauses excluding the listed
 clauses and call get_index_paths() using them. Modify
 check_index_only() to ignore the listed clauses when pulling
 varattnos. This is similar to, but a bit different from, what
 I said in the previous mail.

  2. Pass the listed clauses to the generated IndexPath.

  3. Modify extract_nonindex_conditions() to be called with the
 exclusion list, so those clauses are excluded from the result
 of the function.

  4. Make create_indexscan_plan() extract qpqual excluding the
 exclusion list.

The same result could be obtained in a smarter way, but the steps
above will achieve the right decision on whether to do an index-only
scan on a partial index, and a correct cost estimate that properly
ignores unused restrictions.

Does this make sense for you?


Yes, this seems sane. I've been poking at this a bit too, and I came to
the same plan in general, except that I think it's better to build a list
of the clauses that are *not* implied by the index, because that's what we
need both in cost_index and check_index_only.


It also seems to me that this change (arguably a bug fix) should pretty 
much make the original patch irrelevant, because check_index_only can 
simply walk over the new list.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_resetxlog sentences

2015-09-17 Thread Alvaro Herrera
Fujii Masao wrote:
> On Thu, Sep 17, 2015 at 5:00 AM, Alvaro Herrera wrote:
> > Robert Haas wrote:
> >> On Tue, Sep 15, 2015 at 9:52 PM, Euler Taveira wrote:
> >> > While updating translations, I came across these almost-identical
> >> > sentences.
> >> >
> >> > pg_controldata.c
> >> >
> >> > 273 printf(_("Latest checkpoint's oldestCommitTs:   %u\n"),
> >> > 274ControlFile.checkPointCopy.oldestCommitTs);
> >> >
> >> > pg_resetxlog.c
> >> >
> >> >  668 printf(_("Latest checkpoint's oldest CommitTs:  %u\n"),
> >> >  669ControlFile.checkPointCopy.oldestCommitTs);
> >> >  670 printf(_("Latest checkpoint's newest CommitTs:  %u\n"),
> >> >  671ControlFile.checkPointCopy.newestCommitTs);
> >> >
> >> > To be consistent, let's change pg_resetxlog to mimic pg_controldata
> >> > sentence. Patch is attached. It is new in 9.5 so backpatch is needed.
> >>
> >> Seems like a good idea to me.  Anyone disagree?
> >
> > OK with me.
> 
> +1
> 
> One relevant question is: why doesn't pg_controldata report newestCommitTs?

Most likely an oversight.  As I recall, we added newestCommitTs in a
later version of the patch; we probably patched pg_controldata earlier
and then failed to realize that we needed to add the new field.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] planstate_tree_walker

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 9:33 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Here is a patch that *just* introduces planstate_tree_walker and which
>> is hopefully correct.  I stole the logic from ExplainPreScanNode,
>> which I also refactored to use the new walker instead of duplicating
>> the logic.
>
> It seems a little odd to have removed functions from explain.c altogether
> but left their header comments behind.  Otherwise this seems sound.

D'oh!

OK, will fix that before committing.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] honour TEMP_CONFIG in pg_upgrade tests

2015-09-17 Thread Andrew Dunstan
I propose to have the pg_upgrade test honour TEMP_CONFIG as pg_regress
(or its Makefile) does, by applying the patch below. There might be
other places this should also be done, but this would be a start. The
motivation is to see if it stops the errors we often see on the somewhat
slow axolotl buildfarm member when checking pg_upgrade - axolotl's
TEMP_CONFIG sets up the stats temp directory on a ramdisk, as well as
turning off fsync.


This would be backpatched to 9.2.


cheers

andrew


diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index ec3a7ed..c0dcf8b 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -21,6 +21,10 @@ unset MAKELEVEL
 # authentication configuration.
 standard_initdb() {
"$1" -N
+   if [ -n "$TEMP_CONFIG" -a -f "$TEMP_CONFIG" ]
+   then
+   cat "$TEMP_CONFIG" >> "$PGDATA/postgresql.conf"
+   fi
../../test/regress/pg_regress --config-auth "$PGDATA"
 }






Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-09-17 Thread Michael Paquier
On Thu, Sep 17, 2015 at 7:24 AM, Andres Freund wrote:
> On 2015-09-16 22:18:55 -0700, Michael Paquier wrote:
>> Problem is similar with --column-inserts, --inserts and COPY. We could
>> use --exclude-table like in the patch attached when taking the dump
>> from source database but that's grotty, or we could improve pg_dump
>> itself, though it may not be worth it for just this purpose.
>> Thoughts?
>
> Hm. To me that sounds like a moderately serious bug worth fixing. I mean
> if you have a COPY command that worked before a dump/restore but that's
> wrong afterwards, it's pretty ugly, no?

COPY and INSERT include the column list in dumps, so that's not really
an issue here; what is potentially a problem (looking at it freshly)
is --inserts with data-only dumps though. So yes, we had better fix it
and perhaps consider a backpatch...
-- 
Michael




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 16:46 GMT+02:00 Robert Haas:

> On Thu, Sep 17, 2015 at 10:28 AM, Pavel Stehule wrote:
> > Please, can you explain what is wrong with using shm_mq? It works pretty
> > well now.
> >
> > There can be a counter-argument to not using shm_mq: it already does data
> > exchange between processes, and this patch would introduce a new pattern
> > for the same thing.
>
> Sure, I can explain that.
>
> First, when you communicate through a shm_mq, the writer has to wait
> when the queue is full, and the reader has to wait when the queue is
> empty.  If the message is short enough to fit in the queue, then you
> can send it and be done without blocking.  But if it is long, then you
> will have each process repeatedly waiting for the other one until the
> whole message has been sent.  That is going to make sending the
> message potentially a lot slower than writing it all in one go,
> especially if the system is under heavy load.
>


Is there some risk if we take too large a DSM segment? And what is the max
size of a DSM segment? When we use shm_mq, we don't need to worry about
limits.


>
> Also, if there are any bugs in the way the shm_mq is being used,
> they're likely to be quite rare and hard to find, because the vast
> majority of messages will probably be short enough to be sent in a
> single chunk, so whatever bugs may exist when the processes play
> ping-pong are unlikely to occur in practice except in unusual cases
> where the message being returned is very long.
>

This is true for any functionality based on shm_mq - parallel seq scan, for
example.


>
> Second, using a shm_mq manipulates the state of the process latch.  I
> don't think you can make the assumption that it's safe to reset the
> process latch at any and every place where we check for interrupts.
> For example, suppose the process is already using a shm_mq and the
> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
> somebody has activated this mechanism and you now go try to send and
> receive from a new shm_mq.  But even if that and every other
> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
> today, it's a new coding rule that could easily trip people up in the
> future.
>
>
It is valid, and probably most important. But if we introduce our own
mechanism, we will play with the process latch too (although we could use
LWLocks).


> Using a shm_mq is appropriate when the amount of data that needs to be
> transmitted might be very large - too large to just allocate a buffer
> for the whole thing - or when the amount of data can't be predicted
> before memory is allocated.  But there is obviously no rule that a
> shm_mq should be used any time we have "data exchange between
> processes"; we have lots of shared-memory based IPC already and have
> for many years, and shm_mq is newer than the vast majority of that
> code.
>

I am a little bit disappointed - I had hoped that shm_mq could be used as a
generic interprocess mechanism that would handle all the corner cases of
working with shared memory. I understand that shm_mq is new and few have
used it yet, but that risk is better than reinventing the wheel again and
again.

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] planstate_tree_walker

2015-09-17 Thread Tom Lane
Robert Haas  writes:
> Here is a patch that *just* introduces planstate_tree_walker and which
> is hopefully correct.  I stole the logic from ExplainPreScanNode,
> which I also refactored to use the new walker instead of duplicating
> the logic.

It seems a little odd to have removed functions from explain.c altogether
but left their header comments behind.  Otherwise this seems sound.

regards, tom lane




Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-09-17 Thread Andres Freund
On 2015-09-16 22:18:55 -0700, Michael Paquier wrote:
> So, here we go.

Cool.

> I have found something quite interesting when playing with the attached
> patch: dump does not guarantee the column ordering across databases
> for some inherited tables. See, for example, from the main regression
> test suite the following diff between a dump taken from a source
> database and a target database into which the source dump has been
> restored first:
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 3, 'mumble', NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', 4, NULL, NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, 'bumble', NULL);
> -INSERT INTO b_star (class, aa, bb, a) VALUES ('b', NULL, NULL, NULL);
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 3, NULL, 'mumble');
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', 4, NULL, NULL);
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, 'bumble');
> +INSERT INTO b_star (class, aa, a, bb) VALUES ('b', NULL, NULL, NULL);
> 
> Problem is similar with --column-inserts, --inserts and COPY. We could
> use --exclude-table like in the patch attached when taking the dump
> from source database but that's grotty, or we could improve pg_dump
> itself, though it may not be worth it for just this purpose.
> Thoughts?

Hm. To me that sounds like a moderately serious bug worth fixing. I mean
if you have a COPY command that worked before a dump/restore but that's
wrong afterwards, it's pretty ugly, no?

Greetings,

Andres Freund




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 16:06 GMT+02:00 Shulgin, Oleksandr:

> On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule wrote:
>
>>
>>> That won't work really well with something like what I usually do when
>>> testing this patch, namely:
>>>
>>> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1
>>>
>>> while also running pgbench with the -C option (to create a new connection
>>> for every transaction).  When a targeted backend exits before it can
>>> handle the signal, the receiving process keeps waiting forever.
>>>
>>
>> no - at every timeout you have to check whether the targeted backend is
>> still alive; if not, you cancel. It is 100% safe.
>>
>
> But then you need to make this internal timeout rather short, not 1s as
> originally suggested.
>

Could be - 1 sec is the max; maybe 100 ms is the optimum.

>
>> The statement_timeout in this case will stop the whole select, not just the
>> individual function call.  Unless you wrap it to set statement_timeout and
>> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
>> select.  The ability to set a (short) timeout for the function itself
>> proves to be a really useful feature to me.
>>>
>>
>> you cannot handle QUERY_CANCELED in plpgsql.
>>
>
> Well, you can, but it's not that useful of course:
>

hmm, something is wrong - I remember from some older plpgsql that the CANCEL
message was not catchable. Maybe my memory is bad. I have to recheck it.


>
> =# create or replace function test_query_cancel() returns void language
> plpgsql as $$ begin
>  perform pg_sleep(1);
>  exception when query_canceled then raise notice 'cancel';
> end; $$;
> CREATE FUNCTION
> =# set statement_timeout to '100ms';
> SET
> =# select test_query_cancel();
> NOTICE:  cancel
>  test_query_cancel
> ---
>
> (1 row)
> =# select test_query_cancel() from generate_series(1, 100) x;
> NOTICE:  cancel
> ^CCancel request sent
> NOTICE:  cancel
> ^CCancel request sent
>
> Now you cannot cancel this query unless you do pg_terminate_backend() or
> equivalent.
>
>> Some internal timeout is needed - but this timeout should not be visible -
>> any new GUC increases PostgreSQL complexity, and there is no reason to do
>> it.
>>
>
> But the GUC was added for the timeout on the sending side, not the
> receiving one.  There is no "one value fits all" for this, but you would
> still want to make the effects of this as limited as possible.
>

I still believe that any new GUC is unnecessary. If you don't like
statement_timeout, we can copy the behavior of CREATE DATABASE - there are a
few 5-sec cycles (when template1 is occupied) and then it gives up.

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] honour TEMP_CONFIG in pg_upgrade tests

2015-09-17 Thread Tom Lane
Andrew Dunstan  writes:
> I propose to have the pg_upgrade test honour TEMP_CONFIG as pg_regress
> (or its Makefile) does, by applying the patch below. There might be
> other places this should also be done, but this would be a start.

No objection to the concept, but I wonder if the -f test should
instead be a -r test (in the original place too, I guess).

regards, tom lane




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 14:06 GMT+02:00 Robert Haas:

> On Tue, Sep 15, 2015 at 5:00 AM, Shulgin, Oleksandr wrote:
> > Please see attached for implementation of this approach.  The most
> > surprising thing is that it actually works :)
>
> It's cool to see these interesting ideas for using some of the code
> I've been working on for the last couple of years.  However, it seems
> to me that a better design would avoid the use of shm_mq.  Instead of
> having the guy who asks the question create a DSM, have the guy who
> sends back the answer create it.  Then, he can just make the DSM big
> enough to store the entire string.  I think that's going to be a lot
> more robust than what you have here.
>

Please, can you explain what is wrong with using shm_mq? It works pretty
well now.

There can be a counter-argument to not using shm_mq: it already does data
exchange between processes, and this patch would introduce a new pattern for
the same thing.

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] LW_SHARED_MASK macro

2015-09-17 Thread Andres Freund
Hi,


On 2015-09-17 14:35:20 +0300, Alexander Korotkov wrote:
> while exploring lwlock.c I found following macro to be strange.
> 
> #define LW_SHARED_MASK ((uint32)(1 << 23))
> 
> This is macro is used to extract number of shared locks from state.
> 
> ereport(LOG,
> (errhidestmt(true),
> errhidecontext(true),
> errmsg("%d: %s(%s): excl %u shared %u haswaiters %u waiters %u rOK %d",
> MyProcPid,
> where, MainLWLockNames[id],
> !!(state & LW_VAL_EXCLUSIVE),
> state & LW_SHARED_MASK,
> !!(state & LW_FLAG_HAS_WAITERS),
> pg_atomic_read_u32(&lock->nwaiters),
> !!(state & LW_FLAG_RELEASE_OK)));
> 
> 
> Should it be ((uint32) ((1 << 24)-1)) instead?

Argh, that's somewhat embarrassing. You're absolutely right. Luckily it's
only used for LOCK_DEBUG, but still...
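
For illustration, a minimal standalone sketch of the difference (assuming the
low 24 bits of the state word hold the shared-lock count):

    #include <assert.h>
    #include <stdint.h>

    #define LW_SHARED_MASK_WRONG  ((uint32_t) (1 << 23))        /* one bit */
    #define LW_SHARED_MASK_FIXED  ((uint32_t) ((1 << 24) - 1))  /* low 24 bits */

    int
    main(void)
    {
        uint32_t    state = 5;  /* five shared holders, no flag bits set */

        /* the single-bit "mask" loses the count entirely */
        assert((state & LW_SHARED_MASK_WRONG) == 0);
        /* the full 24-bit mask recovers it */
        assert((state & LW_SHARED_MASK_FIXED) == 5);
        return 0;
    }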

Andres




Re: [HACKERS] tsvector work with citext

2015-09-17 Thread Teodor Sigaev

Oversight, I suppose. is_text_type() was introduced by

commit 635aaab278afc1af972a4b6a55ff632ab763505d
Author: Tom Lane 
Date:   Tue Apr 8 18:20:29 2008 +

 Fix tsvector_update_trigger() to be domain-friendly: it needs to allow all
 the columns it works with to be domains over the expected type, not just
 exactly the expected type.  In passing, fix ts_stat() the same way.
 Per report from Markus Wollny.


I'm wrong; in this commit it was just renamed. It was originally coded by me.
But it's still an oversight.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule 
wrote:

>
>> That won't work really well with something like what I do when testing
>> this patch, namely:
>>
>> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
>> pg_stat_activity where pid<>pg_backend_pid() \watch 1
>>
>> while also running pgbench with -C option (to create new connection for
>> every transaction).  When a targeted backend exits before it can handle the
>> signal, the receiving process keeps waiting forever.
>>
>
> no - on every timeout you have to check whether the targeted backend is
> still living; if not, you cancel. It is 100% safe.
>

But then you need to make this internal timeout rather short, not 1s as
originally suggested.

>> The statement_timeout in this case will stop the whole select, not just
>> individual function call.  Unless you wrap it to set statement_timeout and
>> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
>> select.  The ability to set a (short) timeout for the function itself
>> proves to be a really useful feature to me.
>>
>
> you cannot handle QUERY_CANCELED in plpgsql.
>

Well, you can, but it's not that useful of course:

=# create or replace function test_query_cancel() returns void language
plpgsql as $$ begin
 perform pg_sleep(1);
 exception when query_canceled then raise notice 'cancel';
end; $$;
CREATE FUNCTION
=# set statement_timeout to '100ms';
SET
=# select test_query_cancel();
NOTICE:  cancel
 test_query_cancel
-------------------

(1 row)
=# select test_query_cancel() from generate_series(1, 100) x;
NOTICE:  cancel
^CCancel request sent
NOTICE:  cancel
^CCancel request sent

Now you cannot cancel this query unless you do pg_terminate_backend() or
equivalent.

> There is a need for some internal timeout - but this timeout should not be
> visible - any new GUC increases PostgreSQL complexity - and there is no
> reason to do it.
>

But the GUC was added for the timeout on the sending side, not the
receiving one.  There is no "one value fits all" for this, but you would
still want to make the effects of this as limited as possible.

--
Alex


[HACKERS] Postgres releases scheduled for first week of October

2015-09-17 Thread Tom Lane
It's been over three months since our last set of back-branch releases.
It also seems to be about time for a beta release of 9.5.  The Postgres
release team has accordingly decided to wrap 9.5beta1 as well as all
active branches on Monday Oct 5, for public announcement Thursday Oct 8.

Please note this will be the last release for the 9.0.x branch, since
its published EOL date is September 2015.

regards, tom lane




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Shulgin, Oleksandr
On Thu, Sep 17, 2015 at 5:16 PM, Pavel Stehule 
wrote:

> 2015-09-17 16:46 GMT+02:00 Robert Haas :
>
>>
>> Second, using a shm_mq manipulates the state of the process latch.  I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq.  But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>>
>
> It is valid, and probably most important. But if we introduce our own
> mechanism, we will play with the process latch too (although we can use
> LWLocks)
>

Careful manipulation of the process latch is a valid point.  Probably we
were way too optimistic about the limits we can hit with this approach. :-(

But if we make the sender backend create the DSM with the response payload,
it suddenly becomes really unclear at which point and who should make the
final detach of that DSM.  We're getting back to the problem of
synchronization between these processes all over again.

--
Alex


Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Tom Lane
Robert Haas  writes:
> To meet these needs, what I propose to do is have
> set_plan_references() assign an integer ID to every plan node that is
> unique across the entire plan tree.

OK.

> It would be nice if we could assign the integer IDs with no gaps, and
> with the IDs of a node's children immediately following that of their
> parent.

I do not think this should be a goal, either explicitly or implicitly.
For one thing, it will be very much in the eye of individual beholders
which nodes are children-that-should-be-consecutive.  What about
initplans, subplans, grandchild nodes?  Different use cases could easily
have different opinions about what to do with those.

> The advantage of this is that if you want to have a data
> structure for every node in the tree passed to some worker - like a
> struct Instrumentation in dynamic shared memory - you can just create
> an array and index it by (node ID) - (node ID of uppermost child
> passed to worker), and every slot will be in use, so no memory is
> wasted and no lookup table is needed.

I think you could just waste the memory, or more likely you could cope
with one level of indirection so that you only waste a pointer per
node (ie, array of pointers indexed by node ID, only fill the ones that
are relevant).

> My main concern with this design is how future-proof it is.  I know
> Tom is working on ripping the planner apart and putting it back
> together again to allow the upper levels of the planner to use paths,
> and the discussion of the SS_finalize_plan() changes made some
> reference to possibly wanting to mess with set_plan_references().

I can't imagine I'd be doing anything that would break the simple case
of "give every node a distinct ID".  If you are building in weird
assumptions about traversal order, that might indeed be a problem.

regards, tom lane




Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Robert Haas
Thanks much for the quick reply.

On Thu, Sep 17, 2015 at 2:23 PM, Tom Lane  wrote:
>> It would be nice if we could assign the integer IDs with no gaps, and
>> with the IDs of a node's children immediately following that of their
>> parent.
>
> I do not think this should be a goal, either explicitly or implicitly.
> For one thing, it will be very much in the eye of individual beholders
> which nodes are children-that-should-be-consecutive.  What about
> initplans, subplans, grandchild nodes?  Different use cases could easily
> have different opinions about what to do with those.

I was aiming for: the IDs of the descendants of any given node are
consecutive and follow that node's ID, in any order you like.

>> The advantage of this is that if you want to have a data
>> structure for every node in the tree passed to some worker - like a
>> struct Instrumentation in dynamic shared memory - you can just create
>> an array and index it by (node ID) - (node ID of uppermost child
>> passed to worker), and every slot will be in use, so no memory is
>> wasted and no lookup table is needed.
>
> I think you could just waste the memory, or more likely you could cope
> with one level of indirection so that you only waste a pointer per
> node (ie, array of pointers indexed by node ID, only fill the ones that
> are relevant).

Hmm, I guess that wouldn't be too bad.  A straight join between N
relations will have N base relations and N - 1 joins and maybe a few
sorts or similar.  You could get a plan tree with a lot more nodes if
your query involves inheritance hierarchies - e.g. join 4 tables each
of which has 1000 children.  But even in that case you're only talking
about 32kB of memory for the indirect table, and that's not going to
be very high on your list of problems in that case.
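
As a standalone sketch of that indirection (illustrative numbers only; none
of this is PostgreSQL source):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Instrumentation
    {
        double      total_time;
        long        ntuples;
    } Instrumentation;

    int
    main(void)
    {
        int         n_nodes = 4004;         /* 4 joins over ~1000 children */
        int         parallel_node_id = 42;  /* only this slot gets filled */
        Instrumentation **slots = calloc(n_nodes, sizeof(Instrumentation *));

        slots[parallel_node_id] = calloc(1, sizeof(Instrumentation));

        /* ~32kB of pointers on a 64-bit machine, as estimated above */
        printf("indirect table: %zu bytes\n",
               n_nodes * sizeof(Instrumentation *));

        free(slots[parallel_node_id]);
        free(slots);
        return 0;
    }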

>> My main concern with this design is how future-proof it is.  I know
>> Tom is working on ripping the planner apart and putting it back
>> together again to allow the upper levels of the planner to use paths,
>> and the discussion of the SS_finalize_plan() changes made some
>> reference to possibly wanting to mess with set_plan_references().
>
> I can't imagine I'd be doing anything that would break the simple case
> of "give every node a distinct ID".  If you are building in weird
> assumptions about traversal order, that might indeed be a problem.

Good to know, thanks.  With the design you propose above, we can make
this insensitive to traversal order.  The only thing that would cause
trouble is if you somehow ended up with massive gaps in the numbering
sequence - e.g. if the final plan had 6 nodes, but the IDs were all 5
digit numbers, you'd waste a silly amount of memory relative to the
size of the plan.  But a few gaps (e.g. for removed SubqueryScan
nodes) won't be a problem.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [patch] Proposal for \rotate in psql

2015-09-17 Thread Pavel Stehule
Hi

2015-09-16 11:35 GMT+02:00 Daniel Verite :

>   Hi,
>
> This is the 2nd iteration of this patch, for comments and review.
>
my comments:

1. I don't understand why you use two methods for sorting columns
(qsort, and a query with ORDER BY)

2. Data columns are not well aligned - numbers are aligned as text

3. When the data are multi-attribute, merging them together with a space
separator is not practical

  * important information is lost
  * the same transformation can be done as an expression, so this feature is
useless

Is it possible to use one cell per attribute (don't merge)?

DATA QUERY: SELECT dim1, dim2, sum(x), avg(x) FROM .. GROUP BY dim1, dim2

and result header of rotate can be

DIM1   | dim2_val1/sum | dim2_val1/avg | dim2_val2/sum | dim2_val2/avg | ...


Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-09-17 Thread Michael Paquier
On Thu, Sep 17, 2015 at 9:47 AM, Michael Paquier wrote:
> COPY or INSERT include the column list in dumps, so that's not really
> an issue here, what is potentially a problem (looking at that freshly)
> is --inserts with data-only dumps though. So yes we had better fix it
> and perhaps consider a backpatch...

When adding a column to a parent table, attnum gets confused on the
child table... Take this example:
=# create table aa (a int, b int);
CREATE TABLE
=# create table bb (c int) inherits (aa);
CREATE TABLE
=# alter table aa add column d text;
ALTER TABLE
=# select relname, attname, attnum from pg_attribute join pg_class on
(attrelid = oid) where attrelid in( 'bb'::regclass, 'aa'::regclass)
and attnum > 0;
 relname | attname | attnum
-+-+
 aa  | d   |  3
 aa  | b   |  2
 aa  | a   |  1
 bb  | d   |  4
 bb  | c   |  3
 bb  | b   |  2
 bb  | a   |  1
(7 rows)

When this is dumped and restored on another database, the ordering comes out
different; c and d are switched for child relation bb here:
 relname | attname | attnum
-+-+
 aa  | d   |  3
 aa  | b   |  2
 aa  | a   |  1
 bb  | c   |  4
 bb  | d   |  3
 bb  | b   |  2
 bb  | a   |  1
(7 rows)

pg_dump relies on attnum to define the column ordering, so one
possibility would be to do things more consistently at the backend level.
Thoughts?
-- 
Michael




[HACKERS] numbering plan nodes

2015-09-17 Thread Robert Haas
Hi,

One of the problems which the remaining parallel query patches need to
solve is to correlate Plan or PlanState nodes across multiple
backends.  This need arises in a couple of cases.  First, if a group
of workers are all executing the same plan nodes, and es_instrument !=
0, then we'll accumulate instrumentation data in each worker, but we
need to collect all of the instrumentation data from each individual
worker and add the totals from each counter in each worker to the
master's totals, so that what finally gets reported includes the
effort spent in all workers.  And of course the instrumentation data
for any given node in the worker needs to be added to the
*corresponding node* in the master, not any random node.

Second, for nodes that are actually parallel aware, like the proposed
Partial Seq Scan node, there's bound to be some shared state.  In the
case of a Partial Seq Scan, for example, we need to keep track of
which blocks are yet to be scanned.  Amit's patch is currently silly
about this: it assumes there will be only one Partial Seq Scan in a
plan, which makes it easy to find the relevant state.  But that's not
an assumption we want to be stuck with.  Here again, if there are two
or more Partial Seq Scans in the plan passed to every worker, perhaps
with an Append node on top of them, then we need a way to match them
up across all the backends, so that each group of corresponding nodes
is looking at the same bit of shared state.

To meet these needs, what I propose to do is have
set_plan_references() assign an integer ID to every plan node that is
unique across the entire plan tree.  Then, when we ship part of the
plan tree off to a worker to be executed, the integer IDs will also
get copied (by copyfuncs.c support yet to be added, but it should be
pretty boilerplate ... I think), so that whatever plan node #42 is in
the parallel group leader will also be plan node #42 in the worker.
When I started out, I had the idea of trying to number the PlanState
nodes rather than the Plan nodes, and count on the fact the master and
the worker would traverse the query tree in the same order; since the
worker had only a subtree of the whole plan, its numbering would be
offset from the master's, but the idea was that you could fix this by
adding an offset.  This turns out not to work, at least not the way
I tried to do it, because ExecInitNode() visits all initPlans
everywhere in the tree before recursing over the main tree, so you
only get identical numbering everywhere if you start with the same
exact tree.  In any case, relying on two traversals in separate
processes to visit everything in the same order seems a bit fragile;
putting the ID into the plan node - which is what will actually be
passed down to the worker - seems much more robust.

It would be nice if we could assign the integer IDs with no gaps, and
with the IDs of a node's children immediately following that of their
parent.  The advantage of this is that if you want to have a data
structure for every node in the tree passed to some worker - like a
struct Instrumentation in dynamic shared memory - you can just create
an array and index it by (node ID) - (node ID of uppermost child
passed to worker), and every slot will be in use, so no memory is
wasted and no lookup table is needed.  Assigning the IDs in
set_plan_references() ... almost succeeds at this.  The removal of
SubqueryScan nodes as part of that process could create gaps in the
numbering sequence, but probably few enough that we could just ignore
the problem and waste an entry in whatever arrays we create whenever
that optimization applies.

My main concern with this design is how future-proof it is.  I know
Tom is working on ripping the planner apart and putting it back
together again to allow the upper levels of the planner to use paths,
and the discussion of the SS_finalize_plan() changes made some
reference to possibly wanting to mess with set_plan_references().  I'm
not sure if any of those changes would disturb what I'm proposing
here.

Thoughts?  Better ideas?  Concept patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 62355aa..4b4ddec 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -112,6 +112,7 @@ CopyPlanFields(const Plan *from, Plan *newnode)
 	COPY_SCALAR_FIELD(total_cost);
 	COPY_SCALAR_FIELD(plan_rows);
 	COPY_SCALAR_FIELD(plan_width);
+	COPY_SCALAR_FIELD(plan_node_id);
 	COPY_NODE_FIELD(targetlist);
 	COPY_NODE_FIELD(qual);
 	COPY_NODE_FIELD(lefttree);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e1b49d5..d7d00c0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -269,6 +269,7 @@ _outPlanInfo(StringInfo str, const Plan *node)
 	WRITE_FLOAT_FIELD(total_cost, "%.2f");
 	WRITE_FLOAT_FIELD(plan_rows, "%.0f");
 	

Re: [HACKERS] pg_resetxlog sentences

2015-09-17 Thread Euler Taveira

On 17-09-2015 00:25, Fujii Masao wrote:

> One relevant question is; why doesn't pg_controldata report newestCommitTs?

I thought about it while looking at the code but forgot to ask. AFAICS 
it is an oversight. See attached patch.



--
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>From 5716c69d7287d97c6eacb13c736c939d0e9223e4 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Thu, 17 Sep 2015 13:48:25 -0300
Subject: [PATCH] Standardize sentences.

Also, add newest CommitTs to pg_controldata. Oversight spotted by Fujii
Masao.
---
 src/bin/pg_controldata/pg_controldata.c | 2 ++
 src/bin/pg_resetxlog/pg_resetxlog.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 704f72d..32e1d81 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -273,6 +273,8 @@ main(int argc, char *argv[])
 		   ControlFile.checkPointCopy.oldestMultiDB);
 	printf(_("Latest checkpoint's oldestCommitTs:   %u\n"),
 		   ControlFile.checkPointCopy.oldestCommitTs);
+	printf(_("Latest checkpoint's newestCommitTs:   %u\n"),
+		   ControlFile.checkPointCopy.newestCommitTs);
 	printf(_("Time of latest checkpoint:%s\n"),
 		   ckpttime_str);
 	printf(_("Fake LSN counter for unlogged rels:   %X/%X\n"),
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index c8c1ac3..d7ac2ba 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -665,9 +665,9 @@ PrintControlValues(bool guessed)
 		   ControlFile.checkPointCopy.oldestMulti);
 	printf(_("Latest checkpoint's oldestMulti's DB: %u\n"),
 		   ControlFile.checkPointCopy.oldestMultiDB);
-	printf(_("Latest checkpoint's oldest CommitTs:  %u\n"),
+	printf(_("Latest checkpoint's oldestCommitTs:   %u\n"),
 		   ControlFile.checkPointCopy.oldestCommitTs);
-	printf(_("Latest checkpoint's newest CommitTs:  %u\n"),
+	printf(_("Latest checkpoint's newestCommitTs:   %u\n"),
 		   ControlFile.checkPointCopy.newestCommitTs);
 	printf(_("Maximum data alignment:   %u\n"),
 		   ControlFile.maxAlign);
-- 
2.1.4




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule  wrote:
> Is there some risk if we take a too-large DSM segment? And what is the max
> size of a DSM segment? When we use shm_mq, we don't need to worry about
> limits.

I can't really think you are going to have a problem.  How much data
do you really intend to send back?  Surely it's going to be <100kB.
If you think it's not a problem to have a running query stop and send
a gigabyte of data someplace anytime somebody asks, well, I don't
think I agree.

>> Also, if there are any bugs in the way the shm_mq is being used,
>> they're likely to be quite rare and hard to find, because the vast
>> majority of messages will probably be short enough to be sent in a
>> single chunk, so whatever bugs may exist when the processes play
>> ping-ping are unlikely to occur in practice except in unusual cases
>> where the message being returned is very long.
>
> This is true for any functionality based on shm_mq - parallel seq scan,

Parallel sequential scan is likely to put a lot more data through a
shm_mq than you would for this.

>> Second, using a shm_mq manipulates the state of the process latch.  I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq.  But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>
> It is valid, and probably most important. But if we introduce our own
> mechanism, we will play with the process latch too (although we can use
> LWLocks)

With the design I proposed, there is zero need to touch the process
latch, which is good, because I'm pretty sure that is going to be a
problem.  I don't think there is any need to use LWLocks here either.
When you get a request for data, you can just publish a DSM segment
with the data and that's it.  Why do you need anything more?  You
could set the requestor's latch if it's convenient; that wouldn't be a
problem.  But the process supplying the data can't end up in a
different state than it was before supplying that data, or stuff WILL
break.
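
To make that concrete, the supplying side might look roughly like this (a
sketch only: explain_text is an assumed payload, requestor an assumed PGPROC
pointer, and how the segment handle gets published is left out):

    /* inside the queried backend, at a safe interrupt-check point */
    Size        len = strlen(explain_text) + 1;
    dsm_segment *seg = dsm_create(len, 0);  /* sized to the whole string */

    memcpy(dsm_segment_address(seg), explain_text, len);
    /* ... publish dsm_segment_handle(seg) where the requestor can read it ... */
    SetLatch(&requestor->procLatch);        /* optional, as noted above */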

>> Using a shm_mq is appropriate when the amount of data that needs to be
>> transmitted might be very large - too large to just allocate a buffer
>> for the whole thing - or when the amount of data can't be predicted
>> before memory is allocated.  But there is obviously no rule that a
>> shm_mq should be used any time we have "data exchange between
>> processes"; we have lots of shared-memory based IPC already and have
>> for many years, and shm_mq is newer than the vast majority of that
>> code.
>
> I am a little bit disappointed - I hoped that shm_mq could be used as a
> generic interprocess mechanism - one that would ensure all the corner cases
> of working with shared memory. I understand that shm_mq is new, and nobody
> has used it, but this risk is better than inventing wheels again and again.

shm_mq is useful, but if you insist on using a complicated tool when a
simple one is plenty sufficient, you may not get the results you're
hoping for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 17, 2015 at 2:23 PM, Tom Lane  wrote:
>> I can't imagine I'd be doing anything that would break the simple case
>> of "give every node a distinct ID".  If you are building in weird
>> assumptions about traversal order, that might indeed be a problem.

> Good to know, thanks.  With the design you proposal above, we can make
> this insensitive to traversal order.  The only thing that would cause
> trouble is if you somehow ended up with massive gaps in the numbering
> sequence - e.g. if the final plan had 6 nodes, but the IDs were all 5
> digit numbers, you'd waste a silly amount of memory relative to the
> size of the plan.  But a few gaps (e.g. for removed SubqueryScan
> nodes) won't be a problem.

Hm ... if you quit worrying about the order-of-assignment, maybe you
could prevent gaps from removed SubqueryScan nodes by not giving them
IDs until after that decision is made?  Although it may not be worth
any extra trouble.

regards, tom lane




Re: [HACKERS] some pg_rewind usability issues

2015-09-17 Thread Michael Paquier
On Wed, Sep 16, 2015 at 9:46 PM, Michael Paquier wrote:
> That's something that we discussed in this CF's patch to ease the
> handling of timeline switches when rewinding a node. I wouldn't have
> any objection to getting that backpatched to 9.5, though (the
> DB_SHUTDOWNED_IN_RECOVERY part, I mean).

Independent patch attached FWIW (I'm sure you had the same thought).
-- 
Michael
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 032301f..1e38122 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -374,10 +374,11 @@ sanityChecks(void)
 	/*
 	 * Target cluster better not be running. This doesn't guard against
 	 * someone starting the cluster concurrently. Also, this is probably more
-	 * strict than necessary; it's OK if the master was not shut down cleanly,
+	 * strict than necessary; it's OK if the node was not shut down cleanly,
 	 * as long as it isn't running at the moment.
 	 */
-	if (ControlFile_target.state != DB_SHUTDOWNED)
+	if (ControlFile_target.state != DB_SHUTDOWNED &&
+		ControlFile_target.state != DB_SHUTDOWNED_IN_RECOVERY)
 		pg_fatal("target server must be shut down cleanly\n");
 
 	/*
@@ -385,7 +386,9 @@ sanityChecks(void)
 	 * server is shut down. There isn't any very strong reason for this
 	 * limitation, but better safe than sorry.
 	 */
-	if (datadir_source && ControlFile_source.state != DB_SHUTDOWNED)
+	if (datadir_source &&
+		ControlFile_source.state != DB_SHUTDOWNED &&
+		ControlFile_source.state != DB_SHUTDOWNED_IN_RECOVERY)
 		pg_fatal("source data directory must be shut down cleanly\n");
 }
 



Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Simon Riggs
On 17 September 2015 at 13:14, Robert Haas  wrote:


> My main concern with this design is how future-proof it is.
>

Passing array offsets sounds brittle to me.

It would screw up any attempts to post-process the plans. Later
enhancements seem certain to break that scheme.

It also assumes that all actors have access to a single memory structure
that describes everything.

Hopefully we are working on a parallel query system that will work
intranode as well as across nodes, so access to memory should not be
assumed.

-- 
Simon Riggs   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] tsvector work with citext

2015-09-17 Thread David E. Wheeler
On Sep 17, 2015, at 6:17 AM, Teodor Sigaev  wrote:

> I'm wrong; in this commit it was just renamed. It was originally coded by me.
> But it's still an oversight.

Fixable?

Thanks,

David





Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Kouhei Kaigai
> One of the problems which the remaining parallel query patches need to
> solve is to correlate Plan or PlanState nodes across multiple
> backends.  This need arises in a couple of cases.  First, if a group
> of workers are all executing the same plan nodes, and es_instrument !=
> 0, then we'll accumulate instrumentation data in each worker, but we
> need to collect all of the instrumentation data from each individual
> worker and add the totals from each counter in each worker to the
> master's totals, so that what finally gets reported includes the
> effort spent in all workers.  And of course the instrumentation data
> for any given node in the worker needs to be added to the
> *corresponding node* in the master, not any random node.
> 
> Second, for nodes that are actually parallel aware, like the proposed
> Partial Seq Scan node, there's bound to be some shared state.  In the
> case of a Partial Seq Scan, for example, we need to keep track of
> which blocks are yet to be scanned.  Amit's patch is currently silly
> about this: it assumes there will be only one Partial Seq Scan in a
> plan, which makes it easy to find the relevant state.  But that's not
> an assumption we want to be stuck with.  Here again, if there are two
> or more Partial Seq Scans in the plan passed to every worker, perhaps
> with an Append node on top of them, then we need a way to match them
> up across all the backends, so that each group of corresponding nodes
> is looking at the same bit of shared state.
>
> To meet these needs, what I propose to do is have
> set_plan_references() assign an integer ID to every plan node that is
> unique across the entire plan tree.  Then, when we ship part of the
> plan tree off to a worker to be executed, the integer IDs will also
> get copied (by copyfuncs.c support yet to be added, but it should be
> pretty boilerplate ... I think), so that whatever plan node #42 is in
> the parallel group leader will also be plan node #42 in the worker.
> When I started out, I had the idea of trying to number the PlanState
> nodes rather than the Plan nodes, and count on the fact the master and
> the worker would traverse the query tree in the same order; since the
> worker had only a subtree of the whole plan, it's numbering would be
> offset from the master's, but the idea was that you could fix this by
> adding an offset.  This turns out to not to work, at least not the way
> I tried to do it, because ExecInitNode() visits all initPlans
> everywhere in the tree before recursing over the main tree, so you
> only get identical numbering everywhere if you start with the same
> exact tree.  In any case, relying on two traversals in separate
> processes to visit everything in the same order seems a bit fragile;
> putting the ID into the plan node - which is what will actually be
> passed down to the worker - seems much more robust.
> 
> It would be nice if we could assign the integer IDs with no gaps, and
> with the IDs of a node's children immediately following that of their
> parent.  The advantage of this is that if you want to have a data
> structure for every node in the tree passed to some worker - like a
> struct Instrumentation in dynamic shared memory - you can just create
> an array and index it by (node ID) - (node ID of uppermost child
> passed to worker), and every slot will be in use, so no memory is
> wasted and no lookup table is needed.
>
I entirely agree with the idea of a plan-node identifier; however, I am
uncertain whether the node-id should represent a physical location on
the dynamic shared memory segment, because:
(1) A relatively small number of node types need shared state,
thus most of the array items would be empty.
(2) An extension that tries to modify the plan tree using planner_hook
may need to adjust the node-ids as well.

Even though shm_toc_lookup() has to walk the toc entries to find
the node-id, it happens once, at the beginning of the executor on the
background worker side. I don't think it makes for a significant problem.

Thanks,

> Assigning the IDs in
> set_plan_references() ... almost succeeds at this.  The removal of
> SubqueryScan nodes as part of that process could create gaps in the
> numbering sequence, but probably few enough that we could just ignore
> the problem and waste an entry in whatever arrays we create whenever
> that optimization applies.
> 
> My main concern with this design is how future-proof it is.  I know
> Tom is working on ripping the planner apart and putting it back
> together again to allow the upper levels of the planner to use paths,
> and the discussion of the SS_finalize_plan() changes made some
> reference to possibly wanting to mess with set_plan_references().  I'm
> not sure if any of those changes would disturb what I'm proposing
> here.
> 
> Thoughts?  Better ideas?  Concept patch attached.
>

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-09-17 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 17 Sep 2015 17:40:27 +0200, Tomas Vondra  
wrote in <55fadeeb.4000...@2ndquadrant.com>
> Yes, this seems sane. I've been poking at this a bit too, and I came
> to the same plan in general, except that I think it's better to build
> list of clauses that are *not* implied by the index, because that's
> what we need both in cost_index and check_index_only.

I intended to isolate IndexOptInfo from its owning RelOptInfo, but
the exclusion list also bonds them tightly, and one IndexOptInfo
belongs to only one RelOptInfo, so there is no need to isolate them. So
not-implied restrict clauses in IndexOptInfo would be preferable.

> It also seems to me that this change (arguably a bug fix) should
> pretty much make the original patch irrelevant, because
> check_index_only can simply walk over the new list.

Yeah. This seems to be a bug independent of your index-only-scan
patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] [COMMITTERS] pgsql: Fix an O(N^2) problem in foreign key references.

2015-09-17 Thread Jan Wieck

On 09/15/2015 12:02 PM, Jan Wieck wrote:

On 09/15/2015 11:54 AM, Tom Lane wrote:

Jan Wieck  writes:

Would it be appropriate to use (Un)RegisterXactCallback() in case of
detecting excessive sequential scanning of that cache and remove all
invalid entries from it at that time?



Another idea is that it's not the size of the cache that's the problem,
it's the cost of finding the entries that need to be invalidated.
So we could also consider adding list links that chain only the entries
that are currently marked valid.  If the list gets too long, we could mark
them all invalid and thereby reset the list to nil.  This doesn't do
anything for the cache's space consumption, but that wasn't what you were
worried about anyway was it?


That sounds like a workable solution to this edge case. I'll see how
that works.


Attached is a complete rework of the fix from scratch, based on Tom's 
suggestion.


The code now maintains a doubly linked list as suggested, but only uses
it to mark all currently valid entries as invalid when hashvalue == 0.
If a specific entry is invalidated, it performs a hash lookup for maximum
efficiency in that case.


This code does pass the regression tests with CLOBBER_CACHE_ALWAYS, does 
have the desired performance impact on restore of hundreds of thousands 
of foreign key constraints and does not show any negative impact on bulk 
loading of data with foreign keys.



Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 61edde9..eaec737 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -125,6 +125,8 @@ typedef struct RI_ConstraintInfo
  * PK) */
 	Oid			ff_eq_oprs[RI_MAX_NUMKEYS];		/* equality operators (FK =
  * FK) */
+	struct RI_ConstraintInfo *prev_valid;		/* Previous valid entry */
+	struct RI_ConstraintInfo *next_valid;		/* Next valid entry */
 } RI_ConstraintInfo;
 
 
@@ -185,6 +187,7 @@ typedef struct RI_CompareHashEntry
 static HTAB *ri_constraint_cache = NULL;
 static HTAB *ri_query_cache = NULL;
 static HTAB *ri_compare_cache = NULL;
+static RI_ConstraintInfo *ri_constraint_cache_valid_list = NULL;
 
 
 /* --
@@ -2924,6 +2927,19 @@ ri_LoadConstraintInfo(Oid constraintOid)
 
 	ReleaseSysCache(tup);
 
+	/*
+	 * At the time a cache invalidation message is processed there may be
+	 * active references to the cache. Because of this we never remove entries
+	 * from the cache, but only mark them invalid. For efficient processing
+	 * of invalidation messages below we keep a double linked list of
+	 * currently valid entries.
+	 */
+	if (ri_constraint_cache_valid_list != NULL)
+		ri_constraint_cache_valid_list->prev_valid = riinfo;
+	riinfo->prev_valid = NULL;
+	riinfo->next_valid = ri_constraint_cache_valid_list;
+	ri_constraint_cache_valid_list = riinfo;
+
 	riinfo->valid = true;
 
 	return riinfo;
@@ -2940,17 +2956,55 @@ ri_LoadConstraintInfo(Oid constraintOid)
 static void
 InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
 {
-	HASH_SEQ_STATUS status;
-	RI_ConstraintInfo *hentry;
+	RI_ConstraintInfo  *hentry;
+	RI_ConstraintInfo  *hnext;
	bool		found;
 
 	Assert(ri_constraint_cache != NULL);
 
-	hash_seq_init(&status, ri_constraint_cache);
-	while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
+	if (hashvalue == 0)
 	{
-		if (hentry->valid &&
-			(hashvalue == 0 || hentry->oidHashValue == hashvalue))
+		/*
+		 * Set all valid entries in the cache to invalid and clear the
+		 * list of valid entries.
+		 */
+		hnext = ri_constraint_cache_valid_list;
+		while (hnext)
+		{
+			hentry = hnext;
+			hnext = hnext->next_valid;
+
+			Assert(hentry->valid);
+
+			hentry->prev_valid = NULL;
+			hentry->next_valid = NULL;
 			hentry->valid = false;
+		}
+		ri_constraint_cache_valid_list = NULL;
+	}
+	else
+	{
+		/*
+		 * Search for the specified entry and set that one invalid
+		 * and remove it from the list of valid entries.
+		 */
+		hentry = (RI_ConstraintInfo *) hash_search(ri_constraint_cache,
+   (void *) &hashvalue,
+   HASH_FIND, &found);
+		if (found && hentry->valid)
+		{
+			Assert(hentry->oidHashValue == hashvalue);
+
+			if (hentry->prev_valid != NULL)
+hentry->prev_valid->next_valid = hentry->next_valid;
+			else
+ri_constraint_cache_valid_list = hentry->next_valid;
+			if (hentry->next_valid != NULL)
+hentry->next_valid->prev_valid = hentry->prev_valid;
+			hentry->prev_valid = NULL;
+			hentry->next_valid = NULL;
+			hentry->valid = false;
+		}
 	}
 }
 



Re: [HACKERS] cache type info in json_agg and friends

2015-09-17 Thread Andrew Dunstan



On 09/14/2015 03:41 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:


Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do type
classification on their arguments on each call to the transition function.
This is quite unnecessary, as the argument types won't change. This patch
remedies the defect by caching the necessary values in the aggregate state
object.

Seems a reasonable idea to me.  This is 9.6 only, right?



I think we can reasonably backpatch it to 9.5, which is where the jsonb 
functions were actually introduced. It's not at all user visible, and 
we're still in alpha. Seem fair?


I have addressed your stylistic concerns, but I'll leave the fmgr_info 
question Teodor raised for another day. Before I do anything more than 
this I want to do some profiling to find out where the time is actually 
going for various workloads.
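
For anyone following along, the shape of the change is roughly this (an
illustrative sketch, not the committed code):

    /* classify the argument type once, on the first call to the transition
     * function, and cache it in the aggregate state instead of re-deriving
     * it for every input row */
    typedef struct JsonAggState
    {
        StringInfo          str;            /* result being accumulated */
        JsonTypeCategory    val_category;   /* cached on first call */
        Oid                 val_output_func;
    } JsonAggState;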


cheers

andrew




Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 4:22 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Sep 17, 2015 at 2:23 PM, Tom Lane  wrote:
>>> I can't imagine I'd be doing anything that would break the simple case
>>> of "give every node a distinct ID".  If you are building in weird
>>> assumptions about traversal order, that might indeed be a problem.
>
>> Good to know, thanks.  With the design you proposal above, we can make
>> this insensitive to traversal order.  The only thing that would cause
>> trouble is if you somehow ended up with massive gaps in the numbering
>> sequence - e.g. if the final plan had 6 nodes, but the IDs were all 5
>> digit numbers, you'd waste a silly amount of memory relative to the
>> size of the plan.  But a few gaps (e.g. for removed SubqueryScan
>> nodes) won't be a problem.
>
> Hm ... if you quit worrying about the order-of-assignment, maybe you
> could prevent gaps from removed SubqueryScan nodes by not giving them
> IDs until after that decision is made?

Hmm, that might work.  I'll try it.

> Although it may not be worth
> any extra trouble.

Right.  If it doesn't turn out to be easy, I won't worry about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 5:19 PM, Simon Riggs  wrote:
> Passing array offsets sounds brittle to me.

How would this case be any different from other places in the planner
where we already do exactly that?  For example, RTIs.

> It would screw up any attempts to post-process the plans. Later enhancements
> seem certain to break that scheme.

I'm proposing to do this very late in plan generation precisely for this
reason.  I think it would be quite difficult to post-process the plans
after the planner is completely through with them.  I've never heard
of anyone wanting to do that, but even if someone did, they'd still be
fine as long as they only deleted nodes or moved them around in the
tree.  If someone wanted to do surgery on a plan to create new nodes,
they would need to set the ID field properly, but that's no harder
(and probably for the most part quite a bit easier) than setting all
the other fields in the Plan correctly.  So I'm not seeing the
problem.

> It also assumes that all actors have access to a single memory structure
> that describes everything.

Our planner assumes that in general.  How is this case different?

> Hopefully we are working on a parallel query system that will work intranode
> as well as across nodes, so access to memory should not be assumed.

Currently, the only method we have for accessing data on another node
is FDWs.  When you access data via an FDW, there is both a local plan
(constructed by the local PostgreSQL) and perhaps a remote plan
(depending on what the FDW is accessing).  The local plan is
accessible via local memory, and the remote plan doesn't need to be.
What I'm talking about here doesn't change any of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] creating extension including dependencies

2015-09-17 Thread Petr Jelinek

On 2015-09-17 17:31, Jeff Janes wrote:


If I fail to specify CASCADE and get an ERROR, I think there should be a
HINT which suggests the use of CASCADE.


create extension earthdistance ;
ERROR:  required extension "cube" is not installed

(no hint)

There is a HINT on the reverse operation:
drop extension cube;
ERROR:  cannot drop extension cube because other objects depend on it
DETAIL:  extension earthdistance depends on extension cube
HINT:  Use DROP ... CASCADE to drop the dependent objects too.


Makes sense.



Also, It would be nice to have psql tab complete the word CASCADE.



Hmm, it already does?


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] extend pgbench expressions with functions

2015-09-17 Thread Kyotaro HORIGUCHI
Hello, 

At Tue, 15 Sep 2015 19:26:51 +0200 (CEST), Fabien COELHO  
wrote in 
> > 1.evalInt and evalDouble are mutually calling on single expr
> > node.
> 
> Indeed.
> 
> > It looks simple and seems to work but could easily be broken
> > so as to fall into infinite calls and finally (in a moment) exceed the
> > stack depth.
> 
> The recursion is on the tree-structure of the expression, which is
> evaluated depth-first. As the tree must be finite, and there is no
> such thing as recursive functions in the expression tree, this should
> never happen. If it happened, it would be a bug in the eval functions,
> probably a forgotten "case", and could be fixed there easily.

My description must have been obscure. Indeed the call tree is
finite for a *sane* expression node. But it makes infinite calls for
a value of expr->etype unknown to both evalDouble and
evalInt. This is what I intended to express by the words 'easily
broken'. AFAICS most switches over nodes in the core would
stop and emit a message like the following,

| default:
|   elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));

So I think the following code would do well.

| evalDouble()
| {
|    if (IS_INT_EXPR_NODE(expr->etype))
|    {
|       int64 ival;
|
|       if (!evalInt(..., &ival))
|          return false;
|       *retval = (double) ival;
|       return true;
|    } 
| 
|    switch (expr->etype)
|    {
|      case ENODE_DOUBLE_CONSTANT:
|      ...
|      default:
|         elog(ERROR, "unrecognized expr type: %d", (int) expr->etype);

I suppose there's a big difference between a SEGV after running off
the bottom of the stack and the server returning an ERROR for the
same bug. But IS_INT_EXPR_NODE() would be redundant with the
switch-case..

Do you think that I'm taking the difference too seriously? However,
this is needless discussion if the following change is acceptable
to you.

> > I think it is better not to do so. For example, reuniting
> > evalDouble and evalInt into one evaluateExpr() which can return
> > both double and int results would do so. The interface would be
> > something like the following or others. Some individual
> > functions might be better split out.
> >
> >  static ExprRetType evaluateExpr(TState,CState,PgBenchExpr,
> >  int *iret, double *dret);
> 
> That would not work as simply as that: this signature does not say
> whether a double or an int is expected, and without this information
> you would not know what to do on some operators (the typing is
> explicit and descendant in the current implementation).
> Even if you add this information, or you use the returned type
> information to do the typing dynamically, that would mean testing the
> result types and implementing the necessary casts depending on this
> information for every function within the generic eval, which would
> result in repetitive and ugly code for every function and operator:
> for instance for '+', you would have to implement 4 cases explicitly,
> i+i, f+i, i+f, f+f as "int add", "cast left and double add", "cast
> right and double add", and finally "double add". Then you have other
> operators to consider... Although some sharing can be done, this would
> end in lengthy & ugly code.

Yes, I also think it is ugly, as I anticipated :(.

By the way, the complexity comes from separating integer and
double. If there is no serious reason to separate them, handling
all values as double makes things far simpler.

Could you let me know the reason why it strictly separates
integer and double?

|   evaluateExpr(..., double *retval)
|   {
|  switch (expr->etype)
|  {
|case ENODE_CONSTANT:
|  *retval = expr->u.constant.val;
|  return true;
|case ENODE_OPERATOR
|  /* Nothing to worry about */
|case ENODE_FUNCTION:
|{
|switch (func)
|{ 
|   case PGBENCH_ABS/SQRT/DEBUG:
| /* Nothing to worry about, too */
|   /* case PGBENCH_DOUBLE is useless ever after */
|   case PGBENCH_INT:
| double arg;
| if (!evaluateExpr(..., &arg))
|return false;
|*retval = floor(arg);

I see no problem in the possible errors of floating-point
calculations for this purpose. Is there any?

Anyway, the \set meta command finally converts the result to an integer
regardless of the real value, so the following conversion does the
same as your current code.

in doCustom()
> else if (pg_strcasecmp(argv[0], "set") == 0)
> {
>   char res[64];
>   PgBenchExpr *expr = commands[st->state]->expr;
>   int64   result;
> 
-   if (!evalInt(thread, st, expr, &result))
+   if (!evaluateExpr(thread, st, expr, &dresult))
+   result = (int64) dresult;

This should make exprparse.y simpler.

> A possible alternative could be to explicitly and statically type all
> operations, adding the necessary casts explicitly,

Re: [HACKERS] numbering plan nodes

2015-09-17 Thread Robert Haas
On Thu, Sep 17, 2015 at 9:01 PM, Kouhei Kaigai  wrote:
> I entirely agree with the idea of plan-node identifier, however,
> uncertain whether the node-id shall represent physical location on
> the dynamic shared memory segment, because
> (1) Relatively smaller number of node type needs shared state,
> thus most of array items are empty.
> (2) Extension that tries to modify plan-tree using planner_hook
> may need to adjust node-id also.
>
> Even though shm_toc_lookup() has to walk on the toc entries to find
> out the node-id, it happens at once on beginning of the executor at
> background worker side. I don't think it makes a significant problem.

Yes, I was thinking that what would make sense is to have each
parallel-aware node call shm_toc_insert() using its ID as the key.
Then, we also need Instrumentation nodes.  For those, I thought we
could use some fixed, high-numbered key, and Tom's idea.
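
In other words, something along these lines (a sketch only, relying on the
plan_node_id field from the concept patch):

    /* leader: each parallel-aware node publishes its shared state under
     * its own plan node ID */
    shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan_state);

    /* worker: the corresponding node finds the same state by the same key */
    pscan_state = shm_toc_lookup(toc, node->ss.ps.plan->plan_node_id);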

Are there extensions that use planner_hook to do surgery on the plan
tree?  What do they do, exactly?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] vacuumdb sentence

2015-09-17 Thread Amit Kapila
On Fri, Sep 18, 2015 at 8:30 AM, Euler Taveira  wrote:
>
> Hi,
>
> Is there a reason to quote "jobs" at this sentence?
>
> 190 fprintf(stderr, _("%s: number of parallel \"jobs\" must be at least
1\n"),
> progname);
>
> AFAICS "jobs" is neither an identifier nor an option to justify the
quotation. Also, another message a few lines above (correctly) does not use
quotes.
>

jobs is an option; refer to the vacuumdb docs[1].

[1] - http://www.postgresql.org/docs/devel/static/app-vacuumdb.html

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 16:37 GMT+02:00 Pavel Stehule :

>
>
> 2015-09-17 16:06 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule 
>> wrote:
>>
>>>
>>>> That won't work really well with something like what I do when
>>>> testing this patch, namely:
>>>>
>>>> postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from
>>>> pg_stat_activity where pid<>pg_backend_pid() \watch 1
>>>>
>>>> while also running pgbench with -C option (to create new connection for
>>>> every transaction).  When a targeted backend exits before it can handle the
>>>> signal, the receiving process keeps waiting forever.

>>>
>>> no - on every timeout you have to check whether the targeted backend is
>>> still living; if not, you cancel. It is 100% safe.
>>>
>>
>> But then you need to make this internal timeout rather short, not 1s as
>> originally suggested.
>>
>
> can be - 1 sec is max, maybe 100ms is optimum.
>
>>
>>>> The statement_timeout in this case will stop the whole select, not just
>>>> individual function call.  Unless you wrap it to set statement_timeout and
>>>> catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole
>>>> select.  The ability to set a (short) timeout for the function itself
>>>> proves to be a really useful feature to me.

>>>
>>> you cannot handle QUERY_CANCELED in plpgsql.
>>>
>>
>> Well, you can, but it's not that useful of course:
>>
>
> hmm, something is wrong - I remember from some older plpgsql that the CANCEL
> message is not catchable. Maybe my memory is bad. I have to recheck it.
>

it is ok. I didn't remember this behavior well. You cannot catch CANCEL
(and nowadays ASSERT) in an OTHERS handler. It can be handled if it is
explicitly mentioned.

Regards

Pavel


Re: [HACKERS] vacuumdb sentence

2015-09-17 Thread Euler Taveira

On 18-09-2015 00:59, Amit Kapila wrote:

On Fri, Sep 18, 2015 at 8:30 AM, Euler Taveira  wrote:
 >
 > Hi,
 >
 > Is there a reason to quote "jobs" at this sentence?
 >
 > 190 fprintf(stderr, _("%s: number of parallel \"jobs\" must be at
least 1\n"),
 > progname);
 >
 > AFAICS "jobs" is neither an identifier nor an option to justify the
quotation. Also, another message a few lines above (correctly) does not
use quotes.
 >

jobs is an option, refer vacuumdb docs[1].

Yeah, I know. [Too sleepy to be writing emails...] What I want to say 
is: when we want to refer to an option, we usually add "option" after 
the quoted name (which wouldn't make sense in this case). I propose to 
remove the quotation marks because, the way the sentence is written, it 
seems we are referring to the task instead of the option.



--
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




[HACKERS] vacuumdb sentence

2015-09-17 Thread Euler Taveira

Hi,

Is there a reason to quote "jobs" at this sentence?

190 fprintf(stderr, _("%s: number of parallel \"jobs\" must be at least 
1\n"),
progname); 



AFAICS "jobs" is neither an identifier nor an option to justify the 
quotation. Also, another message a few lines above (correctly) does not 
use quotes.



--
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] Parallel Seq Scan

2015-09-17 Thread Amit Kapila
On Thu, Sep 17, 2015 at 6:58 AM, Robert Haas  wrote:
>
> On Thu, Sep 3, 2015 at 6:21 AM, Amit Kapila 
wrote:
> > [ new patches ]
>
> +   pscan = shm_toc_lookup(node->ss.ps.toc,
PARALLEL_KEY_SCAN);
>
> This is total nonsense.  You can't hard-code the key that's used for
> the scan, because we need to be able to support more than one parallel
> operator beneath the same funnel.  For example:
>
> Append
> -> Partial Seq Scan
> -> Partial Seq Scan
>

Okay, but I think the same can be achieved with this as well.  The basic idea
is that each worker will work on one planned statement at a time; in the
above case there will be two different planned statements and they will
store partial seq scan related information in two different locations in the
toc, although the key (PARALLEL_KEY_SCAN) would be the same, and I think this
is quite similar to what we are already doing for response queues.
The worker will work on one of those keys based on the planned statement
it chooses to execute.  I have explained this in somewhat more detail
in one of my previous mails [1].

> Each partial sequential scan needs to have a *separate* key, which
> will need to be stored in either the Plan or the PlanState or both
> (not sure exactly).  Each partial seq scan needs to get assigned a
> unique key there in the master, probably starting from 0 or 100 or
> something and counting up, and then this code needs to extract that
> value and use it to look up the correct data for that scan.
>

In that case also, multiple workers can work on the same key, assuming that
in your above example multiple workers will be required to execute
each partial seq scan. In this case we might need to see how to map the
instrumentation information to a particular execution.
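
For concreteness, the separate-key scheme could look roughly like the sketch
below (PARALLEL_KEY_SCAN_BASE and scan_node_id are made-up names, not the
actual patch; only the shm_toc calls are the existing API):

#define PARALLEL_KEY_SCAN_BASE  UINT64CONST(0xE000000000000100)

/* master, while setting up the parallel context: one entry per scan */
shm_toc_insert(toc, PARALLEL_KEY_SCAN_BASE + plan->scan_node_id, pscan);

/* worker, while initializing that particular scan node */
pscan = shm_toc_lookup(toc, PARALLEL_KEY_SCAN_BASE + plan->scan_node_id);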

[1] -
http://www.postgresql.org/message-id/caa4ek1lnt6wqbcxksmj_qc+gahbuwykwsqn6ul3nwvq2sav...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-17 Thread Pavel Stehule
2015-09-17 22:13 GMT+02:00 Robert Haas:

> On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule wrote:
> > Is there some risk if we take too much large DSM segment? And what is max
> > size of DSM segment? When we use shm_mq, we don't need to solve limits.
>
> I can't really think you are going to have a problem.  How much data
> do you really intend to send back?  Surely it's going to be <100kB.
> If you think it's not a problem to have a running query stop and send
> a gigabyte of data someplace anytime somebody asks, well, I don't
> think I agree.
>
>
I am afraid <100kB is too optimistic. I know the GoodData environment is an
exception - it uses a query generator - but I found a few plans >1MB there.
That is unusual, probably also due to long identifiers - it uses 63-byte-long
hashes - and some queries and models are strange. It translates between an
analytic, multidimensional, business-oriented query language and SQL. Back to
the topic: with high probability we can assume <10MB.


> >> Also, if there are any bugs in the way the shm_mq is being used,
> >> they're likely to be quite rare and hard to find, because the vast
> >> majority of messages will probably be short enough to be sent in a
> >> single chunk, so whatever bugs may exist when the processes play
> >> ping-pong are unlikely to occur in practice except in unusual cases
> >> where the message being returned is very long.
> >
> > This is true for any functionality based on shm_mq - parallel seq scan,
>
> Parallel sequential scan is likely to put a lot more data through a
> shm_mq than you would for this.
>
> >> Second, using a shm_mq manipulates the state of the process latch.  I
> >> don't think you can make the assumption that it's safe to reset the
> >> process latch at any and every place where we check for interrupts.
> >> For example, suppose the process is already using a shm_mq and the
> >> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
> >> somebody has activated this mechanism and you now go try to send and
> >> receive from a new shm_mq.  But even if that and every other
> >> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
> >> today, it's a new coding rule that could easily trip people up in the
> >> future.
> >
> > It is valid, and probably most important. But if we introduce our own
> > mechanism, we will play with the process latch too (although we can use
> > LWLocks)
>
> With the design I proposed, there is zero need to touch the process
> latch, which is good, because I'm pretty sure that is going to be a
> problem.  I don't think there is any need to use LWLocks here either.
> When you get a request for data, you can just publish a DSM segment
> with the data and that's it.  Why do you need anything more?  You
> could set the requestor's latch if it's convenient; that wouldn't be a
> problem.  But the process supplying the data can't end up in a
> different state than it was before supplying that data, or stuff WILL
> break.
>

Sure - but shm_mq has to have the same behavior; if it doesn't, then only one
can be used for communication between processes, which is a little bit
limiting.
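
For concreteness, the publish-and-wake design described above could look
roughly like this (explain_text, explain_len, result_slot, and
requestor_proc are made-up names; dsm_create, dsm_segment_address,
dsm_segment_handle, and SetLatch are the existing API):

dsm_segment *seg;

/* copy the finished plan text into a fresh segment */
seg = dsm_create(explain_len + 1, 0);
memcpy(dsm_segment_address(seg), explain_text, explain_len + 1);

/* publish the handle where the requestor can find it, then wake it */
result_slot->handle = dsm_segment_handle(seg);
SetLatch(&requestor_proc->procLatch);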

>
> >> Using a shm_mq is appropriate when the amount of data that needs to be
> >> transmitted might be very large - too large to just allocate a buffer
> >> for the whole thing - or when the amount of data can't be predicted
> >> before memory is allocated.  But there is obviously no rule that a
> >> shm_mq should be used any time we have "data exchange between
> >> processes"; we have lots of shared-memory based IPC already and have
> >> for many years, and shm_mq is newer than the vast majority of that
> >> code.
> >
> > I am a little bit disappointed - I hoped shm_mq could be used as a
> > generic interprocess mechanism that would handle all the corner cases of
> > working with shared memory. I understand that shm_mq is new and nobody
> > has used it, but this risk is better than reinventing the wheel again
> > and again.
>
> shm_mq is useful, but if you insist on using a complicated tool when a
> simple one is plenty sufficient, you may not get the results you're
> hoping for.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] track_commit_timestamp and COMMIT PREPARED

2015-09-17 Thread Fujii Masao
On Sat, Sep 5, 2015 at 7:48 PM, Petr Jelinek wrote:
> On 2015-09-02 16:14, Fujii Masao wrote:
>>
>> On Wed, Aug 5, 2015 at 2:16 AM, Robert Haas wrote:
>>>
>>> On Mon, Aug 3, 2015 at 10:31 AM, Fujii Masao wrote:

 track_commit_timestamp tracks COMMIT PREPARED as expected in the standby
 server, but not in the master server. Is this intentional? Shouldn't it
 track COMMIT PREPARED in the master as well? Otherwise, we cannot use the
 commit_timestamp feature to check the replication lag properly while we
 use 2PC.
>>>
>>>
>>> That sounds like it must be a bug.  I think you should add it to the
>>> open items list.
>>
>>
>
> The attached patch fixes this. It includes advancement of the replication
> origin as well. I didn't feel like refactoring the commit code this late in
> the 9.5 cycle though, so I went with the code duplication + a note in xact.c.

Agreed. We can refactor the code later if needed.

The patch looks good to me except for the following minor points.

 * or not.  Normal path through RecordTransactionCommit() will be related
 * to a transaction commit XLog record, and so should pass "false" here.

The above source comment for TransactionTreeSetCommitTsData() seems to
need updating.

+ * Note: if you change this functions you should also look at
+ * RecordTransactionCommitPrepared in twophase.c.

Typo: "this functions" should be "this function"

+if (replorigin_sesssion_origin == InvalidRepOriginId ||

This is not a problem with the patch, but I started wondering what
"sesssion" in the variable name "replorigin_sesssion_origin" means.
Is it just a typo for "session"? Or is it an abbreviation
of something?

Regards,

-- 
Fujii Masao

