Re: [GENERAL] Error with DefineSavepoint:Unexpected state STARTED

2017-06-19 Thread Michael Paquier
On Tue, Jun 20, 2017 at 11:12 AM, Computer Study
 wrote:
> The UI application adds/removes some permissions through a listbox for
> a certain user. For any add/remove, it first deletes all records for
> that user in the table, then inserts the values chosen from the UI.
>
> In my code snippet, it first executes the DELETE, then executes multiple
> INSERTs to add the selected options to the table. After all are done, it
> commits.
>
> The server is remote and I don't have detailed logs at this moment. I will
> try to ask for them.
>
> My question: for that error 'DefineSavepoint: unexpected state STARTED',
> I'm not quite sure which step it happens in. In the middle of the DELETE
> or the INSERT?

This error has been around for a while, and funnily enough you are the
second person to report this issue within a couple of weeks:
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05
I have been thinking a bit about how to fix that, and wondered about
using a new transaction status to track it, but that ended up being
rather intrusive.
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Streaming replication bandwidth per table

2017-06-19 Thread Andreas Kretschmer
On 20 June 2017 at 03:06:05 CEST, Peter Eisentraut wrote:
>On 6/19/17 20:50, Maeldron T. wrote:
>> 
>
>Not easily.  You could play around with pg_xlogdump to see what's going
>on in the WAL.  But even if you figure it out, there is not much you
>can
>do about it.
>
>Try perhaps logical replication.  That would save you the bandwidth for
>updating all the indexes at least.  It might work for you.

Not only the traffic for indexes, but for vacuum too
(and that can be a lot).

Regards, Andreas


-- 
2ndQuadrant - The PostgreSQL Support Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Error with DefineSavepoint:Unexpected state STARTED

2017-06-19 Thread Computer Study
Thanks Tom.

The version of Postgres is: 9.4.7

The UI application adds/removes some permissions through a listbox for
a certain user. For any add/remove, it first deletes all records for
that user in the table, then inserts the values chosen from the UI.

In my code snippet, it first executes the DELETE, then executes multiple
INSERTs to add the selected options to the table. After all are done, it
commits.

The server is remote and I don't have detailed logs at this moment. I will
try to ask for them.

My question: for that error 'DefineSavepoint: unexpected state STARTED',
I'm not quite sure which step it happens in. In the middle of the DELETE
or the INSERT?

On Mon, Jun 19, 2017 at 9:43 PM, Tom Lane  wrote:

> Computer Study  writes:
> > I am working on a project of DB migration from Oracle to Postgres. The
> > application works well with Oracle. But when switching to Postgres, some
> > SQL executions couldn't finish and got an error like:
>
> > FATAL: DefineSavepoint: unexpected state STARTED
> > STATEMENT: SAVEPOINT_per_query_svp;DEALLOCATE
> > "_PLAN04824560";RELEASE_per_query_SVP_
>
> Well, that's pretty interesting, but you haven't provided nearly enough
> information for anyone else to reproduce the problem.  (No, a snippet of
> C++ that depends on undescribed subroutines and uncertain context doesn't
> do it.)
>
> Perhaps you could turn on log_statement = all and capture the SQL that's
> actually being sent to the server by this program?
>
> Also, what version of Postgres is this exactly?
>
> > Any suggestions to fix this issue?
>
> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>
> regards, tom lane
>


Re: [GENERAL] Error with DefineSavepoint:Unexpected state STARTED

2017-06-19 Thread Tom Lane
Computer Study  writes:
> I am working on a project of DB migration from Oracle to Postgres. The
> application works well with Oracle. But when switching to Postgres, some
> SQL executions couldn't finish and got an error like:

> FATAL: DefineSavepoint: unexpected state STARTED
> STATEMENT: SAVEPOINT_per_query_svp;DEALLOCATE
> "_PLAN04824560";RELEASE_per_query_SVP_

Well, that's pretty interesting, but you haven't provided nearly enough
information for anyone else to reproduce the problem.  (No, a snippet of
C++ that depends on undescribed subroutines and uncertain context doesn't
do it.)

Perhaps you could turn on log_statement = all and capture the SQL that's
actually being sent to the server by this program?
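
For reference, one way to do that without editing postgresql.conf by hand (a
minimal sketch, assuming superuser access; revert once you have the log):

ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
-- reproduce the error, read the server log, then revert:
ALTER SYSTEM SET log_statement = DEFAULT;
SELECT pg_reload_conf();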

Also, what version of Postgres is this exactly?

> Any suggestions to fix this issue?

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Error with DefineSavepoint:Unexpected state STARTED

2017-06-19 Thread Computer Study
Hi,

I am working on a project of DB migration from Oracle to Postgres. The
application works well with Oracle. But when switching to Postgres, some
SQL executions couldn't finish and got an error like:

FATAL: DefineSavepoint: unexpected state STARTED
STATEMENT: SAVEPOINT_per_query_svp;DEALLOCATE
"_PLAN04824560";RELEASE_per_query_SVP_

Below is the C++ code that executes the SQL to complete the operation. From
the UI, it tries to add/remove some values (remove or insert some records
into a table). When running the application, nothing happens except the
error above on the DB server side. I did some searching for this error but
didn't find much. Any suggestions to fix this issue?

Thanks..


...

try {
    // Delete all existing rows for this (cid, sid) pair.
    _statement->prepareStatement( SQL_DELETE_SACTIONS_BY_SID_AND_CID );
    _statement->setUnsignedInteger( 1, criteria->cid );
    _statement->setUnsignedInteger( 2, criteria->sid );
    _statement->execute();

    // Re-insert one row per action selected in the UI.
    for( size_t i = 0; i < criteria->sActions->size(); i++ ) {
        _statement->prepareStatement( SQL_INSERT_SACTION_BY_SID_AND_CID );
        _statement->setUnsignedInteger( 1, criteria->cid );
        _statement->setUnsignedInteger( 2, criteria->sid );
        _statement->setString( 3, ( unsigned int )( *criteria->sActions )[i].length(),
                               ( *criteria->sActions )[i].c_str() );
        if( _statement->execute() != ServiceConstants::SUCCESS ) {
            return result;  // stop on the first failed insert
        }
    }

    result = true;
    _statement->prepareStatement( "COMMIT" );
    _statement->execute();

} catch( const ServiceException &ex ) {
    DatabaseUtilities::logServiceException( ex, "UpdateBySidAndCid" );
}

return result;
}

const char * const UpdateBySidAndCid::SQL_DELETE_SACTIONS_BY_SID_AND_CID
= "DELETE FROM tableA WHERE Cid = ? AND Sid = ?";

const char * const UpdateBySidAndCid::SQL_INSERT_SACTION_BY_SID_AND_CID
= "INSERT INTO tableA (Cid, Siid, SActionUid) "
  "SELECT ?, ?, SActionUid FROM tableB WHERE SActionName = ?";


Re: [GENERAL] Streaming replication bandwidth per table

2017-06-19 Thread Peter Eisentraut
On 6/19/17 20:50, Maeldron T. wrote:
> Streaming replication generates too much traffic to set it up between
> different regions for financial reasons. The streaming replication would
> cost more than every other hosting expense combined (including all the
> traffic, even though it’s a website and a huge amount of email).
> 
> Is there a way to see in the log how much bandwidth is used per table? 

Not easily.  You could play around with pg_xlogdump to see what's going
on in the WAL.  But even if you figure it out, there is not much you can
do about it.

Try perhaps logical replication.  That would save you the bandwidth for
updating all the indexes at least.  It might work for you.
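
If you do go digging with pg_xlogdump, the records it prints identify
relations by relfilenode rather than by name; a small sketch to map one
back to a table (the relfilenode value here is just a placeholder):

SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE pg_relation_filenode(c.oid) = 16413;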

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Streaming replication bandwidth per table

2017-06-19 Thread Maeldron T.
Hello,

tl;dr

Streaming replication generates too much traffic to set it up between
different regions for financial reasons. The streaming replication would
cost more than every other hosting expense combined (including all the
traffic, even though it’s a website and a huge amount of email).

Is there a way to see in the log how much bandwidth is used per table?

Details:

My only idea is to improve the design.

I believe the source of the issue is tables involved in many to many
relations with frequent updates.

The structure varies. Sometimes it’s:

table_a_id_1 | table_a_id_2

other times:

table_b_id_1 | table_b_id_2 | float value

and:

integer value | table_c_id


It’s simple, but the performance is a key factor.

Depending on the relation (whether it had an additional value or not), the
expected usage, and my mood, I implemented them using either classical
pairs or arrays with gin indices:

table_a_id_1 | [table_a_id_2, table_a_id_7, table_a_id_9,...]

and:

integer value | [table_a_id_1, table_a_id_5, table_a_id_6, ...]

There are millions of records in the tables with arrays. The "normal" pairs
have tens of millions and above. One table could have about 40 billion
values in theory (it doesn’t yet, but it’s growing).


I can guess which tables are problematic and what to change, but:

* It’s difficult to simulate real-life usage
* The usage patterns are different from table to table
* If I’m wrong, I waste time and resources (and downtime) to make it even
worse

I know the updates on the arrays cost much more (it’s also a performance
issue), but the table takes orders of magnitude less space this way. I even
considered jsonb for the cases where there are also float values for each pair.

What to change in the design depends on the real-life use. How can I
measure the bandwidth usage per table in streaming replication? I don’t see
a way to emulate it with realistic results.
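
For what it’s worth, a rough proxy I can already query is the write counters
in the statistics collector; they aren’t WAL bytes, but they at least show
which tables receive the most writes:

SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd
FROM pg_stat_user_tables
ORDER BY n_tup_ins + n_tup_upd + n_tup_del DESC
LIMIT 20;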

M

PS: apart from the updates on the arrays, the performance itself is
satisfactory for each table. It’s only the bandwidth usage that would hurt.


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 4:51 PM, Peter Geoghegan  wrote:
> This would make only the first lookup for each distinct value on the
> outer side actually do an index scan on the inner side. I can imagine
> the optimization saving certain queries from consuming a lot of memory
> bandwidth, as well as saving them from pinning and locking the same
> buffers repeatedly.

Apparently this is sometimes called block nested loop join, and MySQL
has had it for a while now:

https://en.wikipedia.org/wiki/Block_nested_loop

It doesn't necessarily require that the outer side input be sorted,
because you might end up using a hash table, etc.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 4:35 PM, Andres Freund  wrote:
>> I think that this is the way index scan prefetch is normally
>> implemented. Index scans will on average have a much more random
>> access pattern than what is typical for bitmap heap scans, making this
>> optimization more compelling, so hopefully someone will get around to
>> this.
>
> I think for index based merge and nestloop joins, it'd be hugely
> beneficial to do prefetching on the index, but more importantly on the
> heap level.  Not entirely trivial to do however.

Speaking of nestloop join, and on a similar note, we could do some
caching on the inner side of a nestloop join.

We already track if the outer side access path of a nestloop join
preserves sort order within the optimizer. It might not be that hard
to teach the optimizer to generate a plan where, when we know that
this has happened, and we know that the outer side is not unique, the
final plan hints to the executor to opportunistically cache every
lookup on the inner side.

This would make only the first lookup for each distinct value on the
outer side actually do an index scan on the inner side. I can imagine
the optimization saving certain queries from consuming a lot of memory
bandwidth, as well as saving them from pinning and locking the same
buffers repeatedly.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Andres Freund
On 2017-06-19 15:21:20 -0700, Peter Geoghegan wrote:
> On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes  wrote:
> > Unfortunately, it is only implemented in very narrow circumstances.  You
> > have to be doing bitmap index scans of many widely scattered rows to make it
> > useful.  I don't think that this is all that common of a situation.  The
> > problem is that at every point in the scan, it has to be possible to know
> > what data block it is going to want N iterations in the future, so you can
> > inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> > scans.
> 
> I think that you could prefetch in index scans by using the
> pointers/downlinks in the immediate parent page of the leaf page that
> the index scan currently pins. The sibling pointer in the leaf itself
> is no good for this, because there is only one block to prefetch
> available at a time.
> 
> I think that this is the way index scan prefetch is normally
> implemented. Index scans will on average have a much more random
> access pattern than what is typical for bitmap heap scans, making this
> optimization more compelling, so hopefully someone will get around to
> this.

I think for index based merge and nestloop joins, it'd be hugely
beneficial to do prefetching on the index, but more importantly on the
heap level.  Not entirely trivial to do however.

- Andres


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 3:25 PM, Alvaro Herrera
 wrote:
> Surely you could prefetch all the heap pages pointed to by index items
> in the current leaf index page ...

I'm sure that you could do that too. I'm not sure how valuable each
prefetching optimization is.

I can imagine prefetching heap pages mattering a lot less for a
primary key index, where there is a strong preexisting correlation
between physical and logical order, while also mattering a lot more
than what I describe in other cases. I suppose that you need both.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes  wrote:
> > Unfortunately, it is only implemented in very narrow circumstances.  You
> > have to be doing bitmap index scans of many widely scattered rows to make it
> > useful.  I don't think that this is all that common of a situation.  The
> > problem is that at every point in the scan, it has to be possible to know
> > what data block it is going to want N iterations in the future, so you can
> > inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> > scans.
> 
> I think that you could prefetch in index scans by using the
> pointers/downlinks in the immediate parent page of the leaf page that
> the index scan currently pins. The sibling pointer in the leaf itself
> is no good for this, because there is only one block to prefetch
> available at a time.

Surely you could prefetch all the heap pages pointed to by index items
in the current leaf index page ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes  wrote:
> Unfortunately, it is only implemented in very narrow circumstances.  You
> have to be doing bitmap index scans of many widely scattered rows to make it
> useful.  I don't think that this is all that common of a situation.  The
> problem is that at every point in the scan, it has to be possible to know
> what data block it is going to want N iterations in the future, so you can
> inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> scans.

I think that you could prefetch in index scans by using the
pointers/downlinks in the immediate parent page of the leaf page that
the index scan currently pins. The sibling pointer in the leaf itself
is no good for this, because there is only one block to prefetch
available at a time.

I think that this is the way index scan prefetch is normally
implemented. Index scans will on average have a much more random
access pattern than what is typical for bitmap heap scans, making this
optimization more compelling, so hopefully someone will get around to
this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Merlin Moncure
On Mon, Jun 19, 2017 at 2:29 PM, Rob Nikander  wrote:
> I'm wondering about the tradeoffs, specifically: is it possible to update
> one piece of a jsonb value without having to rewrite the entire field? There
> are cases where that data field was getting pretty big (500kb). Would you
> expect any obvious performance differences between these two options?

You are basically asking, what are the relative
advantages/disadvantages of the document model vs. the relational data
store?  This is a complicated discussion.  Here are some tradeoffs:

*) A document is always read/written in bulk.  Row data reads/writes are
more granular (but generally postgres always reads/writes an 8kb page
minimum!)

*) For documents, transaction tracking covers the entire document.
This is more efficient for storage but can have very serious
consequences if sub-portions of the document are updated under heavy
concurrency.

*) Documents are a pain if the structure changes in such a way as to
require invalidation of all of them.

*) Documents can be a *real* pain if the data relationships change in
some fundamental way.   This is a pain with traditional tables as
well, but relational models tend to be the most flexible vs. other
approaches.

Basically there is a reason why SQL and relational systems won
the 'data wars' of the 1970s and 1980s.   There are downsides to the
basic approach (especially performance due to joining) but the
simplicity and elegance of being able to model just about any problem
tend to compensate for certain performance disadvantages.

Document-style storage tends to move the data model out of the
database and into the application (which is a very poor tradeoff IMO)
and to fall over when some of the initial assumptions about the
documents modeling discrete business units break down; you end up
storing the same information over and over in different documents,
which causes all kinds of problems.  Documents do tend to work well in
low- or no-update applications, however.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Bruce Momjian
On Mon, Jun 19, 2017 at 10:49:59AM -0500, Merlin Moncure wrote:
> On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> > If you have a RAID, set it to the number of spindles in your RAID and forget
> > it. It is usually one of the less interesting knobs to play with.  (Unless
> your usage pattern of the database is unusual and exactly fits the above
> pattern.)
> 
> Isn't that advice obsolete in a SSD world though?  I was able to show
> values up to 256 for a single device provided measurable gains for a
> single S3500.  It's true though that the class of queries that this
> would help is pretty narrow.

Our developer docs are much clearer:


https://www.postgresql.org/docs/10/static/runtime-config-resource.html#runtime-config-resource-disk

For magnetic drives, a good starting point for this setting is the
number of separate drives comprising a RAID 0 stripe or RAID 1 mirror
being used for the database. (For RAID 5 the parity drive should not be
counted.) However, if the database is often busy with multiple queries
issued in concurrent sessions, lower values may be sufficient to keep
the disk array busy. A value higher than needed to keep the disks busy
will only result in extra CPU overhead. SSDs and other memory-based
storage can often process many concurrent requests, so the best value
might be in the hundreds.

I didn't backpatch this change since the original docs were not
incorrect.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Scott Marlowe
On Mon, Jun 19, 2017 at 1:53 PM, Dmitry O Litvintsev  wrote:
> yes, we had to restart database 4 days ago (and vacuum has resumed on start).
> I checked the log files and discovered that autovacuum on this table takes
>
> pages: 0 removed, 14072307 remain
> tuples: 43524292 removed, 395006545 remain
> buffer usage: -1493114028 hits, 107664973 misses, 30263658 dirtied
> avg read rate: 1.604 MB/s, avg write rate: 0.451 MB/s
> system usage: CPU 2055.81s/17710.94u sec elapsed 524356.57 sec
>
> 6 days. So it is perpetually being autovacuumed (which I assumed to be a good 
> thing)
>
> Table has 400M entries, 115 GB.
>
> I will try your suggestions in the test environment.
>
> Thank you,
> Dmitry

Once you get this sorted, look into using the check_postgres.pl
script and a monitoring solution like Zabbix or Nagios to monitor
things like transactions until wraparound, etc., so you don't wind up
back here again. Best of luck. Note that if you drop the vacuum
delay to 0ms, the vacuum will probably complete in a few hours tops.
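
For reference, a quick way to keep an eye on wraparound pressure by hand (a
minimal sketch; compare the ages against your autovacuum_freeze_max_age
setting):

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;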


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] sub-select with multiple records, columns

2017-06-19 Thread Israel Brewster
On Jun 19, 2017, at 12:29 PM, Thomas Kellerer  wrote:
> 
> Israel Brewster wrote on 19.06.2017 at 22:17:
>> SELECT
>> ...
>> (SELECT
>> array_agg(to_json(row(notedate,username,note)))
>> FROM sabrenotes
>> INNER JOIN users ON author=users.id
>> WHERE ticket=sabretickets.id ) notes
>> FROM tickets
>> WHERE ...
>> The only problem with this query is that the notes aren't sorted. Of
>> course, simply adding an ORDER BY clause to the sub-select doesn't
>> work - it throws an error about needing to use notedate in a GROUP BY
>> clause or aggregate function. Is there some way I can get sorting as
>> well here? Of course, I could just run a second query to get the
>> notes, and combine in code, but that's no fun... :-)
> 
> You can supply an ORDER BY to an aggregate function:
> 
>   array_agg(to_json(row(notedate,username,note)) order by ...)

Thanks (and to David G. Johnston). Didn't realize I could do that, but it makes 
perfect sense.

> 
> I have to admit that I fail to see the advantage of an array of JSON
> objects, rather than having a single JSON value with the elements inside.
> 
> json_object_agg() or json_agg() might be better suited for this.

You may be right. Actually, my first thought (and the ideal here) was to simply 
have an array of rows or the like. That is, wind up with a data structure where 
I could in my code do something like record['notes']['username'], or perhaps 
record['notes'][1]. However, while I didn't get any errors when I tried that, 
the parsing of the results fell apart at some point in the chain - I wound up 
with strings containing a bunch of escaped and double-escaped quotes and the 
like. Adding the to_json simply converted the rows to json strings, which I can 
work with easily enough. Since I do still have to parse the json anyway, 
perhaps making the entire array be a single json object that I could parse once 
would be a better approach.

> 
> Thomas
> 
> 
> 
> 
>   
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] sub-select with multiple records, columns

2017-06-19 Thread David G. Johnston
On Mon, Jun 19, 2017 at 1:32 PM, David G. Johnston
 wrote:
> On Mon, Jun 19, 2017 at 1:29 PM, Thomas Kellerer  wrote:
>>
>> Israel Brewster wrote on 19.06.2017 at 22:17:
>>>
>>> SELECT
>>> ...
>>> (SELECT
>>> array_agg(to_json(row(notedate,username,note)))
>>> FROM sabrenotes
>>> INNER JOIN users ON author=users.id
>>> WHERE ticket=sabretickets.id ) notes
>>> FROM tickets
>>> WHERE ...
>>>
>>> The only problem with this query is that the notes aren't sorted. Of
>>> course, simply adding an ORDER BY clause to the sub-select doesn't
>>> work - it throws an error about needing to use notedate in a GROUP BY
>>> clause or aggregate function. Is there some way I can get sorting as
>>> well here? Of course, I could just run a second query to get the
>>> notes, and combine in code, but that's no fun... :-)
>>
>>
>> You can supply an ORDER BY to an aggregate function:
>>
>>array_agg(to_json(row(notedate,username,note)) order by ...)
>>
>> I have to admit that I fail to see the advantage of an array of JSON
>> objects, rather than having a single JSON value with the elements inside.
>>
>> json_object_agg() or json_agg() might be better suited for this.
>>

You could also write:
SELECT ...,
ARRAY(SELECT to_json(...) [...] ORDER BY) AS notes
FROM tickets

David J.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] sub-select with multiple records, columns

2017-06-19 Thread Thomas Kellerer

Israel Brewster wrote on 19.06.2017 at 22:17:

SELECT
...
(SELECT
array_agg(to_json(row(notedate,username,note)))
FROM sabrenotes
INNER JOIN users ON author=users.id
WHERE ticket=sabretickets.id ) notes
FROM tickets
WHERE ...

The only problem with this query is that the notes aren't sorted. Of
course, simply adding an ORDER BY clause to the sub-select doesn't
work - it throws an error about needing to use notedate in a GROUP BY
clause or aggregate function. Is there some way I can get sorting as
well here? Of course, I could just run a second query to get the
notes, and combine in code, but that's no fun... :-)


You can supply an ORDER BY to an aggregate function:

   array_agg(to_json(row(notedate,username,note)) order by ...)

I have to admit that I fail to see the advantage of an array of JSON
objects, rather than having a single JSON value with the elements inside.

json_object_agg() or json_agg() might be better suited for this.
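
For example, something along these lines (untested, reusing the table and
column names from the original query) would return one ordered JSON array
per ticket instead of an array of JSON strings:

SELECT
...
(SELECT
json_agg(json_build_object('notedate', notedate,
                           'username', username,
                           'note', note)
         ORDER BY notedate)
FROM sabrenotes
INNER JOIN users ON author=users.id
WHERE ticket=sabretickets.id ) notes
FROM tickets
WHERE ...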

Thomas




   




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] sub-select with multiple records, columns

2017-06-19 Thread Israel Brewster
I have two tables, a ticket table and a notes table, set up where each ticket
can have multiple notes. I'm trying to come up with a query that returns the
ticket fields as well as a field that is an array type field with the values
being json-encoded note records. I've come up with the following subselect
query, which works:

SELECT
...
(SELECT
array_agg(to_json(row(notedate,username,note)))
FROM sabrenotes
INNER JOIN users ON author=users.id
WHERE ticket=sabretickets.id ) notes
FROM tickets
WHERE ...

The only problem with this query is that the notes aren't sorted. Of course,
simply adding an ORDER BY clause to the sub-select doesn't work - it throws
an error about needing to use notedate in a GROUP BY clause or aggregate
function. Is there some way I can get sorting as well here? Of course, I
could just run a second query to get the notes, and combine in code, but
that's no fun... :-)

---
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
---




Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Dmitry O Litvintsev
Yes, we had to restart the database 4 days ago (and vacuum resumed on start).
I checked the log files and discovered that autovacuum on this table takes 

pages: 0 removed, 14072307 remain
tuples: 43524292 removed, 395006545 remain
buffer usage: -1493114028 hits, 107664973 misses, 30263658 dirtied
avg read rate: 1.604 MB/s, avg write rate: 0.451 MB/s
system usage: CPU 2055.81s/17710.94u sec elapsed 524356.57 sec

6 days. So it is perpetually being autovacuumed (which I assumed to be a good 
thing) 

Table has 400M entries, 115 GB. 

I will try your suggestions in the test environment. 

Thank you,
Dmitry 

From: Jeff Janes 
Sent: Monday, June 19, 2017 1:16 PM
To: Dmitry O Litvintsev
Cc: Andreas Kretschmer; pgsql-general@postgresql.org
Subject: Re: [GENERAL] autovacuum holds exclusive lock on table preventing it
from being updated

On Mon, Jun 19, 2017 at 10:33 AM, Dmitry O Litvintsev wrote:
Hi

Since I have posted this nothing really changed. I am starting to panic 
(mildly).

The source (production) runs :

  relname   |   mode   | granted |  
  substr|  query_start  
|  age
+--+-+--+---+
 t_inodes_iio_idx   | RowExclusiveLock | t   | autovacuum: 
VACUUM ANALYZE public.t_inodes (to prevent wraparound)   | 2017-06-15 
10:26:18.643209-05 | 4 days 01:58:56.697559


This is close to unreadable.  You can use \x to get output from psql which
survives email more readably.

Your first report was 6 days ago.  Why is the job only 4 days old?  Are you 
frequently restarting your production server, so that the vacuum job never gets 
a chance to finish?  If so, that would explain your predicament.

And how big is this table, that it takes at least 4 days to VACUUM?

vacuum_cost_delay = 50ms

That is a lot.  The default value for this is 0.  The default value for 
autovacuum_vacuum_cost_delay is 20, which is usually too high for giant 
databases.

I think you are changing this in the wrong direction.  Rather than increase 
vacuum_cost_delay, you need to decrease autovacuum_vacuum_cost_delay, so that 
you won't keep having problems in the future.


On your test server, change vacuum_cost_delay to zero and then initiate a 
manual vacuum of the table.  It will block on the autovacuum's lock, so then 
kill the autovacuum (best to have the manual vacuum queued up first, otherwise 
it will be race between when you start the manual vacuum, and when the 
autovacuum automatically restarts, to see who gets the lock). See how long it 
takes this unthrottled vacuum to run, and how much effect the IO it causes has 
on the performance of other tasks.  If acceptable, repeat this on production 
(although really, I don't think you have much of a choice on whether the effect
is acceptable or not--it needs to be done.)

Cheers,

Jeff


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Dmitry Dolgov
> On 19 June 2017 at 21:29, Rob Nikander  wrote:
>
> I'm wondering about the tradeoffs, specifically: is it possible to update
one piece of a jsonb value without having to rewrite the entire field?
There are cases where that data field was getting pretty big (500kb). Would
you expect any obvious performance differences between these two options?

Unfortunately no, an entire jsonb field has to be written back even if
you've touched only one key.
From my own benchmarks it looks like you'll scarcely notice this (e.g. in
comparison with MongoDB) only if you work with small enough documents
(about 2kb); everything above that limit hits performance more or less
seriously.
You can take a look at this presentation [1] from Oleg Bartunov, it
contains results of some benchmarks (from slide 44).

[1]: http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconf.us-2017.pdf


Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Andreas Kretschmer
On 19 June 2017 at 21:29:40 CEST, Rob Nikander wrote:

>
>I'm wondering about the tradeoffs, specifically: is it possible to
>update one piece of a jsonb value without having to rewrite the entire
>field? 


Updates in PostgreSQL are always a delete plus an insert, so the answer is no.
For session data, please consider unlogged tables.
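
For example, a minimal sketch (hypothetical layout): unlogged tables skip WAL
entirely, which suits disposable session data, but the table is truncated
after a crash and is not replicated:

CREATE UNLOGGED TABLE sessions (
    session_id bigint PRIMARY KEY,
    data       jsonb
);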


Regards, Andreas


-- 
2ndQuadrant - The PostgreSQL Support Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Rob Nikander
Hi,

I’ve got a web app where I want to store users’ session data. The schema of
this data changes a lot, so it may be useful here to store the session
properties in either a jsonb column or in multiple rows. Something like:

  session_id | data
  -----------+---------------------
  100        | { a: 1, bar: 2 ...
  101        | { a: 3, baz: 123 ...

or

  session_id | name | value
  -----------+------+------
  100        | a    | 1
  100        | bar  | 2
  101        | baz  | 123
  101        | a    | 3
  ...

The app currently does something like option 1, but on an older pre-jsonb 
version of postgres, so the field is just text. I’m hoping to upgrade Postgres 
soon so jsonb is an option.

I'm wondering about the tradeoffs, specifically: is it possible to update one 
piece of a jsonb value without having to rewrite the entire field? There are 
cases where that data field was getting pretty big (500kb). Would you expect 
any obvious performance differences between these two options?
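
To make the difference concrete, the two write patterns would look roughly
like this (hypothetical table names; columns as sketched above; jsonb_set()
needs 9.5 or later):

-- option 1: updating one key still rewrites the whole jsonb value
UPDATE sessions SET data = jsonb_set(data, '{a}', '42') WHERE session_id = 100;

-- option 2: only the matching row is rewritten
UPDATE session_props SET value = '42' WHERE session_id = 100 AND name = 'a';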

Yes, I’ll need to build performance tests myself, but that’s a lot of work to 
get two realistic situations with millions of rows, so I’m wondering about 
guesses or common knowledge on this.

thanks,
Rob

Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Bill Moran
On Mon, 19 Jun 2017 17:33:23 +
Dmitry O Litvintsev  wrote:
> 
> The test stand where I was going to test the schema upgrade is stuck because
> vacuum is blocking.

If you're in "panic mode" I would recommend cancelling the existing vacuum,
running your upgrades, then immediately running VACUUM FREEZE ANALYZE on that
table to
make up for cancelling the autovacuum. Note that the manual VACUUM may take 
quite a while,
so run it in a screen session or something. Additionally, autovacuum is going 
to try to
relaunch that vacuum pretty aggressively, so you might have to cancel it a few 
times (using
pg_terminate_backend()) before your other processes are able to sneak in ahead 
of it.

Once you're out of panic mode you can take some time to breathe and consider 
your options
for reducing the issue in the future.

I have to second Alvaro's comments about the cost delay. Personally, I'd
recommend setting vacuum_cost_delay to zero, unless your hardware is crap. In
my recent experience, allowing vacuum to run full-bore is less intrusive on a
busy database with good hardware than forcing it to take its time.
Unfortunately, changing it now isn't going to speed the current vacuum up any.

Another comment: schema changes almost always need exclusive locks on tables
that they're modifying. As a result, you really need to plan them out a bit. 
Anything
could block a schema update, even a simple SELECT statement; so it's important 
to
check the health of things before starting. While it's not _generally_ a good 
practice
to interrupt autovacuum, it _can_ be done if the schema upgrade is necessary. 
Keep in
mind that it's just going to start back up again, but hopefully your schema 
update will
be done by then and it can do its work without interfering with things.
Another thing
you can do is to monitor the transaction ID values (the Nagios check_postgres 
has a nice
mode for monitoring this) and manually launch a VACUUM FREEZE ahead of 
autovacuum, so that
_you_ can pick the time for it to run and not have it happen to crop up at the 
worst possible
time ;)

You might also find that things are easier to deal with if you tweak the 
autovacuum
settings on this table to cause it to be vacuumed more frequently. In my 
experience, more
frequent vacuums that do less work each time often lead to happier databases. 
See ALTER
TABLE and the available settings to tweak autovacuum behavior.
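
For example (using the table name from this thread; the thresholds are just a
starting point to tune), per-table settings might look like:

ALTER TABLE t_inodes SET (
    autovacuum_vacuum_cost_delay = 0,
    autovacuum_vacuum_scale_factor = 0.01
);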

-- 
Bill Moran 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Jeff Janes
On Mon, Jun 19, 2017 at 10:33 AM, Dmitry O Litvintsev 
wrote:

> Hi
>
> Since I have posted this nothing really changed. I am starting to panic
> (mildly).
>
> The source (production) runs :
>
>   relname   |   mode   | granted |
> substr|
> query_start  |  age
> +--+
> -+--
> +---+
>  t_inodes_iio_idx   | RowExclusiveLock | t   |
> autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)   |
> 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
>


This is close to unreadable.  You can use \x to get output from psql
which survives email more readably.

Your first report was 6 days ago.  Why is the job only 4 days old?  Are you
frequently restarting your production server, so that the vacuum job never
gets a chance to finish?  If so, that would explain your predicament.

And how big is this table, that it takes at least 4 days to VACUUM?

vacuum_cost_delay = 50ms
>

That is a lot.  The default value for this is 0.  The default value
for autovacuum_vacuum_cost_delay is 20, which is usually too high for giant
databases.

I think you are changing this in the wrong direction.  Rather than increase
vacuum_cost_delay, you need to decrease autovacuum_vacuum_cost_delay, so
that you won't keep having problems in the future.


On your test server, change vacuum_cost_delay to zero and then initiate a
manual vacuum of the table.  It will block on the autovacuum's lock, so
then kill the autovacuum (best to have the manual vacuum queued up first,
otherwise it will be race between when you start the manual vacuum, and
when the autovacuum automatically restarts, to see who gets the lock). See
how long it takes this unthrottled vacuum to run, and how much effect the
IO it causes has on the performance of other tasks.  If acceptable, repeat
this on production (although really, I don't think you have much of a choice
on whether the effect is acceptable or not--it needs to be done.)
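
For reference, one possible sequence on the test server (session 1 queues the
manual vacuum; session 2 then cancels the autovacuum worker so the manual
vacuum can take the lock):

-- session 1:
SET vacuum_cost_delay = 0;
VACUUM VERBOSE public.t_inodes;

-- session 2:
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'autovacuum: VACUUM%t_inodes%';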

Cheers,

Jeff


Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Alvaro Herrera
Dmitry O Litvintsev wrote:
> Hi
> 
> Since I have posted this nothing really changed. I am starting to panic 
> (mildly).  

...

> vacuum_cost_delay = 50ms

Most likely, this value is far too high.  You're causing autovacuum to
sleep for a very long time with this setting.  Hard to say for certain
without seeing the cost_limit value and the other related parameters,
but it's most certainly what's causing you pain.  The default of 20ms is
already too high for most users.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from being updated

2017-06-19 Thread Dmitry O Litvintsev
Hi

Since I have posted this nothing really changed. I am starting to panic 
(mildly).  

The source (production) runs :

       relname       |           mode           | granted |                               substr                                |          query_start          |          age
---------------------+--------------------------+---------+---------------------------------------------------------------------+-------------------------------+------------------------
 t_inodes_iio_idx    | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)  | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes_pkey       | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)  | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
                     | ExclusiveLock            | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)  | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes            | ShareUpdateExclusiveLock | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)  | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes_itype_idx  | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)  | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559
 t_inodes_imtime_idx | RowExclusiveLock         | t       | autovacuum: VACUUM ANALYZE public.t_inodes (to prevent wraparound)  | 2017-06-15 10:26:18.643209-05 | 4 days 01:58:56.697559

Above does not impact production activity a lot. 

On the test stand (where I pg_basebackupped from production and also upgraded 
to 9.6) I see:

       relname       |           mode           | granted |                           substr                            |          query_start          |          age
---------------------+--------------------------+---------+-------------------------------------------------------------+-------------------------------+------------------------
 t_inodes            | ShareUpdateExclusiveLock | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound)  | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_itype_idx  | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound)  | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_imtime_idx | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound)  | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_iio_idx    | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound)  | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes_pkey       | RowExclusiveLock         | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound)  | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
                     | ExclusiveLock            | t       | autovacuum: VACUUM public.t_inodes (to prevent wraparound)  | 2017-06-13 15:27:54.872154-05 | 5 days 20:59:22.769404
 t_inodes            | ShareUpdateExclusiveLock | f       | ANALYZE;                                                    | 2017-06-13 15:27:59.781285-05 | 5 days 20:59:17.860273
                     | ExclusiveLock            | t       | ANALYZE;                                                    | 2017-06-13 15:27:59.781285-05 | 5 days 20:59:17.860273


The test stand where I was going to test the schema upgrade is stuck because
vacuum is blocking.

Production settings follow:

version 9.3.9

max_connections = 512
shared_buffers = 8192MB
temp_buffers = 1024MB
work_mem = 512MB
#maintenance_work_mem = 2048MB 
maintenance_work_mem = 4096MB #increased after 3 days of vacuum analyze running 
max_stack_depth = 2MB
vacuum_cost_delay = 50ms
synchronous_commit = off
wal_buffers = 245MB
wal_writer_delay = 10s
checkpoint_segments = 64
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 94GB
wal_level = hot_standby
hot_standby = on
archive_mode = on
archive_command = '/usr/loca/bin/wal_backup.sh %p %f'
max_wal_senders = 4
wal_keep_segments = 1024
max_standby_streaming_delay = 7200s

So, the problem: I cannot do the schema change until vacuum has finished, and
there seems to be no end in sight for the vacuum to finish, which is throwing
off our software upgrade plans.

Anything can be done here?

Thanks,
Dmitry


From: Andreas Kretschmer 
Sent: Tuesday, June 13, 2017 1:54 PM
To: pgsql-general@postgresql.org; Dmitry O Litvintsev; 
pgsql-general@postgresql.org
Subject: Re: [GENERAL] autovacuum holds exclusive lock on table 

Re: [GENERAL] Remote connection to PostgreSQL

2017-06-19 Thread Melvin Davidson
On Mon, Jun 19, 2017 at 12:21 PM, Igor Korot  wrote:

> Thx, David.
>
> On Mon, Jun 19, 2017 at 12:09 PM, David G. Johnston
>  wrote:
> > On Mon, Jun 19, 2017 at 9:02 AM, Igor Korot  wrote:
> >>
> >>  Hi, ALL,
> >> Is there some magic in order to turn on remote connection to PostgreSQL?
> >>
> >> There are some extra steps to turn it on for MS SQL and MySQL, so I
> >> figured
> >> it should be the same for Postgre.
> >
> >
> > See "listen_addresses"
> >
> > https://www.postgresql.org/docs/9.6/static/runtime-
> config-connection.html
> >
> > It forward links you to setting up client authentication for the same.
> >
> > David J.
> >
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In addition to enabling  listen_addresses, you also have to add entries in
pg_hba.conf.
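
For example, a minimal sketch (the address range and auth method below are
placeholders for whatever fits your network):

ALTER SYSTEM SET listen_addresses = '*';
-- then add a line like this to pg_hba.conf:
--   host    all    all    192.168.1.0/24    md5
-- pg_hba.conf changes take effect on reload; listen_addresses needs a restart:
SELECT pg_reload_conf();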

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Remote connection to PostgreSQL

2017-06-19 Thread Igor Korot
Thx, David.

On Mon, Jun 19, 2017 at 12:09 PM, David G. Johnston
 wrote:
> On Mon, Jun 19, 2017 at 9:02 AM, Igor Korot  wrote:
>>
>>  Hi, ALL,
>> Is there some magic in order to turn on remote connection to PostgreSQL?
>>
>> There are some extra steps to turn it on for MS SQL and MySQL, so I
>> figured
>> it should be the same for Postgre.
>
>
> See "listen_addresses"
>
> https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html
>
> It forward links you to setting up client authentication for the same.
>
> David J.
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Jeff Janes
On Mon, Jun 19, 2017 at 8:49 AM, Merlin Moncure  wrote:

> On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> > If you have a RAID, set it to the number of spindles in your RAID and
> forget
> > it. It is usually one of the less interesting knobs to play with.
> (Unless
> > your usage pattern of the database is unusual and exact fits the above
> > pattern.)
>
> Isn't that advice obsolete in a SSD world though?  I was able to show
> values up to 256 for a single device provided measurable gains for a
> single S3500.  It's true though that the class of queries that this
> would help is pretty narrow.


I don't think it is obsolete, you just have to be creative with how you
interpret 'spindle' :)

With a single laptop hard-drive, I could get improvements of about 2 fold
by setting it to very high numbers, like 50 or 80. By giving the hard drive
the option of dozens of different possible sectors to read next, it could
minimize head-seek.  But that is with just one query running at a time.
With multiple queries all running simultaneously all trying to take
advantage of this, performance gains quickly fell apart.  I would expect
the SSD situation to be similar to that, where the improvements are
measurable but also fragile, but I haven't tested it.

Cheers,

Jeff


Re: [GENERAL] Remote connection to PostgreSQL

2017-06-19 Thread David G. Johnston
On Mon, Jun 19, 2017 at 9:02 AM, Igor Korot  wrote:

>  Hi, ALL,
> Is there some magic in order to turn on remote connection to PostgreSQL?
>
> There are some extra steps to turn it on for MS SQL and MySQL, so I figured
> it should be the same for Postgre.
>

See "listen_addresses"

https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html

It forward links you to setting up client authentication for the same.

David J.


[GENERAL] Remote connection to PostgreSQL

2017-06-19 Thread Igor Korot
 Hi, ALL,
Is there some magic in order to turn on remote connection to PostgreSQL?

There are some extra steps to turn it on for MS SQL and MySQL, so I figured
it should be the same for Postgre.

Thank you.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Merlin Moncure
On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> If you have a RAID, set it to the number of spindles in your RAID and forget
> it. It is usually one of the less interesting knobs to play with.  (Unless
> your usage pattern of the database is unusual and exactly fits the above
> pattern.)

Isn't that advice obsolete in an SSD world though?  I was able to show
that values up to 256 for a single device provided measurable gains for a
single S3500.  It's true though that the class of queries that this
would help is pretty narrow.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Data Encryption Using LUKS with dm-crypt ?

2017-06-19 Thread Paul Jungwirth

On 06/19/2017 12:40 AM, Scott Marlowe wrote:
> On Sun, Jun 18, 2017 at 2:20 PM, Condor wrote:
>> What should I expect, what are the good and bad things that can happen?

I've run Postgres on a LUKS volume for a few years now and it's all been
pretty quiet. One challenge is that you need to supply the password if the
server restarts. Automating that in a way that doesn't simply reveal the
password is tricky.

I'm not using RAID, so I can't speak to combining LUKS + RAID.

If you are on AWS, they nowadays have encrypted EBS volumes which will
do all this for you automatically. If I were setting up this system
today, that's probably what I would have used.


> I think the only real test here is to build a luks system, initiate
> some pgbench type runs, wait a minute, run checkpoint and then yank
> out the plug. Run a dozen or so times looking for data corruption.

I think this is really the right answer!

Paul




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Jeff Janes
On Sun, Jun 18, 2017 at 7:09 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sun, Jun 18, 2017 at 6:02 PM, Patrick B 
> wrote:
>
>> Hi guys.
>>
>> I just wanna understand the effective_io_concurrency value better.
>>
>> My current Master database server has 16 vCPUS and I use
>> effective_io_concurrency = 0.
>>
>
> It seems as though the number of virtual CPUs has little to no bearing on
> whether, or to what value, you should set this parameter.  Obviously with
> only one CPU parallelism wouldn't be possible (I'm assuming a single query
> does not make multiple parallel requests for data)
>

Ah, but it does.  That is exactly what this parameter is for.

Unfortunately, it is only implemented in very narrow circumstances.  You
have to be doing bitmap index scans of many widely scattered rows to make
it useful.  I don't think that this is all that common of a situation.  The
problem is that at every point in the scan, it has to be possible to know
what data block it is going to want N iterations in the future, so you can
inform the kernel to pre-fetch it.  That is only easy to know for bitmap
scans.

If you have a RAID, set it to the number of spindles in your RAID and
forget it. It is usually one of the less interesting knobs to play with.
 (Unless your usage pattern of the database is unusual and exactly fits the
above pattern.)


Cheers,

Jeff


Re: [GENERAL] inheritance children with integer columns of differing width

2017-06-19 Thread Tom Lane
Justin Pryzby  writes:
> I wondered if anyone had considered allowing inheritance children to have
> different column types than the parent (and each other).

No, and it's not sane to consider it.

> I'm thinking of the trivial (?) case of smallint/int/bigint.

What's trivial about that?  Consider

   select * from parent_table where somefunc(intcol);

If somefunc takes integer, this is fine, but what are we supposed to do
if the corresponding column in the child is bigint?  Throw error for any
row where the value doesn't fit in int?  If so, what's the point of
letting the child column type be different?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] inheritance children with integer columns of differing width

2017-06-19 Thread Justin Pryzby
I wondered if anyone had considered allowing inheritance children to have
different column types than the parent (and each other).

I'm thinking of the trivial (?) case of smallint/int/bigint.  The reason is that
when we load data which exceeds the threshold for the current data type we have
to promote the column, rewriting the table, which can take a very long time
and use a very large amount of space.  We've had to start uninheriting all but
the most recent children before ALTERing to make it more reasonable (and then
separately ALTER+reinherit each child) - it's especially painful when a key
column grows beyond "int", and many tables need to be altered all at once.
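
Roughly, that workaround looks like this (hypothetical table and column names):

ALTER TABLE child_2016 NO INHERIT parent_table;
ALTER TABLE parent_table ALTER COLUMN key_col TYPE bigint;  -- rewrites parent and still-attached children
ALTER TABLE child_2016 ALTER COLUMN key_col TYPE bigint;    -- later, child by child
ALTER TABLE child_2016 INHERIT parent_table;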

It seems to me this is what would happen if one were to UNION ALL the children,
although I see the plan differs with differering type:

pryzbyj=# create table ii(i bigint);
pryzbyj=# create table i(i int);
pryzbyj=# explain SELECT * FROM ii UNION ALL SELECT * FROM i;
Append  (cost=0.00..110.80 rows=4540 width=6)
  ->  Seq Scan on ii  (cost=0.00..31.40 rows=2140 width=8)
  ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..58.00 rows=2400 width=4)
->  Seq Scan on i  (cost=0.00..34.00 rows=2400 width=4)

pryzbyj=# alter table i ALTER i TYPE bigint;
ALTER TABLE
pryzbyj=# explain SELECT * FROM ii UNION ALL SELECT * FROM i;
Append  (cost=0.00..62.80 rows=4280 width=8)
  ->  Seq Scan on ii  (cost=0.00..31.40 rows=2140 width=8)
  ->  Seq Scan on i  (cost=0.00..31.40 rows=2140 width=8)

If it were allowed for children to have int columns with differing widths, then
to promote an int column, we would uninherit the historic children, ALTER the
parent (and the most recent tables), and then reinherit the children (unless
ALTER on its own avoided rewriting tables in such a case).

Justin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres Data Encryption Using LUKS with dm-crypt ?

2017-06-19 Thread Scott Marlowe
On Sun, Jun 18, 2017 at 2:20 PM, Condor  wrote:
> Hello ppl,
>
> A few years ago I asked the same question but did not receive useful answers,
> and we used a different way to realize the project.
> Today I want to ask: did someone do it, and most importantly for me, can
> someone share their experience?
> What should I expect, what are the good and bad things that can happen?
>
> I'm thinking problems can occur if the server is restarted and data is
> not synced, but that is what the RAID cache battery is for.
> Also, if the hard drive needs to be checked for bad clusters or broken
> indexes/files on the filesystem, what will happen to the data?
> Because PostgreSQL does not support data-level encryption, I want to realize
> it with third-party tools.

The one and only time I set up a server to use LUKS was for a demo
laptop, so that if it was lost our code / data / db etc. were not
accessible. In that instance we didn't test for fsync reliability
because it was an easily recreatable system.

Generally speaking, PostgreSQL expects "perfect" storage that writes
when it says it writes and doesn't present bad sectors to the database
to handle, but rather maps such sectors out of the way silently without
data corruption.

I think the only real test here is to build a luks system, initiate
some pgbench type runs, wait a minute, run checkpoint and then yank
out the plug. Run a dozen or so times looking for data corruption.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Andreas Kretschmer



On 19.06.2017 at 03:02, Patrick B wrote:

Hi guys.

I just wanna understand the effective_io_concurrency value better.

My current Master database server has 16 vCPUS and I 
use effective_io_concurrency = 0.


What can be the benefits of increasing that number? Also, do you guys 
have any recommendations?



As far as I know, at the moment only bitmap index scans would benefit
from higher values. You can try 16 or 32 as a starting point
(if you have a proper I/O controller with cache).

(It has nothing to do with parallel execution of queries.)
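
For example, a minimal sketch of trying a higher value (the number is just a
starting point and needs testing against your workload):

ALTER SYSTEM SET effective_io_concurrency = 32;
SELECT pg_reload_conf();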


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general