Re: [GENERAL] What is exactly a schema?

2017-07-14 Thread Bill Moran
On Fri, 14 Jul 2017 08:59:13 -0300
marcelo <marcelo.nico...@gmail.com> wrote:

> Thank you.
> Now I have a related question.
> Could I select a specific schema in the connection string? Say, by 
> example database=mydb.schemanumbertwo ?
> 
> I'm asking this because I will be using Devart's dotConnect and Entity 
> developer to access the database. I have not full control, so I cannot 
> set the search path immediately after the connection.
> 
> If the first example is possible, I will replace the schema name on the 
> fly, before connection attempt.

I don't think you can do exactly what you're asking. However, you should
be able to achieve the same result by setting a default schema for the
user that you're connecting as. See the docs for ALTER ROLE and SET.
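
As a rough sketch (role, database, and schema names here are placeholders
for your own), something along these lines should do it:

  ALTER ROLE myapp_user IN DATABASE mydb
      SET search_path = schemanumbertwo, public;

New connections made as that role pick up the setting automatically, which
fits the "can't run SET right after connecting" constraint.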

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strange case of database bloat

2017-07-05 Thread Bill Moran
On Wed, 5 Jul 2017 13:28:29 +0200
Chris Travers <chris.trav...@gmail.com> wrote:

> On Wed, Jul 5, 2017 at 1:00 PM, PT <wmo...@potentialtech.com> wrote:
> 
> > 2x the working size for a frequently updated table isn't terrible bloat.
> > Or are you saying it grows 2x every 24 hours and keeps growing? The real
> > question is how often the table is being vacuumed. How long have you let
> > the experiment run for? Does the table find an equilibrium size where it
> > stops growing? Have you turned on logging for autovacuum to see how often
> > it actually runs on this table?
> 
> If it were only twice it would not bother me.  The fact that it is twice
> after 24 hrs, 3x after 48 hrs and 4x after 72 hrs is alarming.

Ok, yup, that seems like an issue.

> > No unremovable rows does not indicate that autovacuum is keeping up. It
> > just indicates that you don't have a problem with uncommitted transactions
> > holding rows for long periods of time.
> 
> Right.  I should have specified that I also have not seen auto vacuum in
> pg_stat_activity with an unusual duration.

How long does it take when you run it manually? My experience is that autovac
can take orders of magnitude longer with the default cost delays, but just
because you don't see it doesn't mean it's not happening. Turn on autovac
logging and check the logs after a few days.

> > Have you looked at tuning the autovacuum parameters for this table? More
> > frequent vacuums should keep things more under control. However, if the
> > write load is heavy, you'll probably want to lower
> > autovacuum_vacuum_cost_delay. Personally, I feel like the default value
> > for this should be 0, but there are likely those that would debate that.
> > In any event, if that setting is too high it can cause autovacuum to take
> > so long that it can't keep up. In theory, setting it too low can cause
> > autovacuum to have a negative performance impact, but I've never seen
> > that happen on modern hardware.
> 
> Most of the writes are periodic (hourly?) batch updates which are fairly
> big.

Not sure how that statement is related to the comments I made preceding it.

> > But that's all speculation until you know how frequently autovacuum runs on
> > that table and how long it takes to do its work.
> 
> Given the other time I have seen similar behaviour, the question in my mind
> is why free pages near the beginning of the table don't seem to be re-used.

It's possible that the early pages don't have enough usable space for the
updated rows. Depending on your update patterns, you may end up with bloat
scattered across many pages, with no individual page having enough space to
be reused. That seems unlikely as the bloat becomes many times the used
space, though.

The pg_freespacemap extension should be useful in determining if that's what's
happening. Combine that with turning on logging to ensure that autovacuum is
actually operating effectively. 
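
As a sketch of what I mean (substitute your own table name), something like:

  CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

  -- free space per page near the start of the table; lots of reusable
  -- space in early pages that never shrinks would support the theory above
  SELECT blkno, avail
  FROM pg_freespace('your_bloated_table')
  ORDER BY blkno
  LIMIT 100;

and setting log_autovacuum_min_duration = 0 (at least temporarily) so every
autovacuum run on the table shows up in the logs.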

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] autovacuum holds exclusive lock on table preventing it from to be updated

2017-06-19 Thread Bill Moran
On Mon, 19 Jun 2017 17:33:23 +
Dmitry O Litvintsev <litvi...@fnal.gov> wrote:
> 
> The test stand where I was to test schema upgrade is stuck cuz vacuum is 
> blocking. 

If you're in "panic mode" I would recommend cancelling the existing vacuum,
running your upgrades, then immeditely running VACUUM FREEZE ANALYZE on that 
table to
make up for cancelling the autovacuum. Note that the manual VACUUM may take 
quite a while,
so run it in a screen session or something. Additionally, autovacuum is going 
to try to
relaunch that vacuum pretty aggressively, so you might have to cancel it a few 
times (using
pg_terminate_backend()) before your other processes are able to sneak in ahead 
of it.
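
Concretely, that looks something like this (the table name is a placeholder
for your stuck table):

  -- find and cancel the autovacuum worker on that table
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE query LIKE 'autovacuum:%your_table%';

  -- then, once the schema upgrade is through:
  VACUUM (FREEZE, ANALYZE, VERBOSE) your_table;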

Once you're out of panic mode you can take some time to breathe and consider
your options for reducing the issue in the future.

I have to 2nd Alvaro's comments about the cost delay. Personally, I'd
recommend setting vacuum_cost_delay to zero, unless your hardware is crap.
In my recent experience, allowing vacuum to run full-bore is less intrusive
on a busy database with good hardware than forcing it to take its time.
Unfortunately, changing it now isn't going to speed the current vacuum up
any.

Another comment: schema changes almost always need exclusive locks on the
tables that they're modifying. As a result, you really need to plan them out
a bit. Anything could block a schema update, even a simple SELECT statement,
so it's important to check the health of things before starting. While it's
not _generally_ a good practice to interrupt autovacuum, it _can_ be done if
the schema upgrade is necessary. Keep in mind that it's just going to start
back up again, but hopefully your schema update will be done by then and it
can do its work without interfering with things. Another thing you can do is
to monitor the transaction ID values (the Nagios check_postgres script has a
nice mode for monitoring this) and manually launch a VACUUM FREEZE ahead of
autovacuum, so that _you_ can pick the time for it to run rather than having
it crop up at the worst possible time ;)

You might also find that things are easier to deal with if you tweak the
autovacuum settings on this table to cause it to be vacuumed more frequently.
In my experience, more frequent vacuums that do less work each time often
lead to happier databases. See ALTER TABLE and the available settings to
tweak autovacuum behavior.
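
For example (the numbers are purely illustrative; tune them to your own
churn rate):

  ALTER TABLE your_table SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_vacuum_cost_delay   = 0
  );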

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Bill Moran
On Thu, 8 Jun 2017 18:00:04 +0200
Eric Lemoine <eric.lemo...@oslandia.com> wrote:

> We have a rather strange performance issue with the Pointcloud extension
> [*]. The issue/bug may be in the extension, but we don't know for sure
> at this point. I'm writing to the list to hopefully get some guidance on
> how to further debug this.
> 
> [*] <https://github.com/pgpointcloud/pointcloud>
> 
> A query takes around 250 ms when executed first on a database
> connection. But it takes like 3 s when executed after a first very
> simple Pointcloud query.
> 
> Below is a test-case with psql.
> 
> Case #1 (works normally):
> 
> psql (9.6.3)
> Type "help" for help.
> 
> lopocs=# \timing
> Timing is on.
> lopocs=# select points from public.sthelens where pc_intersects(points,
> st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
> -2356120.91980829 -3742654.00016992, -2356120.91980829
> -3741278.00016992, -2357334.41980829 -3741278.00016992,
> -2357334.41980829 -3742654.00016992))', 4978));
> Time: 236.423 ms
> 
> 
> Case #2 (works abnormally):
> 
> psql (9.6.3)
> Type "help" for help.
> 
> lopocs=# \timing
> Timing is on.
> lopocs=# select pc_typmod_pcid(1);
>  pc_typmod_pcid
> 
>   1
> (1 row)
> 
> Time: 4.917 ms
> lopocs=# select points from public.sthelens where pc_intersects(points,
> st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
> -2356120.91980829 -3742654.00016992, -2356120.91980829
> -3741278.00016992, -2357334.41980829 -3741278.00016992,
> -2357334.41980829 -3742654.00016992))', 4978));
> Time: 2987.491 ms
> 
> 
> The query takes 236 ms in case #1, and 2987 ms in case #2! Huge difference.
> 
> Anyone has any idea where this performance drop may come from? The
> problem may be in the Pointcloud in the extension, but I have no idea
> where the bug may be.
> 
> Any idea? Any suggestion on how to debug this? This has been driving us
> crazy for some time now.

Lots of missing information here ...

Is there an index on public.sthelens.points? How many rows are in that table?
What are your shared_buffers settings? How much RAM does the server have?
What does EXPLAIN look like for that query? How large (in bytes) are the
tables in question? What does pc_typmod_pcid() actually do?

There are probably lots of other questions I could ask, but those questions
are based on the fact that this _looks_ like a classic cache blowout. I.e.,
the query runs quickly when all the related data is in RAM, but is
significantly slower when the data has to be pulled from disk. Answering
the questions above will likely help to determine if my guess is correct.

If my guess is correct, there are any number of potential ways to improve
things: Add RAM to the machine, enlarge shared_buffers, put a geo index on
public.sthelens.points so it doesn't have to scan the entire table, to name a
few.

Understanding what pc_typmod_pcid() actually does would help, but even
without that you can test things in a few ways. One would be to substitute,
in your testing, a different query for select pc_typmod_pcid(1) that is
known to push the contents of public.sthelens out of memory, and see if
the behavior is similar. Any count(*) query on some other large table
would probably suffice. A better way would probably be to install the
pg_buffercache module and see what's actually in the cache at each step
of the testing process.
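
Something along these lines (a rough sketch; it assumes the default 8kB
block size) would show how much of the table is sitting in shared_buffers
at each step:

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  SELECT count(*) AS buffers,
         pg_size_pretty(count(*) * 8192) AS cached
  FROM pg_buffercache
  WHERE relfilenode = pg_relation_filenode('public.sthelens');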

In any event, if your testing doesn't help any, you'll probably need to
include answers to at least the above questions before the list will be
much help.

That is, of course, unless someone familiar with pointcloud has seen this
exact problem and already knows the answer ...

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Slow query plan used

2017-06-01 Thread Bill Moran
On Thu, 1 Jun 2017 16:45:17 +0200
Andreas Kretschmer <andr...@a-kretschmer.de> wrote:
> 
> Am 01.06.2017 um 14:07 schrieb Wetzel, Juergen (Juergen):
> >
> > Am 31.05.2017 um 13:27 schrieb Wetzel, Juergen (Juergen):
> >>>> Only 130 rows out of the 3 have ARCHIVED = 0
> >>> in this case i would suggest a partial index:
> >>> create index  on  (archived) where archived = 0;
> >> Thanks, Andreas.
> >>
> >> Sorry for the confusion about the table names.
> >> The hint with the partial index sounds as it could solve the problem. I 
> >> will test it.
> >>
> > Hi,
> >
> > I created now a partial index
> > create index on document (archived) where archived = '0';
> 
> just to be sure: this syntay is wrong, missing index-name. But it seems 
> the index is document_archived_idx ...
> 
> > But result is same as before: a short like expression included in doubled 
> > %-signs leads to a fast query plan whereas a longer like expression or use 
> > of single %-sign creates a much slower query. Please see below query plans. 
> > Most surprisingly to me is the influence of the like expression, especially 
> > the doubled %-sign on short expressions. Any other ideas how to speed up 
> > that query or what is going on here in general?

LIKE queries are probably challenging to plan, especially when they're not
left-anchored: how can the planner be reasonably expected to estimate how
many rows will be matched by a given LIKE expression?

Not having looked at the code, I would guess that the length of the LIKE
expression will make the planner assume that the match is more restrictive,
while many % and _ in the LIKE expression make the planner assume that the
match is less restrictive. Extrapolate that into guessing a number of matched
tuples and how that fits into the overall plan and you'll probably give
yourself a brain aneurysm. While having a detailed understanding of exactly
how the planner makes such decisions is certainly worthwhile, I would
recommend a more pragmatic approach: try things and see what works.

That in mind, let me throw pg_trgm into the mix of things to try:
https://www.postgresql.org/docs/current/static/pgtrgm.html
The trigram module allows you to create indexes that LIKE can use
to do index searches instead of always having to do sequential scans
or push the LIKE matching to another part of the plan tree. Based on
your described situation, I have a theory that it might improve things
quite a bit.
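
As a sketch (the column name is a stand-in for whatever column your LIKE
is matching against):

  CREATE EXTENSION IF NOT EXISTS pg_trgm;

  CREATE INDEX document_body_trgm_idx
      ON document USING gin (body gin_trgm_ops);

After an ANALYZE, a query like
  SELECT ... FROM document WHERE body LIKE '%foo%' AND archived = '0';
can use the trigram index instead of scanning every row.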

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Current best practice for maximum shared_buffers settings on big hardware?

2017-05-24 Thread Bill Moran
On Wed, 24 May 2017 21:02:45 +0800
stevenchang1213 <stevenchang1...@gmail.com> wrote:

> hello, at most 40% total memory, official doc also says so.

The docs say, "it is unlikely that an allocation of more than 40% of RAM
to shared_buffers will work better than a smaller amount" which is a far
cry from defining a maximum amount, or even warning that there are dangers
for high values. Also, that is in the same paragraph that starts with
"If you have a dedicated database server with 1GB or more of RAM ..." so
I don't need to check the RCS logs to predict that that paragraph hasn't
been updated in a while.

> you can testify it using pg_prewarm and pgfincore.

Not sure how those tools are going to predict whether I'm going to see
database stalls or other performance inversions from adding more
shared_buffers. For the purposes of seeing if shared_buffers are being
used effectively, I need only graph the block hits and misses to see
that a huge number of cache pages are satisfying requests, but also that
the amount of cache misses is still high enough for me to know that my
working set does _not_ fit in shared_buffers. What I _don't_ know is
whether increasing shared_buffers (to say, 128G on a 750G machine) is
still going to result in the same, weird performance inversion I saw
back in the 9.2 days.
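
For anyone wanting to do the same sanity check, the hit/miss numbers I graph
come straight out of pg_stat_database; a rough sketch:

  SELECT datname,
         blks_hit,
         blks_read,
         round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
  FROM pg_stat_database
  WHERE datname = current_database();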

> btw, numa supported? if so, extra care is necessary when starting db cluster.

Egad how I hate Linux's default NUMA policy. But I do know how to manage
it, and it's not part of the issue.

> Sent from my Samsung Galaxy smartphone.
>  Original message  From: Bill Moran <wmo...@potentialtech.com>
> Date: 2017/5/24 20:24 (GMT+08:00) To: pgsql-general@postgresql.org
> Subject: [GENERAL] Current best practice for maximum shared_buffers
> settings on big hardware?
> 
> A few years ago, I was working with "big" servers. At least, they were
> big for that age, with *128G* of RAM!!!1 Holy mackerel, right?!!?
> 
> Anyway, at that time, I tried allocating 64G to shared buffers and we
> had a bunch of problems with inconsistent performance, including "stall"
> periods where the database would stop responding for 2 or 3 seconds.
> After trying all sorts of tuning options that didn't help, the problem
> finally went away after reducing shared_buffers to 32G. I speculated, at
> the time, that the shared buffer code hit performance issues managing
> that much memory, but I never had the opportunity to really follow up
> on it.
> 
> Now, this was back in 2012 or thereabouts. Seems like another lifetime.
> Probably PostgreSQL 9.2 at that time.
> 
> Nowadays, 128G is a "medium sized" server. I just got access to one
> with 775G. It would appear that I could order from Dell with 1.5T of
> RAM if I'm willing to sell my house ...
> 
> Yet, all the docs and advice I'm able to find online seem to have been
> written pre 2008 and say things like "if your server has more than 1G
> of RAM ..."
> 
> I feel like it's time for a documentation update ;) But I, personally,
> don't have recent enough experience to know what sort of
> recommendations to make.
> 
> What are people's experience with modern versions of Postgres on hardware
> this size? Do any of the experts have specific recommendations on large
> shared_buffers settings? Any developers care to comment on any work
> that's been done since 2012 to make large values work better?
> 
> -- 
> Bill Moran <wmo...@potentialtech.com>
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Current best practice for maximum shared_buffers settings on big hardware?

2017-05-24 Thread Bill Moran

A few years ago, I was working with "big" servers. At least, they were
big for that age, with *128G* of RAM!!!1 Holy mackerel, right?!!?

Anyway, at that time, I tried allocating 64G to shared buffers and we
had a bunch of problems with inconsistent performance, including "stall"
periods where the database would stop responding for 2 or 3 seconds.
After trying all sorts of tuning options that didn't help, the problem
finally went away after reducing shared_buffers to 32G. I speculated, at
the time, that the shared buffer code hit performance issues managing
that much memory, but I never had the opportunity to really follow up
on it.

Now, this was back in 2012 or thereabouts. Seems like another lifetime.
Probably PostgreSQL 9.2 at that time.

Nowadays, 128G is a "medium sized" server. I just got access to one
with 775G. It would appear that I could order from Dell with 1.5T of
RAM if I'm willing to sell my house ...

Yet, all the docs and advice I'm able to find online seem to have been
written pre 2008 and say things like "if your server has more than 1G
of RAM ..."

I feel like it's time for a documentation update ;) But I, personally,
don't have recent enough experience to know what sort of
recommendations to make.

What are people's experience with modern versions of Postgres on hardware
this size? Do any of the experts have specific recommendations on large
shared_buffers settings? Any developers care to comment on any work
that's been done since 2012 to make large values work better?

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can PG replace redis, amqp, s3 in the future?

2017-05-05 Thread Bill Moran
On Fri, 5 May 2017 19:52:42 +0100
Tony Finch <d...@dotat.at> wrote:

> Bill Moran <wmo...@potentialtech.com> wrote:
> >
> > There's a well-written article I saw recently that directly addresses
> > your question ... I'm too lazy to find it, but google will probably
> > turn it up for you.
> 
> This? http://renesd.blogspot.co.uk/2017/02/is-postgresql-good-enough.html

Oh, hey! You found it, thanks!

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can PG replace redis, amqp, s3 in the future?

2017-05-01 Thread Bill Moran
On Mon, 1 May 2017 22:05:03 +0200
Thomas Güttler <guettl...@thomas-guettler.de> wrote:
> > 
> > There's a well-written article I saw recently that directly addresses
> > your question ... I'm too lazy to find it, but google will probably
> > turn it up for you.
> 
> I tried to find it, but failed. Can you give me some keywords to find
> this well-written article?

I can't seem to find it again. Sorry.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can PG replace redis, amqp, s3 in the future?

2017-05-01 Thread Bill Moran
On Mon, 1 May 2017 22:02:15 +0200
Thomas Güttler <guettl...@thomas-guettler.de> wrote:
> Leaving two things open:
> 
>  - blob storage
>  - redis/caching

I've used Postgres for both of these purposes, and at the load
level we were experiencing at the time, it worked fine.

We later implemented Redis when our caching requirements exceeded
what Postgres could do in that capacity. We never switched to
anything else for blob storage, as Postgres was always sufficient.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can PG replace redis, amqp, s3 in the future?

2017-04-30 Thread Bill Moran
On Sun, 30 Apr 2017 13:37:02 +0200
Thomas Güttler <guettl...@thomas-guettler.de> wrote:

> Is is possible that PostgreSQL will replace these building blocks in the 
> future?
>  
>  - redis (Caching)
>  - rabbitmq (amqp)
>  - s3 (Blob storage)
> 
> One question is "is it possible?", then next "is it feasible?"
> 
> I think it would be great if I could use PG only and if I could
> avoid the other types of servers.
> 
> The benefit is not very obvious on the first sight. I think it will saves you
> time, money and energy only in the long run.
> 
> What do you think?

There's a well-written article I saw recently that directly addresses
your question ... I'm too lazy to find it, but google will probably
turn it up for you.

The upshot is that Postgres does a lot of things well, but when the need
comes up to do them _REALLY_ well, you're generally better off picking a
tool that's specialized for your needs.

Take a message bus for example. PG's notify works pretty damn well as a
centralized message bus. But if you need a distributed message bus or you
need massive throughput, you're almost certainly better off with something
specifically designed for that purpose.
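
For anyone who hasn't tried it, the whole centralized-bus pattern is not much
more than this (a minimal sketch; channel and payload are made up):

  -- consumer session
  LISTEN job_queue;

  -- producer session
  NOTIFY job_queue, 'process order 42';
  -- or, from inside a function or trigger:
  -- PERFORM pg_notify('job_queue', 'process order 42');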

Of course, if you need structured, relational data to be stored reliably,
you can't do much better than Postgres.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] WAL being written during SELECT * query

2017-04-10 Thread Bill Moran

> >> On Tue, Apr 4, 2017 at 9:46 AM, Tom DalPozzo <t.dalpo...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > I have a very big table (10GB).
> >> > I noticed that many WAL segments are being written when elaborating read
> >> > only transactions like this:
> >> > select * from dati256 where id >4300 limit 100;
> >> > I don't understand why are there WAL writings during read only
> >> transactions.

These are hint bits. The mechanism and behavior are known and documented:
https://wiki.postgresql.org/wiki/Hint_Bits

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Aggregate query on large tables

2017-04-09 Thread Bill Moran
On Sun, 9 Apr 2017 17:05:56 +0200
Job <j...@colliniconsulting.it> wrote:

> Hi,
> 
> i have a table with about 400 millions of rows and i need to build some 
> aggregate function for reporting.
> I noticed that query performances are slowing down, even though indexes are 
> present.
> 
> Query is simple (i make an example, my table is in italian language):
> select a,sum(b) from table where a=x and c=y group by a
> 
> a is a varchar
> b is an integer
> 
> x and y are two field i use for filter results.
> 
> I tried to create different indexes to try to speed up performance:
> index1 (a)
> index2 (c)
> index3 (a,c)
> 
> I noticed, with query planner, that the mulfi-field index is not used.
> Postgresql 9.6.1 still use scan without indexes.
> 
> I obtain significant improvements only if i create a materialized view with 
> aggregated data.

It helps a lot if you provide EXPLAIN output for questions like this. Also,
definitions of what you consider "fast" and "slow" are helpful, as everyone
seems to have a different opinion on what those words mean.

However, my guess is that your WHERE condition isn't restrictive enough
to make using the index worth the time. If you'll be fetching a significant
percentage of the rows anyway, using the index would actually slow things
down.

You _might_ get better performance if you create an index on (a,c,b) which
would allow the query to run without ever needing to access the actual
table; but I'm just speculating.
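
That is, something like this ("your_table" standing in for the real name):

  CREATE INDEX ON your_table (a, c, b);

Keep in mind that whether an index-only scan actually kicks in also depends
on the table having been vacuumed recently enough that its visibility map
is reasonably current.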

In my experience, queries like these rarely benefit from filter indexes,
because most of the time involved is in the grouping and aggregate processing,
and the index does nothing to help with that. But, again, without EXPLAIN
output I'm only speculating.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Keycloak and Postgres

2017-04-05 Thread Bill Moran
On Wed, 5 Apr 2017 07:24:32 +
Marc Tempelmeier <marc.tempelme...@flane.de> wrote:
> 
> Can you elaborate a bit on this part:
> " Because of how Postgres caches changes, you may find that a failover 
> requires some time in recovery mode."

https://www.postgresql.org/docs/9.6/static/wal-intro.html

The WAL requires that any unexpected shutdown of Postgres (where it doesn't
get to explicitly flush data pages to disk) go through a recovery cycle to
fix anything in the WAL that is not yet in the data pages.

Doing disk-level replication and using that as a failover essentially
duplicates a crash on the PostgreSQL end when you fail over.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Keycloak and Postgres

2017-04-01 Thread Bill Moran
On Thu, 30 Mar 2017 13:58:36 +
Marc Tempelmeier <marc.tempelme...@flane.de> wrote:

> Hi,
> 
> I have a replication question, we have some big Cisco UCS VM thingy, where 
> VMs are snapshotted, the drives are abstracted etc. If a VM crashes it will 
> be resumed in 1 min from another rack. What brings us master slave 
> replication or some other kind of replication in this setup? Should we do it 
> because of other failures?

Because of how Postgres caches changes, you may find that a failover
requires some time in recovery mode. Those VM snapshot systems are great,
but they aren't quite perfect if they don't know what is being done with
the data on the drives.

Whether it's good enough depends heavily on what your expectation is.
Before trusting it to meet your needs, I would spend some time simulating
failures and seeing what actually happens.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HotSync Replicate doubt?

2017-03-30 Thread Bill Moran
On Thu, 30 Mar 2017 09:39:40 -0700
Periko Support <pheriko.supp...@gmail.com> wrote:

> Hi guys.
> 
> I had some doubt about PSQL internal replication mechanics(Streaming).
> 
> If I setup a Master-Slave Hot StandBy.
> 
> The slave is on a Read-Only stage, but we can query the server.
> 
> If a user create a table in the database:
> 
> CREATE DATABASE mynewdb;
> CREATE TABLE mytbale-name + fields.
> 
> Or if changes a table with new fields or remove fields.
> 
> Does the replication will send this commands to the SLAVE without user
> intervention?
> 
> I run bucardo for replication, but this feature doesn't work, we need
> to manually do it on the
> SLAVE and some other steps to have both DB sync.
> 
> We already know how to do it, bucardo works.
> 
> Just wondering if PSQL can handle this automatically?

Postgres' built-in streaming replication _does_ replicate this automatically.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum stuck for hours, blocking queries

2017-02-23 Thread Bill Moran
On Wed, 22 Feb 2017 13:19:11 -0800
Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Mon, Feb 20, 2017 at 5:40 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> >
> > On Thursday, February 16, 2017, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> >> Tim Bellis <tim.bel...@metaswitch.com> writes:
> >> > Even though this is a read only query, is it also expected to be
> >> blocked behind the vacuum? Is there a way of getting indexes for a table
> >> which won't be blocked behind a vacuum?
> >>
> >> It's not the vacuum that's blocking your read-only queries.  It's the
> >> ALTER TABLE, which needs an exclusive lock in order to alter the table's
> >> schema.  The ALTER is queued waiting for the vacuum to finish, and lesser
> >> lock requests queue up behind it.  We could let the non-exclusive lock
> >> requests go ahead of the ALTER, but that would create a severe risk of the
> >> ALTER *never* getting to run.
> >>
> >> I'd kill the ALTER and figure on trying again after the vacuum is done.
> >>
> >>
> > I've been drilled by this and similar lock stacking issues enough times to
> > make me near 100% sure deferring the ALTER would be the better choice
> >
> >
> This seems like a rather one-sided observation.  How could you know how
> often the unimplemented behavior also would have "drilled" you, since it is
> unimplemented?
> 
> There was a proposal for a "LOCK TABLE...DEFERABLE" which would allow other
> requestors jump the queue if they were compatible with the held lock.  If
> that is implemented, then you would just manually lock the table deferably
> before invoking the ALTER TABLE command, if that is the behavior you wanted
> (but it wouldn't work for things that can't be run in transactions)

This seems redundant to me.

We already have LOCK ... NOWAIT. It's fairly trivial to write an upgrade
script that uses LOCK to explicitly lock tables that it's going to ALTER,
then busy-waits if the lock is not immediately grantable.
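
A minimal sketch of that pattern (table and column names are made up; the
retry/sleep loop lives in the upgrade script itself):

  BEGIN;
  -- errors out immediately if any conflicting lock is held, instead of
  -- queueing up and blocking everyone behind it
  LOCK TABLE some_table IN ACCESS EXCLUSIVE MODE NOWAIT;
  ALTER TABLE some_table ADD COLUMN new_col integer;
  COMMIT;
  -- on a "could not obtain lock" error, the script sleeps briefly and
  -- retries the whole block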

The fact that so many ORMs and similar tools don't take advantage of that
functionality is rather depressing.

In my experience, I've also seen heavily loaded systems that this wouldn't
work on, essentially because there is _always_ _some_ lock on every table.
This is a case where experienced developers are required to take some
extra time to coordinate their upgrades to work around the high load. But
the proposed LOCK TABLE ... DEFERRABLE wouldn't help there either, because
the ALTER would be deferred indefinitely.

Personally, I feel like the existing behavior is preferable. Software
teams need to take the time to understand the locking implications of their
actions or they'll have nothing but trouble anyway.

As I've seen time and again: writing an application that handles low load
and low concurrency is fairly trivial, but scaling that app up to high
load and/or high concurrency generally sorts out the truly brilliant
developers from the merely average.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Running out of memory the hard way ...

2017-02-08 Thread Bill Moran
On Wed, 08 Feb 2017 10:44:24 -0500
Tom Lane <t...@sss.pgh.pa.us> wrote:

> Albe Laurenz <laurenz.a...@wien.gv.at> writes:
> > Bill Moran wrote:
> >> What I feel is the best way to mitigate the situation, is to have some
> >> setting that limits the maximum RAM any backend can consume.
> 
> > I'd delegate that problem to the operating system which, after all,
> > should know best of all how much memory a process uses.
> 
> I've had some success using ulimit in the past, although it does have
> the disadvantage that you have to impose the same limit on every PG
> process.  (You set it before starting the postmaster and it inherits
> to every child process.)  If memory serves, limiting with the -v switch
> works better than -d or -m on Linux; but I might be misremembering.
> Conceivably we could add code to let the ulimit be set per-process,
> if the use-case were strong enough.

Thanks, Tom. I'm not sure why I didn't think to use this. Although part
of the problem may be that most of the links that come up from a google
search on this topic don't seem to have this suggestion.

Hopefully having this in the list archives will make the search easier
for the next person who has this issue. Does anyone know if there are
any suggestions to this effect in the official documentation? If not,
I'll try to make some time to submit a patch.

> To implement a limit inside PG, we'd have to add expensive bookkeeping
> to the palloc/pfree mechanism, and even that would be no panacea because
> it would fail to account for memory allocated directly from malloc.
> Hence, you could be pretty certain that it would be wildly inaccurate
> for sessions using third-party code such as PostGIS or Python.  An
> OS-enforced limit definitely sounds better from here.

Unfortunate but understandable.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Running out of memory the hard way ...

2017-02-08 Thread Bill Moran

If you run a transaction with lots of server-side functions that use a
lot of memory, this can trigger the OOM killer in Linux, causing the
PostgreSQL backend to receive a SIGKILL and all the associated bad stuff.

Tuning the OOM killer is not sufficient. No setting I've found for the
OOM killer will guarantee that it won't SIGKILL a process that's essentially
untenable anyway (because it's going to use more memory than actually
exists on the system at some point). Additionally, "add more RAM"
doesn't "solve" the problem, it only delays it until datasets
scale up to even larger transactions that use even more memory.

This is particularly prevalent with PostGIS, because some PostGIS functions
are very memory intensive, but I'd be willing to bet real money that I could
trigger it with just about any stored procedure that allocates memory in
such a way that it doesn't get reclaimed until the transaction completes.
See as an example: https://trac.osgeo.org/postgis/ticket/3445
If anyone wants to investigate this but is having trouble reproducing it, I
can construct specific failure scenarios fairly easily.

Another workaround is to run the offending statements in smaller
transactional batches. This is the best solution I've found so far, but
it's not quite ideal. In particular it requires the client program to
reimplement transaction guarantees on the client side. Sometimes this
isn't necessary, but other times it is.

What I feel is the best way to mitigate the situation is to have some
setting that limits the maximum RAM any backend can consume. Attempting to
exceed this limit would cause an error and rollback for that particular
backend without affecting other backends. This would provide information
to the client process that can be rationally interpreted by client code
to result in either an error that a developer can understand, or possibly
adaptive code that changes behavior to accommodate limits on the server
side.

My first question: does this setting exist somewhere and I'm simply not
finding it for some reason?

Assuming this doesn't exist (I haven't found it) my next question is
whether there's a philosophical or technical reason that such a feature
doesn't exist? Should I take this discussion to -hackers?

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_repack and Postgres versions > 9.4

2016-12-16 Thread Bill Moran

Does anyone have experience using pg_repack on Postgres versions > 9.4?
Specifically 9.5, but probably 9.6 at some point.

The documentation claims it supports up to 9.4. I haven't looked at it
closely enough to guess whether there might be changes in 9.5/9.6 to
cause it not to work any more.

Anyone know? Or, alternatively, anyone have another option to get the
same job done?

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Unexplained statistics reset? Help tracking it down.

2016-12-09 Thread Bill Moran

I've been seeing some unexplained behavior where the statistics in a Postgres
database reset with no explanation as to why. This is concerning because it
results in terrible query plans until someone manually runs analyze, and it
negatively impacts autovacuum.

This is Postgres 9.5.4 on Ubuntu 14 (yes, I know there's a patch I need to
apply -- it's on my TODO list, but I don't see any mention of fixing unexplained
stats resets in that version, so I'm not assuming that will fix it)

My first thought is that some random user was calling pg_stat_reset() without
realizing what they were doing. However, I have full query logging enabled on
this system, and the logs don't show this happening. (Yes, I've also checked
for someone disabling query logging for their connection before doing
mysterious things.)

Before I start theorizing that this might be a bug in Postgres, does anyone
have any suggestions on what other ways the stats could be reset that I need
to check on? Has anyone else experienced this, to lend credence to the
possibility that it's a bug? I have no clue how to reproduce it, as the
occurrence is rare and still seems random.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] The consequenses of interrupted vacuum

2016-10-27 Thread Bill Moran
On Thu, 27 Oct 2016 10:44:03 -0400
Tom Lane <t...@sss.pgh.pa.us> wrote:

> > I'm asking for cases of large tables where autovacuum frequently gets
> > interrupted. I'm trying to understand if the partial runs are at least
> > making _some_ progress so the next vacuum has less to do, or if this is
> > a serious problem that I need to fiddle with tuning to fix.
> 
> It's probably making some progress but not much.  You need to fix that.

Thanks for the feedback. The good news is that grepping through recent logs,
I'm not seeing the problem any more. So I must have just noticed it on a
particularly problematic day last time I looked.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] The consequenses of interrupted vacuum

2016-10-27 Thread Bill Moran

Very specific question:
Does interrupting a VACUUM (not FULL) rollback all the work it has done
so far, or is the work done on a page by page basis such that at least
some of the pages in the table have been vacuumed?

I'm asking for cases of large tables where autovacuum frequently gets
interrupted. I'm trying to understand if the partial runs are at least
making _some_ progress so the next vacuum has less to do, or if this is
a serious problem that I need to fiddle with tuning to fix.

-- 
Bill Moran <wmo...@potentialtech.com>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Protect a table against concurrent data changes while allowing to vacuum it

2016-06-22 Thread Bill Moran
On Wed, 22 Jun 2016 10:20:38 +
Sameer Kumar <sameer.ku...@ashnik.com> wrote:

> On Wed, Jun 22, 2016 at 6:08 PM Vlad Arkhipov <arhi...@dc.baikal.ru> wrote:
> 
> > I am running PostgreSQL 9.5.
> >
> > CREATE TABLE t (id BIGINT NOT NULL PRIMARY KEY, name TEXT);
> >
> > The constraint that the data must satisfy is `there is no more than 3
> > records with the same name`.
> >
> > I am not in control of queries that modify the table, so advisory locks
> > can hardly be of help to me.
> >
> 
> Define a function which does a count of the rows and if count is 3 it
> return false if count is less it returns true.

An exclusion constraint might be a better solution.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Thoughts on "Love Your Database"

2016-05-04 Thread Bill Moran
On Tue, 3 May 2016 23:11:06 -0500
Guyren Howe <guy...@gmail.com> wrote:

> I've long been frustrated with how most web developers I meet have no idea 
> how to use an SQL database properly. I think I'm going to write a book called 
> Love Your Database, aimed at web developers, that explains how to make their 
> apps better by leveraging the power of SQL in general, and Postgres in 
> particular.
> 
> I'm thinking of a section on features of SQL most folks don't know about 
> (CTEs are *way* to hell at the top of that list, but also EXCEPT/INTERSECT 
> and window functions), but much of the book would be about how to do things 
> server side. Benchmarks showing how much faster this can be, but mostly 
> techniques: stored procedures/triggers/rules, views.
> 
> I asked a colleague about the advice I often hear stated but seldom 
> justified, that one shouldn't put business rules in the database. He offered 
> that server-side code can be hard to debug.
> 
> I'm sure many here would love to see such a book published, maybe some talks 
> on the topic given.
> 
> 
> What might I cover that I haven't mentioned? What are the usual objections to 
> server-side code and how can they be met? When *are* they justified and what 
> should the criteria be to put code in Postgres? Any other thoughts? Any other 
> websites or books on the topic I might consult?

Not a specific topic, but as a general theme, a lot of developers don't
seem to think it's useful for them to know SQL, and therefore don't
bother trying -- or even actively resist learning.

So if the overall theme is "knowing this makes things better", I would
buy multiple copies of the book and mysteriously leave them on various
developers' desks.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] (VERY) Slow Query - PostgreSQL 9.2

2016-05-03 Thread Bill Moran
On Tue, 3 May 2016 21:55:21 +1200
"drum.lu...@gmail.com" <drum.lu...@gmail.com> wrote:

> Hi all,
> 
> I'm trying to get the query below a better performance.. but just don't
> know what else I can do...
> 
> Please, have a look and let me know if you can help somehow.. also.. if you
> need some extra data jet ask me please.
> 
> * Note that the gorfs.inode_segments table is 1.7TB size
> 
> I have the following Query:
> 
> explain analyze
> SELECT split_part(full_path, '/', 4)::INT AS account_id,
>        split_part(full_path, '/', 6)::INT AS note_id,
>        split_part(full_path, '/', 9)::TEXT AS variation,
>        st_size,
>        segment_index,
>        reverse(split_part(reverse(full_path), '/', 1)) AS file_name,
>        i.st_ino,
>        full_path,
>        (i.st_size / 100::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
> FROM gorfs.inodes i
> JOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_target
> WHERE
>   i.checksum_md5 IS NOT NULL
>   AND i.st_size > 0
>   AND split_part(s.full_path, '/', 4)::INT IN (
>     SELECT account.id
>     FROM public.ja_clients AS account
>     WHERE NOT (
>         ((account.last_sub_pay > EXTRACT('epoch' FROM
>             (transaction_timestamp() - CAST('4 Months' AS INTERVAL)))) AND
>          (account.price_model > 0)) OR
>         (account.regdate > EXTRACT('epoch' FROM
>             (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
>         (((account.price_model = 0) AND (account.jobcredits > 0)) AND
>          (account.last_login > EXTRACT('epoch' FROM
>             (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
>     )
>   ) LIMIT 100;
> 
> 
>- Explain analyze link: http://explain.depesz.com/s/Oc6
> 
> The query is taking ages, and I can't get the problem solved.
> 
> These are the index I've already created on the inode_segments table:
> 
> Indexes:
> "ix_account_id_from_full_path" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
> "full_path"::"text" ~
> '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'::"text"
> "ix_inode_segments_ja_files_lookup" "btree" ((CASE
> WHEN "full_path"::"text" ~ '/[^/]*\.[^/]*$'::"text" THEN
> "upper"("regexp_replace"("full_path"::"text", '.*\.'::"text",
> ''::"text", 'g'::"text"))
> ELSE NULL::"text"END)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
> "ix_inode_segments_notes_clientids" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
> "ix_inode_segments_notes_clientids2" "btree" ("full_path")
> "ix_inode_segments_notes_fileids" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 8)::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
> "ix_inode_segments_notes_noteids" "btree"
> ((NULLIF("split_part"("full_path"::"text", '/'::"text", 6),
> 'unassigned'::"text")::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
> 
> These are the index I've already created on the inodes table:
> 
>  Indexes:
> "ix_inodes_checksum_st_size" "btree" ("checksum_md5", "st_size")
> WHERE "checksum_md5" IS NOT NULL
> 
> *Question:*
> 
> What else can I do to improve the Performance of the Query?

From the explain, it looks like the biggest pain point is the
inode_segments table, specifically, this condition:
s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
It's doing a full scan of every record in that table, which is
a large number, and that regex can't be cheap over that kind of
volume.

If you do:
SELECT count(*)
FROM inode_segments
WHERE full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';
how many tuples actually match that condition? If the number
is a large percentage of the total table, then I'm not sure
how to help you, but if the percentage is small, you might
be able to speed things up by adding an index:

CREATE INDEX is_fp_trunc_idx ON inode_segments(substring(full_path FROM 1 FOR 19));

Then adding this condition to the where clause:
substring(s.full_path FROM 1 FOR 19) = '/userfiles/account/'

There are other index combinations that may help as well,
depending on the nature of the values in that table, but,
in general, anything you can do to reduce the number of
records that have to be examined in that table is liable
to speed things up.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Enhancement request for pg_dump

2016-04-17 Thread Bill Moran
On Sun, 17 Apr 2016 14:10:50 -0600
Sergei Agalakov <sergei.agala...@getmyle.com> wrote:

> I don't see how these questions are related to the proposed pg_dump 
> improvement.
> I suggest to improve pg_dump so it can be used instead of the third 
> party tools like DBSteward and SQLWorkbench/J etc.
> to compare two different databases or existing dumps, and to identify 
> the differences. The use cases will be exactly
> the same as for the third party tools. The positive difference will be 
> that pg_dump is a very reliable, always available and supports all the 
> latest PostgreSQL features.
> Do you imply that there shouldn't be any reasons to compare different 
> databases to find the differences between them?

Nobody has weighed in on this, but I have a theory ...

I (personally) worry that adding features like you suggest to pg_dump
would interfere with its ability to perform a complete dump of a large
database in a _rapid_ manner. Using pg_dump as a backup tool carries an
inherent desire for the tool to be as fast and low-impact on the
operation of the database as possible.

Features that would force pg_dump to care about ordering that isn't
necessary to its core functionality of providing a reliable backup
are liable to slow it down. They might also overcomplicate it, making
it more difficult to maintain reliably.

When you consider that possibility, and the fact that pg_dump isn't
_supposed_ to be a tool to help you with schema maintenance, it's easy
to see why someone would look for different approach to the problem.

And I feel that's what all the answers have attempted to do: suggest
ways to get what you want without asking them to be implemented in a
tool that isn't really the right place for them anyway. While your
arguments toward making this change are valid, I'm not sure that
they are compelling enough to justify adding a feature where it
doesn't really belong.

Another side to this, is that your request suggests that your
development process is suboptimal. Of course, I can't be 100% sure
since you haven't explained your process ... but my experience is
that people who feel the need to automagically sync prod and dev
databases have a suboptimal development process. Thus, the suggestions
are also biased toward helping you improve your process instead of
adjusting a tool to better support a suboptimal process.

Of course, if the people actually doing the work on the code disagree
with me, then they'll make the change. I'm just expressing an opinion.

> Sergei
> 
> > > On Apr 17, 2016, at 12:41 PM, Sergei Agalakov 
> > > <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:
> > >
> > > I know about DBSteward. I don't like to bring PHP infrastructure only to 
> > > be able to compare two dumps,
> > > and to deal with potential bugs in the third party tools. The pg_dump in 
> > > other hand is always here, and is always trusted.
> > > SQLWorkbench/J also can compare two schemas, and requires only Java. 
> > > Again, I trust pg_dump more.
> > >http://www.sql-workbench.net/
> > >
> > > May be pg_dump was never INTENDED to generate the dump files with the 
> > > determined order of the statements,
> > > but it CAN do it with the minor changes, and be more useful to 
> > > administrators. Why rely on the third party tools
> > > for the tasks that can be done with the native, trusted tools?
> > >
> > > Sergei
> > Does it matter if they differ if you cannot recreate the correct one 
> > exactly from source-controllled DDL?  Or know how they are supposed to 
> > differ if this is a migration point?
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Enhancement request for pg_dump

2016-04-17 Thread Bill Moran
On Sat, 16 Apr 2016 13:33:21 -0600
Sergei Agalakov <sergei.agala...@getmyle.com> wrote:

> Hi,
> 
> Currently as in PG 9.4, 9.5 the order of the statements in the script 
> produced by pg_dump is uncertain even for the same versions of the 
> databases and pg_dump.
> One database may script grants like
> 
> REVOKE ALL ON TABLE contracttype FROM PUBLIC;
> REVOKE ALL ON TABLE contracttype FROM madmin;
> GRANT ALL ON TABLE contracttype TO madmin;
> GRANT SELECT ON TABLE contracttype TO mro;
> GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
> 
> and the other may change the order of grants like
> 
> REVOKE ALL ON TABLE contracttype FROM PUBLIC;
> REVOKE ALL ON TABLE contracttype FROM madmin;
> GRANT ALL ON TABLE contracttype TO madmin;
> GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
> GRANT SELECT ON TABLE contracttype TO mro;
> 
> It complicates the usage of pg_dump to compare the structures of the two 
> similar databases like DEV and PROD, two development branches etc.

I don't think pg_dump was ever intended to serve that purpose.

dbsteward, on the other hand, does what you want:
https://github.com/nkiraly/DBSteward/wiki

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] what database schema version management system to use?

2016-04-06 Thread Bill Moran
On Wed, 6 Apr 2016 11:55:40 +0100
Alexey Bashtanov <bashta...@imap.cc> wrote:

> Hi all,
> 
> I am searching for a proper database schema version management system.
> 
> My criteria are the following:
> 0) Open-source, supports postgresql
> 1) Uses psql to execute changesets (to have no problems with COPY, 
> transaction management or sophisticated DDL commands, and to benefit 
> from scripting)
> 2) Support repeatable migrations (SQL files that get applied every time 
> they are changed, it is useful for functions or views tracking).
> 
> Reasonable?
> 
> But unfortunately it looks like each of Liquibase, Flyway, SqlHawk, 
> MigrateDB, Schema-evolution-manager, Depesz's Versioning, Alembic and 
> Sqitch does not satisfy some of those, right?
> 
> What DB VCS do you use and how does it related with the criteria listed 
> above?
> Do you have any idea what other systems to try?

http://dbsteward.org/

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] database corruption

2016-02-12 Thread Bill Moran
On Fri, 12 Feb 2016 10:56:04 +0100
"Oliver Stöneberg" <olive...@online.de> wrote:

> We are running a 64-bit PostgreSQL 9.4.5 server on Windows Server 
> 2012. The system is a virtual machine on a VMware ESX 6.0 server and 
> has 24 GB of memory. The database server is only accessed locally by 
> two services and there is only a single database in the server. The 
> disk is located on a storage that is shared with lots of other 
> servers. The database server has fsync enabled.
> 
> A few weeks ago we already had a data corruption when the disk was 
> full. There are other services running on the same machine that could 
> cause the disk to fill up (e.g. local chaching when the network is 
> acting up). It happened a few times so far but the database was never 
> compromised. In that case thought it was but fortunately we only lost 
> a huge table/toast (300+ GB) that has very verbose data stored which 
> is not essential. That happened with an earlier 9.4 version.
> 
> Today we encountered another data corruption after the disk was full. 
> It's much worse this time around since data that is essential for the 
> applications using it to run. After truncating that 300+ GB table 
> already mentioned above all the services were restarted and one of 
> the applications failed to start with the following database error:
> 
> Caused by: org.postgresql.util.PSQLException: ERROR: invalid memory alloc 
> request size 18446744073709551613
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2182)
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1911)
>   at 
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:173)
>   at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:622)
>   at 
> org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:472)
>   at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:386)
>   at 
> joulex.cloud.data.tables.JxObjectStoreJsonTable.getAllJsonsAsHashMap(JxObjectStoreJsonTable.java:215)
>   ... 16 more
> 
> Checking the database logs the problems seem to have started two days 
> ago:
> 2016-02-10 16:00:34 GMTERROR:  invalid page in block 1255 of relation 
> base/16387/3634911224
> 2016-02-10 16:00:34 GMTCONTEXT:  automatic vacuum of table 
> "cloud.t_63d1c1eb806f4e25904b659b75176093.y2016_utilization"
> 
> For some reason it doesn't seem to be caused by the disk being full 
> since the database server was still able to write another 8 GB of log 
> files for the next two days and the system was still working 
> yesterday afternoon.
> It also doesn't appear to be a hardware problem since all the other 
> systems sharing the virtual hostsystem and the storage show no issues 
> at all.
> 
> Unfortunately we don't have a recent backup of the database (a tool 
> to back up all the relevant data was just finished recently and was 
> not set up for this system yet).
> 
> Something else worth noting is that we disabled the autovacuum on the 
> toast table of the 300+ GB table since we perform INSERT INTO on that 
> table and the vacuum on the table was causing a performance hit. The 
> autovacuum for it is still being performed to prevent wraparound from 
> time to time and that autovacuum was still running after the machine ran 
> out of disk space and the services was restarted.
> 
> Any help in recovering the data is appreciated and if there is more 
> information necessary on this I will try to provide it. Thanks in 
> advance.

You most likely have byte-level corruption. If you need that data back,
your best bet is to hire a company with PostgreSQL experts who know
the structure of how the data is stored on disk and can manipulate
the files directly to recover whatever hasn't been destroyed. If you
want to do it yourself, it will require you to understand the actual
byte sequences as they are stored on disk, as well as the system
PostgreSQL uses to identify database pages within the file system.
First you will have to identify the file that contains the corrupt
page, then you will have to modify the bytes in the page to make the
page non-corrupt. (do this with Postgres shut down) In any event,
that sort of thing is touchy work, even if you do understand it well,
so make sure you have a full copy of all database files so you can
roll back if you make things worse.
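
For the read-only investigation part, something like this (using the numbers
from the log message you quoted above; 9.4 has pg_filenode_relation) should
locate the file and the byte offset of the corrupt block:

SELECT pg_filenode_relation(0, 3634911224);   -- 0 = default tablespace
SELECT pg_relation_filepath(pg_filenode_relation(0, 3634911224));
-- block 1255 starts at this offset within the file
-- (blocks past 1GB would be in a .1, .2, ... segment file):
SELECT 1255 * current_setting('block_size')::int AS corrupt_block_offset;

That tells you which file to copy aside and where the bad block starts;
everything beyond that is the risky, byte-level part.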

Long term, you need to fix your hardware. Postgres doesn't corrupt
itself just because the disks fill up, so your hardware must be lying
about what writes completed successfully, otherwise, Postgres would
be able to recover after a restart.

Beyond that, running Postgres on a filesystem that frequently fills up
is going to be problematic all around.

Re: [GENERAL] workarounds for ci_text

2016-02-04 Thread Bill Moran
On Thu, 4 Feb 2016 13:48:37 +0200
Heine Ferreira <heine.ferre...@gmail.com> wrote:
> 
> As far as I can tell, you can't restrict the length of a ci_text field like
> char(5) or varchar(5)?
> Let's say you got a stock table and you want an alphanumeric stock
> code that is the primary key but it must be case insensitive can I do
> something like this:
> 
> create table stock(stock code varchar(5), stock_desc varchar(50))
> primary key pk_stock_code ilike(stock_code)

You may be better served by using a check constraint to enforce
the length limit, or even creating a domain:
http://www.postgresql.org/docs/9.5/static/ddl-constraints.html
http://www.postgresql.org/docs/9.5/static/sql-createdomain.html

But that depends on whether you're only trying to enforce the
uniqueness or if you want things such as case insensitive matching
of the key. Your approach will only give you the former, whereas
CITEXT will give you both.

I don't think your syntax will work as written, though. I'm guessing that
PRIMARY KEY pk_stock_code lower(stock_code) would.
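
If you do go the CITEXT route, a rough sketch of combining it with a domain
for the length limit (untested; the names are just illustrative):

CREATE EXTENSION IF NOT EXISTS citext;

CREATE DOMAIN stock_code AS citext
    CHECK (length(VALUE) <= 5);

CREATE TABLE stock (
    stock_code stock_code PRIMARY KEY,
    stock_desc varchar(50)
);

The primary key is then case insensitive automatically, because CITEXT
compares case-insensitively, and the domain gives you the varchar(5)-style
length limit.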

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] strange sql behavior

2016-02-01 Thread Bill Moran
> >> Results for the small table: it took 141 seconds to finish.  The planning
> >> time is 85256.31
> >>
> >> "Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual
> >> time=141419.499..141420.025 rows=20288 loops=1)"
> >> "  Sort Key: "time""
> >> "  Sort Method: quicksort  Memory: 3622kB"
> >> "  Buffers: shared hit=92 read=19816"
> >> "  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27
> >> rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
> >> "Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
> >> "Heap Blocks: exact=19826"
> >> "Buffers: shared hit=92 read=19816"

^^
Note that despite this table being smaller, Postgres had to read 19816
blocks from disk, which is 2.5x more than the larger table.

> >> "->  Bitmap Index Scan on data2013_01w_ixtaxiid
> >> (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053
> >> rows=20288 loops=1)"
> >> "  Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
> >> "  Buffers: shared hit=4 read=78"
> >> "Planning time: 0.144 ms"
> >> "Execution time: 141421.154 ms"
> >>
> >> Results for the large table: it took 5 seconds to finish.  The planning
> >> time is 252077.10
> >> "Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual
> >> time=5038.571..5039.765 rows=44204 loops=1)"
> >> "  Sort Key: "time""
> >> "  Sort Method: quicksort  Memory: 7753kB"
> >> "  Buffers: shared hit=2 read=7543"
> >> "  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53
> >> rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)"
> >> "Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
> >> "Heap Blocks: exact=7372"
> >> "Buffers: shared hit=2 read=7543"
> >> "->  Bitmap Index Scan on data2011_01_ixtaxiid
> >> (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792
> >> rows=44204 loops=1)"

^^
Note that the larger table took LONGER to do the index work than the
smaller table, which probably means there's nothing wrong with your
disks or anything ... that's the behavior I would expect.

> >> "  Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
> >> "  Buffers: shared hit=2 read=171"
> >> "Planning time: 0.127 ms"
> >> "Execution time: 5042.134 ms"

So, what I'm seeing, is that Postgres is able to figure out _which_ rows
to fetch faster on the small table than the large table, which is what
you would expect, since a smaller index should be faster than a large one.

However, when it goes to actually fetch the row data, it takes
significantly longer on the small table, despite the fact that it's
only fetching 1/3 as many rows. It is, however, doing 2.5x as many
disk reads to get those rows: For the large table, it reads 61MB from
disk, but it reads 160MB to get all the data for the smaller table.

How the data was inserted into each table could lead to similar data
being clustered on common pages on the large table, while it's spread
across many more pages on the small table.

That still doesn't explain it all, though. 2.5x the disk
activity normally wouldn't equate to 28x the time required. Unless
your disks are horrifically slow? Does this server have a lot of
other activity against the disks? I.e. are other people running
queries that you would have to contend with, or is the server a VM
sharing storage with other VMs, or even a combined use server that
has to share disk access with (for example) a web or mail server
as well? Is the performance difference consistently ~28x?

Other things: what is shared_buffers set to? The queries would seem
to indicate that this server has less than 1M of those two tables
cached in memory at the time you ran those queries, which seems to
suggest that either you've got shared_buffers set very low, or that
there are a lot of other tables that other queries are accessing at
the time you're running these. Perhaps installing pg_buffercache to
have a look at what's using your shared_buffers would be helpful.
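
Something along these lines (run in the database in question, after
CREATE EXTENSION pg_buffercache) will show the biggest consumers:

SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 20;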

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is PRIMARY KEY the same as UNIQUE NOT NULL?

2016-01-31 Thread Bill Moran
On Sun, 31 Jan 2016 18:02:38 +0100
Tom Lane <t...@sss.pgh.pa.us> wrote:

> Harald Fuchs <hari.fu...@gmail.com> writes:
> > Ben Leslie <be...@benno.id.au> writes:
> >> "Technically, PRIMARY KEY is merely a combination of UNIQUE and NOT NULL"
> >> 
> >> I wanted to clarify if that was, technically, true.
> 
> > Yes, but see below.
> 
> >> "identifying a set of columns as primary key also provides metadata
> >> about the design of the schema, as a primary key implies that other
> >> tables can rely on this set of columns as a unique identifier for
> >> rows."
> 
> Yeah.  The extra metadata has several other effects.  Perhaps it would be
> better to reword this sentence to make it clear that PRIMARY KEY is
> equivalent to UNIQUE+NOTNULL in terms of the data constraint that it
> enforces, without implying that there is no other difference.  I'm not
> sure about a short and clear expression of that though ...

How about:

"PRIMARY KEY is merly a combination of UNIQUE and NOT NULL with regard
to data consistency behavior."

"identifying a set of columns as primary key also provides metadata about
the design of the schema, as a primary key implies that other tables can
rely on this set of columns as a unique identifier for rows. This
metadata may be used by external programs, but is also utilized internally
by the server in some cases."
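
For whatever it's worth, one concrete place the extra metadata shows up
(untested sketch):

CREATE TABLE a (id int PRIMARY KEY);
CREATE TABLE b (id int UNIQUE NOT NULL);
CREATE TABLE ref_a (a_id int REFERENCES a);  -- works: the PK is the implied target
CREATE TABLE ref_b (b_id int REFERENCES b);  -- fails: there is no primary key for "b"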

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Catalog bloat (again)

2016-01-27 Thread Bill Moran
On Wed, 27 Jan 2016 23:54:37 +0100
Ivan Voras <ivo...@gmail.com> wrote:
> 
> I've done my Googling, and it looks like this is a fairly common problem.
> In my case, there's a collection of hundreds of databases (10 GB+) with
> apps which are pretty much designed (a long time ago) with heavy use of
> temp tables - so a non-trivial system.
> 
> The databases are vacuumed (not-full) daily, from cron

Vacuuming once a day is seldom often enough, except on very quiet
databases.

> (autovacuum was
> turned off some time ago for performance reasons), and still their size
> increases unexpectedly. By using some of the queries floating around on the
> wiki and stackoverflow[*], I've discovered that the bloat is not, as was
> assumed, in the user tables, but in the system tables, mostly in
> pg_attributes and pg_class.

The size increase isn't really unexpected. If you're only vacuuming once
per day, it's very easy for activity to cause active tables to bloat quite
a bit.

> This is becoming a serious problem, as I've seen instances of these tables
> grow to 6 GB+ (on a 15 GB total database), while still effectively
> containing on the order of 10.000 records or so. This is quite abnormal.
> 
> For blocking reasons, we'd like to avoid vacuum fulls on these tables (as
> it seems like touching them will lock up everything else).

It will. But to get them back down to a reasonable size, you're going to
have to do a VACUUM FULL at least _once_. If you retune things correctly,
you shouldn't need any more FULLs after that 1 time.

> So, question #1: WTF? How could this happen, on a regularly vacuumed
> system? Shouldn't the space be reused, at least after a VACUUM? The issue
> here is not the absolute existence of the bloat space, it's that it's
> constantly growing for *system* tables.

With a lot of activity, once a day probably isn't regular enough.

> Question #2: What can be done about it?

I highly recommend turning autovacuum back on, then tweaking the autovacuum
parameters to prevent any preformance issues.

However, if you're dead set against autovacuum, find out (using the queries
that are available all over the internet) which tables are bloating the
worst, and schedule additional vacuums via cron that vacuum _only_ the
problem tables. How often is something that will require some guesswork
and/or experimenting, but I would recommend at least once per hour. Since
you're only vacuuming selected tables, the performance impact should be
minimal.
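
For the catalog tables you mention, the cron job could be as simple as
feeding psql something like this every hour (a sketch; add whichever
catalogs are bloating for you):

VACUUM ANALYZE pg_catalog.pg_attribute;
VACUUM ANALYZE pg_catalog.pg_class;
VACUUM ANALYZE pg_catalog.pg_type;
VACUUM ANALYZE pg_catalog.pg_depend;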

You'll have to do a VACUUM FULL on the bloated tables _once_ to get the size
back down, but if you pick a good schedule or use autovacuum with appropriate
settings, they shouldn't need a VACUUM FULL again after that.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Catalog bloat (again)

2016-01-27 Thread Bill Moran
On Thu, 28 Jan 2016 00:37:54 +0100
Ivan Voras <ivo...@gmail.com> wrote:

> On 28 January 2016 at 00:13, Bill Moran <wmo...@potentialtech.com> wrote:
> 
> > On Wed, 27 Jan 2016 23:54:37 +0100
> > Ivan Voras <ivo...@gmail.com> wrote:
> >
> > > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > > here is not the absolute existence of the bloat space, it's that it's
> > > constantly growing for *system* tables.
> >
> > With a lot of activity, once a day probably isn't regular enough.
> >
> >
> I sort of see what you are saying. I'm curious, though, what goes wrong
> with the following list of expectations:
> 
>1. Day-to-day load is approximately the same
>2. So, at the end of the first day there will be some amount of bloat
>3. Vacuum will mark that space re-usable
>4. Within the next day, this space will actually be re-used
>5. ... so the bloat won't grow.
> 
> Basically, I'm wondering why is it growing after vacuums, not why it exists
> in the first place?

To add to what others have said: are you 100% sure that vacuum is
completing successfully each time it runs? I.e. does your cron job
trap and report failures of vacuum to complete? If it fails occasionally
for whatever reason, it's liable to bloat a lot over 48 hours (i.e.
assuming it succeeds the next time).
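
A quick way to sanity-check that is to look at what the stats collector
has recorded for the catalogs, e.g.:

SELECT relname, last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
  FROM pg_stat_sys_tables
 WHERE relname IN ('pg_attribute', 'pg_class')
 ORDER BY n_dead_tup DESC;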

Additionally, there's the problem with active transactions causing it to
not clean up quite everything.

Not to belabour the point, but these hiccups are best handled by enabling
autovacuum and allowing it to monitor tables and take care of them for you.
I'm curious about claims of autovacuum causing performance issues, as I've
never seen it have much impact. Generally, if you can't run autovacuum
due to performance issues, your hardware is undersized for your workload
and anything else you do is just going to have problems in a different way.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2016-01-20 Thread Bill Moran
On Tue, 19 Jan 2016 23:53:19 -0300
Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

> Bill Moran wrote:
> 
> > As far as a current solution: my solution would be to decompose the
> > JSON into an optimized table. I.e.:
> > 
> > CREATE TABLE store1 (
> >  id SERIAL PRIMARY KEY,
> >  data JSONB
> > );
> > 
> > CREATE TABLE store2 (
> >  id INT NOT NULL REFERENCES store1(id),
> >  top_level_key VARCHAR(1024),
> >  data JSONB,
> >  PRIMARY KEY(top_level_key, id)
> > );
> 
> Isn't this what ToroDB already does?
> https://www.8kdata.com/torodb/

Looks like. I wasn't aware of ToroDB, thanks for the link.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] WIP: CoC V5

2016-01-12 Thread Bill Moran
On Tue, 12 Jan 2016 22:10:43 -0500
Tom Lane <t...@sss.pgh.pa.us> wrote:

> Kevin Grittner <kgri...@gmail.com> writes:
> > I'm not the greatest word-smith, but I'll attempt to rework Josh's
> > draft to something that seems more "natural" to me.
> 
> Minor (or not?) comment:
> 
> > * To maintain a safe, respectful, productive and collaborative
> > environment all participants must ensure that their language and
> > actions are free of personal attacks and disparaging remarks of any
> > kind.
> 
> The "disparaging remarks" part of this could easily be taken to forbid
> technical criticism of any sort, eg "this patch is bad because X,Y, and
> Z", even when X,Y, and Z are perfectly neutral technical points.  "Of any
> kind" doesn't improve that either.  I'm on board with the "personal
> attacks" part.  Maybe "disparaging personal remarks" would be better?

When I used to write fiction, I met regularly with a writing group.
We had a very explicit rule: criticize the manuscript, NOT the author.

I feel this applies ... and possibly could be worded to that effect,
"Critical remarks regarding patches and/or technical work are
necessary to ensure a quality product; however, critical remarks
directed at individuals are not constructive and therefore not
acceptable." or something ...

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Code of Conduct: Is it time?

2016-01-10 Thread Bill Moran
On Sun, 10 Jan 2016 07:36:23 -0800
"Joshua D. Drake" <j...@commandprompt.com> wrote:

> Hey,
> 
> For the record, my thoughts on a CoC are something like:
> 
> 1. Be excellent to each other
> 2. If you don't know what that means, leave
> 3. If someone isn't being excellent please contact: XYZ
> 
> With XYZ being a committee that determines the ABCs.

In general, I agree; but there are problems with 1 and 2.

The definition of "being excellent" varies from individual
to individual; but more importantly, from culture to culture.
As a result, pretty much everyone would have to leave as a
result of #2, because very few people know what "being
excellent" means to everyone involved.

As a result, I would feel REALLY bad for XYZ, who would be
put in the unenviable place of trying to mitigate disputes
with no guidance whatsoever.

So, the purpose of a CoC is twofold:

A) Define what "being excellent" means to this particular
   community.
B) Provide a process for how to resolve things when "being
   excellent" doesn't happen.

Without #1, nobody will want to do #2, as it's basically a
job that can never be done correctly.

But defining #1 is the really difficult part, because no matter
how you define it, there will be some people who disagree with
said definition.

The fact that Postgres has not needed a CoC up till now is a
testament to the quality of the people in the community. However,
if Postgres continues to be more popular, the number of people
involved is going to increase. Simply as a factor of statistics,
the project will be forced to deal with some unsavory people at
some point. Having a CoC is laying the foundation to ensure that
dealing with those people involves the least pain possible. It
will always involve _some_ pain, but less is better.

I've done the job of #3 with other groups, and 99% of the time
there was nothing to do. The one incident I had to handle was
terrible, but at least I had some guidance on how to deal with
it.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Enforcing referential integrity against a HSTORE column

2016-01-02 Thread Bill Moran
On Sat, 2 Jan 2016 07:30:38 -0800
Adrian Klaver <adrian.kla...@aklaver.com> wrote:

> > So given:
> >
> > CREATE TABLE xtra_fields(
> >xfk SERIAL PRIMARY KEY,
> >xtk INTEGER NOT NULL REFERENCES xtra_types,
> >...
> > );
> >
> > CREATE OR REPLACE FUNCTION foo_xtra_fk(HSTORE) RETURNS BOOLEAN AS $$
> > WITH keyz AS (SELECT skeys($1)::INT AS xfk)
> > SELECT
> >(SELECT COUNT(*) FROM keyz JOIN xtra_fields USING (xfk))
> >=
> >(SELECT COUNT(*) FROM keyz)
> > $$LANGUAGE SQL STABLE STRICT LEAKPROOF;
> >
> > CREATE TABLE foo(
> >id INTEGER NOT NULL CHECK (id > 0),
> >...
> > -- Extra fields where the keys are the xtra_fields.xfk values and the
> > values are the
> > -- data values for the specific xfk.
> >xtra hstore CHECK (foo_xtra_fk(xtra))
> > );
> >
> > is there a more efficient way of maintaining logical referential integrity?

I second Adrian's comment on making sure that the benefit of HSTORE is
outweighing the drawback of having to write your own checks ... however,
if you decide that HSTORE is the right way to go, you may want to try
something more along the lines of this for your check:

SELECT NOT EXISTS (SELECT 1 FROM skeys($1) AS s(k) WHERE s.k::int NOT IN (SELECT xfk FROM xtra_fields));

Not tested, so it's possible that I have some typo or something; but overall
I've found that the NOT EXISTS construct can be very efficient in cases
like these.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] uuid-ossp: Performance considerations for different UUID approaches?

2015-12-22 Thread Bill Moran
On Tue, 22 Dec 2015 11:07:30 -0600
Brendan McCollam <bmccol...@uchicago.edu> wrote:

> (previously posted to the pgsql-performance list)
> 
> Hello,
> 
> We're in the process of designing the database for a new service, and 
> some of our tables are going to be using UUID primary key columns.
> 
> We're trying to decide between:
> 
> * UUIDv1 (timestamp/MAC uuid) and
> 
> * UUIDv4 (random uuid)
> 
> And the separate but related choice between:
> 
> * Generating the UUIDs client-side with the Python uuid library 
> (https://docs.python.org/2/library/uuid.html) or
> 
> * Letting PostgreSQL handle uuid creation with the uuid-ossp extension 
> (http://www.postgresql.org/docs/9.4/static/uuid-ossp.html)
> 
> In terms of insert and indexing/retrieval performance, is there one 
> clearly superior approach? If not, could somebody speak to the 
> performance tradeoffs of different approaches?
> 
> There seem to be sources online (e.g. 
> https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/ 
> http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/) 
> that claim that UUIDv4 (random) will lead to damaging keyspace 
> fragmentation and using UUIDv1 will avoid this.

There's no substance to these claims. Chasing the links around we finally
find this article:
http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
which makes the reasonable argument that random primary keys can cause
performance robbing fragmentation on clustered indexes.
But Postgres doesn't _have_ clustered indexes, so that article doesn't
apply at all. The other authors appear to have missed this important
point.

One could make the argument that the index itself becomming fragmented
could cause some performance degradation, but I've yet to see any
convincing evidence that index fragmentation produces any measurable
performance issues (my own experiments have been inconclusive).

Looking at it another way, a quick experiment shows that PG can fit
about 180 UUID primary keys per database page, which means a million
row table will use about 5600 pages to the tune of about 46MB. On
modern hardware, that index is likely to be wholly in memory all the
time.

If your performance requirements are really so dire, then you should
probably consider ditching UUIDs as keys. Taking the same million row
table I postulated in the previous paragraph, but using ints instead
of UUIDs for the primary key, the primary key index would be about
3200 pages (~26MB) ... or almost 1/2 the size -- making it more likely
to all be in memory at any point in time.
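
If you want to reproduce those numbers on your own hardware, the experiment
is roughly (table names are just illustrative; uuid_generate_v4() comes from
uuid-ossp):

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE t_uuid (id uuid PRIMARY KEY);
CREATE TABLE t_int  (id int  PRIMARY KEY);
INSERT INTO t_uuid SELECT uuid_generate_v4() FROM generate_series(1, 1000000);
INSERT INTO t_int  SELECT g FROM generate_series(1, 1000000) g;
SELECT pg_size_pretty(pg_relation_size('t_uuid_pkey')) AS uuid_pk_size,
       pg_size_pretty(pg_relation_size('t_int_pkey'))  AS int_pk_size;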

I seriously doubt that trying to make your UUIDs generate in a
predictable fashion will produce any measurable improvement, and I
see no evidence in the articles you cited that claims otherwise
have any real basis or were made by anyone knowledgeable enough
to know.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Test disk reliability (or HGST HTS721010A9E630 surprisingly reliable)

2015-12-21 Thread Bill Moran
On Mon, 21 Dec 2015 14:54:14 +0100
Félix GERZAGUET <felix.gerzag...@gmail.com> wrote:

> On Mon, Dec 21, 2015 at 12:31 AM, Jim Nasby <jim.na...@bluetreble.com>
> wrote:
> 
> > On 12/20/15 1:09 PM, Félix GERZAGUET wrote:
> >
> >> After reading
> >> http://www.postgresql.org/docs/current/static/wal-reliability.html, I
> >> tried the recommended diskchecker.pl
> >> <http://brad.livejournal.com/2116715.html> but I am not satisfied:
> >>
> >> I always get:
> >> Total errors: 0
> >>
> >> even if I tested with with a HGST HTS721010A9E630 that the vendor's
> >> datasheet
> >> (http://www.hgst.com/sites/default/files/resources/TS7K1000_ds.pdf)
> >> advertise as "
> >> Designed for low duty cycle, non mission-critical applications in
> >> PC,nearline and consumer electronics environments, which vary
> >> application to application
> >> "
> >>
> >> Since it is not, a high end disk, I expect some errors.
> >>
> >
> > Why? Just because a disk isn't enterprise-grade doesn't mean it has to lie
> > about fsync, which is the only thing diskchecker.pl tests for.
> >
> 
> I was thinking that since the disk have a 32M write-cache (with not
> battery) it would lie to the OS (and postgres) about when data are really
> on disk (not in the disk write cache). But maybe that thinking was wrong.

It varies by vendor and product, which is why diskchecker.pl exists.
It's even possible that the behavior is configurable ... check to see
if the vendor provides a utility for configuring it.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
On Sun, 13 Dec 2015 16:35:08 +0100
Gerhard Wiesinger <li...@wiesinger.com> wrote:

> Hello Bill,
> 
> Thank you for your response, comments inline:
> 
> On 13.12.2015 16:05, Bill Moran wrote:
> > On Sun, 13 Dec 2015 09:57:21 +0100
> > Gerhard Wiesinger <li...@wiesinger.com> wrote:
> >> some further details from the original FreeBSD 10.1 machine:
> >>
> >> Mem: 7814M Active, 20G Inact, 2982M Wired, 232M Cache, 1661M Buf, 30M Free
> >> Swap: 512M Total, 506M Used, 6620K Free, 98% Inuse
> >>
> >> PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIME WCPU 
> >> COMMAND
> >> 77941 pgsql 5  200  7925M  7296M usem2 352:34 6.98%
> >> postgres: username dbnamee 127.0.0.1(43367)  (postgres)
> > 
> >
> > I see no evidence of an actual leak here. Each process is basically using
> > the 7G of shared_buffers you have allocated in the config (which is only
> > 7G _total_ for all processes, since it's shared memory)
> 
> OK, but why do we then get: kernel: swap_pager_getswapspace(4): failed?

You haven't provided enough information to isolate that cause yet. What's
in the Postgres log? Surely it will have logged something when its request
for RAM was denied, and it should be more informational than the OS'
generic message.

> >> Out of memory:
> >> kernel: swap_pager_getswapspace(4): failed
> >> kernel: swap_pager_getswapspace(8): failed
> >> kernel: swap_pager_getswapspace(3): failed
> >>
> >> Main issue is IHMO (as far as I understood the FreeBSD Memory system)
> >> that 20G are INACTIVE. When I subtract the shared memory, even ~13GB
> >> should be available, but they are still allocated but inactive
> >> (INACTIVE). INACTIVE memory might be clean or dirty. As we get into out
> >> of memory situations it is likely that the memory is dirty (otherwise it
> >> would have been reused).
> > Not quite correct. Inactive memory is _always_ available for re-use.
> 
> Are you sure that's true?

Yes. Read The Design and Implementation of FreeBSD for the details.

> Monitoring inactive memory:
> cat vm_stat.sh
> #!/usr/bin/env bash
> 
> while [ 1 ]; do
>date +%Y.%m.%d.%H.%M.%S
>sysctl -a | grep vm.stats.vm.
>sleep 1
> done
> 
> And even we get out of memory with swap_pager_getswapspace Inactive 
> Memory (from the log file) is around 20GB (doesn't go down or up)
> vm.stats.vm.v_inactive_count: 5193530 (*4k pages is around 20GB)
> 
> Then we have 20GB inactive memory, but we still get out of memory with 
> kernel: swap_pager_getswapspace(4): failed. Any ideas why?

Theory: If the planner decides it needs to do 30 sort operations for a
query, it will try to allocate 27G of RAM, which exceeds what's available,
and therefore never gets allocated. So you get the "out of space" message,
but the actual memory usage doesn't change.

> >> maintenance_work_mem = 512MB
> >> effective_cache_size = 10GB
> >> work_mem = 892MB
> > I expect that this value is the cause of the problem. The scenario you
> > describe below is sorting a large table on an unindexed column, meaning
> > it will have to use all that work_mem. I'd be interested to see the
> > output of:
> >
> > EXPLAIN ANALYZE SELECT * FROM t_random ORDER BY md5 LIMIT 10;
> 
> That was only a test query, has nothing to do with production based 
> query. They are mostly SELECT/INSERTS/UPDATES on primary keys.

Then provide the _actual_ queries and the EXPLAIN ANALYZE and the table
layouts and basic data distribution of the actual cause. If your test
case is completely non-representative of what's happening, then you're
not going to get useful answers.

> > But even without that information, I'd recommend you reduce work_mem
> > to about 16M or so.
> 
> Why so low? E.g. sorting on reporting or some long running queries are 
> then done on disk and not in memory.

Even a simple query could involve multiple sorts, and you're allowing
each sort to use up to 890M of RAM (which is _not_ shared). As noted
earlier, even a moderately complex query could exceed the available
RAM on the system. But since you don't provide the actual queries and
tables causing problems, I can only guess. And since you appear to
have already decided what the cause of the problem is, then crafted
completely non-relevant queries that you think prove your point, I'm
not sure there's anything I can do to help you.
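
That said, you don't have to pick one global value: keep the default low and
raise it only where the big sorts actually happen. A minimal sketch (assumes
9.4+ for ALTER SYSTEM; the role name is hypothetical):

ALTER SYSTEM SET work_mem = '16MB';
SELECT pg_reload_conf();
-- give only the reporting role the big sort budget:
ALTER ROLE reporting SET work_mem = '256MB';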

> >> wal_buffers = 8MB
> >> checkpoint_segments = 16
> >> shared_buffers = 7080MB
> >> max_connections = 80
> >> autovacuum_max_workers = 3
> > [snip]
> >
> >>> We are running PostgreS

Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
> > postgres: postgres postgres [local] SELECT
> > 26851 postgres  20   0 2365732 408464 406788 R 100.0 10.1 0:17.81 
> > postgres: postgres postgres [local] SELECT
> > 26851 postgres  20   0 2365732 864472 862576 R 100.0 21.4 0:38.90 
> > postgres: postgres postgres [local] SELECT
> > -- Function execmultiplei and transaction terminated, but memory still 
> > allocated!!!
> > 26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 0:41.40 
> > postgres: postgres postgres [local] idle
> > -- Calling it again
> > 26851 postgres  20   0 2365732 920668 918748 R  99.0 22.7 0:46.51 
> > postgres: postgres postgres [local] SELECT
> > -- idle again, memory still allocated
> > 26851 postgres  20   0 2365732 920668 918748 S   0.0 22.7 1:22.54 
> > postgres: postgres postgres [local] idle
> >
> > Memory will only be released if psql is exited. According to the 
> > PostgreSQL design memory should be freed when the transaction completed.
> >
> > top commands on FreeBSD: top -SaPz -o res -s 1
> > top commands on Linux: top -o RES d1
> >
> > Config: VMs with 4GB of RAM, 2 vCPUs
> > shared_buffers = 2048MB # min 128kB
> > effective_cache_size = 2GB
> > work_mem = 892MB
> > wal_buffers = 8MB
> > checkpoint_segments = 16

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
Wait ... this is a combined HTTP/Postgres server? You didn't mention that
earlier, and it's kind of important.

What evidence do you have that Postgres is actually the part of
this system running out of memory? I don't see any such evidence in any of
your emails, and (based on experience) I find it pretty likely that whatever
is running under node is doing something in a horrifically memory-inefficient
manner. Since you mention that you see nothing in the PG logs, that makes it
even more likely (to me) that you're looking entirely in the wrong place.

I'd be willing to bet a steak dinner that if you put the web server on a
different server than the DB, that the memory problems would follow the
web server and not the DB server.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Memory Leak executing small queries without closing the connection - FreeBSD

2015-12-13 Thread Bill Moran
On Sun, 13 Dec 2015 22:23:19 +0100
Gerhard Wiesinger <li...@wiesinger.com> wrote:

> On 13.12.2015 21:14, Bill Moran wrote:
> > Wait ... this is a combined HTTP/Postgres server? You didn't mention that
> > earlier, and it's kind of important.
> >
> > What evidence do you have that Postgres is actually the part of
> > this system running out of memory?
> 
> For me the complete picture doesn't look consistent.

That's because you haven't gathered enough of the right type of information.

> > I don't see any such evidence in any of
> > your emails, and (based on experience) I find it pretty likely that whatever
> > is running under node is doing something in a horrifically 
> > memory-inefficient
> > manner. Since you mention that you see nothing in the PG logs, that makes it
> > even more likely (to me) that you're looking entirely in the wrong place.
> >
> > I'd be willing to bet a steak dinner that if you put the web server on a
> > different server than the DB, that the memory problems would follow the
> > web server and not the DB server.
> 
> Changes in config:
> track_activity_query_size = 102400
> work_mem = 100MB
> 
> Ok, we restarted PostgreSQL and had it stopped for seconds, and logged 
> top every second:
> 
> When PostgreSQL was down nearly all memory was freed, looks good to me. 
> So it is likely that node and other processes are not the cause.
> Mem: 742M Active, 358M Inact, 1420M Wired, 21G Cache, 871M Buf, 8110M Free
> Swap: 512M Total, 477M Used, 35M Free, 93% Inuse
> 
> When PostgreSQL restarted, Inactive was growing fast (~1min):
> Mem: 7998M Active, 18G Inact, 2763M Wired, 1766M Cache, 1889M Buf, 1041M 
> Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
> 
> After some few minutes we are back again at the same situation:
> Mem: 8073M Active, 20G Inact, 2527M Wired, 817M Cache, 1677M Buf, 268M Free
> Swap: 512M Total, 472M Used, 41M Free, 92% Inuse
> 
> The steak dinner is mine :-) Donating to the PostgreSQL community :-)

Based on the fact that the inactive memory increased? Your understanding
of inactive memory in FreeBSD is incorrect. Those pages are probably DB
pages that the OS is keeping in inactive memory because Postgres requests
them over and over, which is what the OS is supposed to do to ensure the
best performance. Are you seeing any out of swap space errors? Even if
you are, you still haven't determined if the problem is the result of
Postgres or the node.js stuff you have running. I don't know what node.js
might be caching on the client side ... do you?

No. Until you can actually report back something other than wild
speculation, I'll keep that steak dinner for myself. Besides, that bet
was based on you putting the PG server on separate hardware from the
web server, which you didn't do.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JDBC and inet type

2015-12-04 Thread Bill Moran
On Fri, 4 Dec 2015 09:41:24 +
Tim Smith <randomdev4+postg...@gmail.com> wrote:

> When I use "preparedStatement.setString(5,ip);" to send values to a
> stored function, it obviously gets sent to postgres as "character
> varying".
> 
> Postgres obviously complains loudly and says " Hint: No function
> matches the given name and argument types. You might need to add
> explicit type casts.".
> 
> What is the appropriate workaround ?

You can define param 5 as varchar in your query, as Rob suggests:

CREATE FUNCTION some_function(int, int, int, int, varchar) ...

Then cast the 5th parameter to INET within your function.
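
A rough sketch of that approach (the function and table names here are
made up):

CREATE FUNCTION log_access(_a int, _b int, _c int, _d int, _ip varchar)
RETURNS void AS $$
    INSERT INTO access_log (a, b, c, d, ip)
    VALUES (_a, _b, _c, _d, _ip::inet);  -- the cast to inet happens inside the function
$$ LANGUAGE sql;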

You can also cast the value in your SQL.

sql = "SELECT some_function($, $, $, $, $::INET)"; ...

You could also create an Inet class in Java and implement the
SQLData interface, then use setObject() instead of setString().
It doesn't appear as if anyone has done this yet, but it would
be nice if it were included in the JDBC driver.

The first answer is probably best for stored procedures, as it
simplifies things down the road. The second solution is more
universal, as it works for non-function-calling SQL as well.
The third solution is probably _really_ the correct one, from
a pedantic standpoint, but it's a bit more work to implement.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] AccessExclusiveLock on tuple?

2015-12-02 Thread Bill Moran
On Wed, 2 Dec 2015 09:31:44 -0800
Christophe Pettus <x...@thebuild.com> wrote:
> 
> On Dec 2, 2015, at 9:25 AM, Bill Moran <wmo...@potentialtech.com> wrote:
> 
> > No. See the section on row level locks here:
> > http://www.postgresql.org/docs/9.4/static/explicit-locking.html
> 
> That wasn't quite my question.  I'm familiar with the row-level locking and 
> the locking messages in general, but this message implies there is such a 
> thing as an AccessExclusiveLock on a tuple, which is new to me.  I wasn't 
> able to produce this message experimentally doing various combinations of 
> UPDATE statements and SELECT FOR UPDATEs, or even with explicit LOCK ACCESS 
> EXCLUSIVE MODE, thus the question.

First off, that documentation page _does_ answer your question.

Secondly, there is a config setting: log_lock_waits, which is
disabled by default. The message won't appear if that is off, so
if you're testing on a different install than where the incident
happened, that could be part of the problem.
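
Turning it on only needs a reload, not a restart (or just set it in
postgresql.conf):

ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();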

Finally, the following sequence triggers the message:

create table test1 (data int);
insert into test1 values (1);

Connection 1:
begin;
select * from test1 where data = 1 for update;

Connection 2:
select * from test1 where data = 1 for update;

Then wait for a little while and the message will be logged.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] AccessExclusiveLock on tuple?

2015-12-02 Thread Bill Moran
On Wed, 2 Dec 2015 09:01:37 -0800
Christophe Pettus <x...@thebuild.com> wrote:

> On 9.4, I've encountered a locking message I've not seen before:
> 
>   process 5293 still waiting for AccessExclusiveLock on tuple (88636,15) 
> of relation 18238 of database 16415 after 5000.045 ms
> 
> What conditions produce an "AccessExclusiveLock on tuple"?  Attempting to 
> lock a tuple when another process has done an explicit LOCK ACCESS EXCLUSIVE?

No. See the section on row level locks here:
http://www.postgresql.org/docs/9.4/static/explicit-locking.html

Essentially, any data modification could take an exclusive lock on the row(s)
that it's going to modify. Generally, this will be an UPDATE statement,
although the same thing happens when you do SELECT ... FOR UPDATE.

The message you're seeing simply means that one process has been waiting for
a long time for the lock to release (5 seconds in this case). Deadlocks are
automatically handled, so this is not a deadlock. Although if the process
holding the lock does not commit the transaction, the waiting process will
wait indefinitely.

If this is happening infrequently, it's probably of no concern. If it's
happening frequently, you'll want to investigate what process is holding
the locks for so long and see what can be done about it.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Bill Moran
On Sun, 29 Nov 2015 08:24:12 -0500
Tom Smith <tomsmith198...@gmail.com> wrote:

> Hi, Thanks for everyone's response.
> 
> The issue is not just compression, but lack of "indexing" or "segmentation"
> when a
> single doc has, say 2000 top level keys (with multiple levels of subkeys).
>  right now, if I query for one key,  the whole doc
> has to be first uncompressed and loaded and then search for the single key.
> 
> Compared to traditional way of storing each top level key with a separate
> column, this is huge overhead when table scan is required.  Some kind of
> "keyed/slotted" storage for the doc could
> help, (for illustration, all keys starting with 'A' would have its own
> storage unit, so on,
> so when I search for key  "A1" only that unit would be unpacked and
> traversed to get :"A1" value". it is like postgresql predfine 26
> columns/slots for the whole doc. an internal indexing
> within each doc for fast retrieval of individual field values.

Sounds like you're pushing the limits of what JSONB is designed to do
(at this stage, at least). I'm not aware of any improvements in recent
versions (or head) that would do much to improve the situation, but I
don't track every commit either. If you really need this improvement and
you're willing to wait for 9.6, then I suggest you check out the latest
git version and test on that to see if anything has been done.

I doubt you'll see much, though. As a thought experiment, the only way
I can think to improve this use case is to ditch the current TOAST
system and replace it with something that stores large JSON values in
a form optimized for indexed access. That's a pretty massive change
to some fairly core stuff just to optimize a single use-case of a
single data type. Not saying it won't happen ... in fact, all things
considered, it's pretty likely to happen at some point.

As far as a current solution: my solution would be to decompose the
JSON into an optimized table. I.e.:

CREATE TABLE store1 (
 id SERIAL PRIMARY KEY,
 data JSONB
);

CREATE TABLE store2 (
 id INT NOT NULL REFERENCES store1(id),
 top_level_key VARCHAR(1024),
 data JSONB,
 PRIMARY KEY(top_level_key, id)
);

You can then use a trigger to ensure that store2 is always in sync with
store1. Lookups can then use store2 and will be quite fast because of
the index. A lot of the design is conjectural: do you even still need
the data column on store1? Are there other useful indexes? etc. But,
hopefully the general idea is made clear.
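
The sync trigger could be a rough sketch like this (untested; it ignores
DELETEs on store1 and assumes the two tables above):

CREATE OR REPLACE FUNCTION sync_store2() RETURNS trigger AS $$
BEGIN
    -- drop any old rows for this id, then re-explode the document by top-level key
    DELETE FROM store2 WHERE id = NEW.id;
    INSERT INTO store2 (id, top_level_key, data)
    SELECT NEW.id, k, NEW.data -> k
      FROM jsonb_object_keys(NEW.data) AS k;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER store1_sync
    AFTER INSERT OR UPDATE ON store1
    FOR EACH ROW EXECUTE PROCEDURE sync_store2();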

These probably aren't the answers you want, but (to the best of my
knowledge) they're the best answers available at this time. I'd really
like to build the alternate TOAST storage, but I'm not in a position to
start on a project that ambitious right now ... I'm not even really keeping
up with the project I'm currently supposed to be doing.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] JSONB performance enhancement for 9.6

2015-11-29 Thread Bill Moran
On Sat, 28 Nov 2015 21:27:51 -0500
Tom Smith <tomsmith198...@gmail.com> wrote:
> 
> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
> of jsonb fields
> when there are large number (maybe several thousands) of top level keys.
> Currently, if I save a large json document with top level keys of thousands
> and query/retrieve
> field values,  the whole document has to be first decompressed and load to
> memory
> before searching for the specific field key/value.

I could be off-base here, but have you tried:

ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;

?

The default storage for a JSONB field is EXTENDED. Switching it to
EXTERNAL will disable compression. You'll have to insert your data over
again, since this change doesn't alter any existing data, but see
if that change improves performance.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] controlling memory management with regard to a specific query (or groups of connections)

2015-11-19 Thread Bill Moran
On Wed, 18 Nov 2015 20:10:00 -0500
Jonathan Vanasco <postg...@2xlp.com> wrote:

> As a temporary fix I need to write some uploaded image files to PostgreSQL 
> until a task server can read/process/delete them.  
> 
> The problem I've run into (via server load tests that model our production 
> environment), is that these read/writes end up pushing the indexes used by 
> other queries out of memory -- causing them to be re-read from disk.   These 
> files can be anywhere from 200k to 5MB.
> 
> has anyone dealt with situations like this before and has any suggestions?  I 
> could use a dedicated db connection if that would introduce any options. 

PostgreSQL doesn't have any provisions for preferring one thing
over another when deciding what to keep in memory.

The easiest thing I can think would be to add memory to the machine
(or configure Postgres to use more) such that those files aren't
pushing enough other pages out of memory to have a problematic
impact.

Another idea would be to put the image database on a different
physical server, or run 2 instances of Postgres on a single
server with the files in one database configured with a low
shared_buffers value, and the rest of the data on the other
database server configured with higher shared_buffers.

I know these probably aren't the kind of answers you're looking
for, but I don't have anything better to suggest; and the rest
of the mailing list seems to be devoid of ideas as well.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Auto-analyse on insert operations

2015-11-04 Thread Bill Moran
On Wed, 4 Nov 2015 14:32:37 +0100
Bertrand Roos <bertrand.r...@areal.fr> wrote:
> 
> I try to configure auto-analyse task with postgresql 9.4.
> I have the following configuration (default configuration):
> track_counts = on
> autovacuum = on
> log_autovacuum_min_duration = -1
> autovacuum_max_workers = 3
> autovacuum_naptime = 300s
> autovacuum_vacuum_threshold = 50
> autovacuum_analyze_threshold = 50
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_analyze_scale_factor = 0.2
> autovacuum_freeze_max_age = 200000000
> autovacuum_multixact_freeze_max_age = 400000000
> autovacuum_vacuum_cost_delay = 20ms
> autovacuum_vacuum_cost_limit = -1
> 
> With this configuration, I can observe that some tables are 
> auto-analysed, but some others are not. Even if there are millions of 
> insert operations on an empty table (all tables are in cluster mode).
> In fact it seems that tables with update operations are the only ones 
> that are auto-analysed.
> I'm quite surprised because the documentation says that the daemon checks the 
> count of insert, update and delete operations.
> What could it be the reason ? Why tables which have only update 
> operation, aren't analysed ?
> Are update operations really taken into account ?

Given that autoanalyze is pretty critical to the way the system functions,
it's unlikely that it just doesn't work (someone else would have noticed).

A more likely scenario is that you've found some extremely obscure edge
case. If that's the case, you're going to have to give very specific
details as to how you're testing it before anyone is liable to be able
to help you.

I get the impression that you're somewhat new to Postgres, in which case
it's very likely that the problem is that you're not testing the situation
correctly. In that case, we're going to need specific details on how you're
observing that tables are or are not being analysed.

As a wild-guess theory: the process that does the analyze only wakes up
to check tables every 5 minutes (based on the config you show) ... so are
you doing the inserts then checking the table without leaving enough time
in between for the system to wake up and notice the change?
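
One concrete thing worth checking while the test runs (substitute your
real table name):

SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
       last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname = 'your_table';

If n_tup_ins is climbing but last_autoanalyze never changes, there's
something worth digging into; if n_tup_ins isn't climbing, the inserts
aren't being counted the way you think they are.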

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Auto-analyse on insert operations

2015-11-04 Thread Bill Moran
On Wed, 4 Nov 2015 16:43:57 +0100
Bertrand Roos <bertrand.r...@areal.fr> wrote:
> 
> Le 04/11/2015 14:55, Bill Moran a écrit :
> > On Wed, 4 Nov 2015 14:32:37 +0100
> > Bertrand Roos <bertrand.r...@areal.fr> wrote:
> >> I try to configure auto-analyse task with postgresql 9.4.
> >> I have the following configuration (default configuration):
> >> track_counts = on
> >> autovacuum = on
> >> log_autovacuum_min_duration = -1
> >> autovacuum_max_workers = 3
> >> autovacuum_naptime = 300s
> >> autovacuum_vacuum_threshold = 50
> >> autovacuum_analyze_threshold = 50
> >> autovacuum_vacuum_scale_factor = 0.2
> >> autovacuum_analyze_scale_factor = 0.2
> >> autovacuum_freeze_max_age = 200000000
> >> autovacuum_multixact_freeze_max_age = 400000000
> >> autovacuum_vacuum_cost_delay = 20ms
> >> autovacuum_vacuum_cost_limit = -1
> >>
> >> With this configuration, I can observe that some tables are
> >> auto-analysed, but some others are not. Even if there are millions of
> >> insert operations on an empty table (all tables are in cluster mode).
> >> In fact it seems that tables with update operations are the only ones
> >> that are auto-analysed.
> >> I'm quite surprised because the documentation says that the daemon checks the
> >> count of insert, update and delete operations.
> >> What could it be the reason ? Why tables which have only update
> >> operation, aren't analysed ?
> >> Are update operations really taken into account ?
> > Given that autoanalyze is pretty critical to the way the system functions,
> > it's unlikely that it just doesn't work (someone else would have noticed).
> >
> > A more likely scenario is that you've found some extremely obscure edge
> > case. If that's the case, you're going to have to give very specific
> > details as to how you're testing it before anyone is liable to be able
> > to help you.
> >
> > I get the impression that you're somewhat new to Postgres, in which case
> > it's very likely that the problem is that you're not testing the situation
> > correctly. In that case, we're going to need specific details on how you're
> > observing that tables are or are not being analysed.
> >
> > As a wild-guess theory: the process that does the analyze only wakes up
> > to check tables every 5 minutes (based on the config you show) ... so are
> > you doing the inserts then checking the table without leaving enough time
> > in between for the system to wake up and notice the change?
> >
> Thanks for your answer Bill.
> Indeed, I'm pretty new to Postgres and I don't exclude that I'm doing 
> something wrong. But I did my test on a more than 1 day duration, so 
> it's not an issue of autovacuum_naptime (I insert 760 lignes each 30 
> seconds during 36 hours).
> I can't give all the details of this test because it is to complicated 
> with triggers and partman (and your objective is not to solve 
> configuration issues of others).

Others have answered some of your other questions, so I'll just throw
out another possibility: have the per-table analyze settings been altered
on the table(s) that are behaving badly? See
http://www.postgresql.org/docs/9.4/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS

Attaching the output of
pg_dump -s -t $table_name -U postgres $database_name
will probably go a long way toward getting more targeted assistance.
(substitute the actual database name, and the name of a table that is
giving you trouble)

In addition, the output of
SELECT * FROM pg_stat_user_tables WHERE relname = '$table_name';
(Again, substitute an actual table name that's giving you trouble,
preferrably the same table as from the pg_dump)

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ??: postgres cpu 100% need help

2015-10-27 Thread Bill Moran
On Tue, 27 Oct 2015 11:30:45 +0800
"657985...@qq.com" <657985...@qq.com> wrote:

> Dear sir:
>  Recently a weird question about the postgresql database really bothered 
> me a lot, so i really need your help. Here is the problem, in the most 
> situations the postgre database work very well,  Average 3500tps/s per day, 
> the cpu usage of its process is 3%~10% and every query can be responsed in 
> less than 20ms, but sometimes the cpu usages of its process can suddenly grow 
> up to 90%+ , at that time a simple query can cost  2000+ms. ps: My postgresql 
> version is 9.3.5 and the database is oltp  server.

9.3.5 is pretty old, you should probably schedule an upgrade.

>  shared_buffers | 25GB  

Try setting this to 16GB. It's been a while since I tested on
large-memory/high-load systems, but I seem to remember that
shared_buffers above 16G could cause these sorts of intermittant
stalls.

If that doesn't improve the situation, you'll probably need to
provide more details, specifically the layout of the table in
question, as well as the queries that are active when the
problem occurs, and the contents of the pg_locks table when
the problem is occurring.
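
When the spike is happening, capturing something like this will show
whether blocked lock requests are involved:

SELECT l.locktype, l.relation::regclass, l.mode, l.granted,
       a.pid, a.state, now() - a.query_start AS runtime, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE NOT l.granted;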

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Service not starting on Ubuntu 15.04

2015-10-26 Thread Bill Moran
On Mon, 26 Oct 2015 11:21:23 +
Lasse Westh-Nielsen <lassewe...@gmail.com> wrote:
> 
> I posted to stackoverflow about my problem upgrading from Ubuntu 14.04 to
> Ubuntu 15.04:
> http://stackoverflow.com/questions/33306475/ubuntu-15-04-postgresql-doesnt-start
> 
> Tl;dr: postgresql service does not start properly when installed as a
> package using cloud-init.
> 
> And I can't figure out if I am doing something wrong, if the AMI is no
> good, if the package has problems, ... I reckon I cannot be the first
> person to use Postgres on Ubuntu Vivid, but I have been hammering my head
> against the wall with this for hours.
> 
> Any help greatly appreciated!

I'm taking a shot in the dark here, but ...

The symptoms you describe seem to suggest that the script is starting
PostgreSQL asynchronously (i.e. in the background) which means that
the CREATE command runs too quickly and the server isn't started yet.

A quick way to _test_ this theory would be to put a sleep between the
install and the CREATE commands and see if the problem goes away.

If that does seem to be the problem, then a good _fix_ would be to
find a way to foreground the startup of the server, or have some command
that tests to ensure the server is started and blocks until it is
before running the create command.

The only point I'm unclear on is whether you've confirmed that
Postgres actually _is_ started once the server is up (albeit without
the CREATE statement having succeeded).

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] I'm starting a PostgreSQL user's group in Pittsburgh

2015-10-25 Thread Bill Moran

I'm going to see if I can drum up enough interest for a PostgreSQL
user's group in the Pittsburgh area.

After talking to the organizers of the Philadelphia PUG, I decided
to try using Meetup to coordinate things:
http://www.meetup.com/Pittsburgh-PostgreSQL-Users-Group/

If you're in the Pittsburgh area and would like to get involved,
please show your interest by joining the meetup. I'll get a first
event scheduled as soon as we have enough people signed up to make
it interesting.

If you haven't used meetup before: it's a service specifically for
coordinating things like user's groups, and it does a pretty good
job of letting us coordinate activities. Basic membership on the
site is free and includes participating in as many groups as you
desire. (it only costs something if you want to host your own group).

Hope to see you soon.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Drop or alter column under load give ERROR #42804 structure of query does not match function result type:

2015-10-09 Thread Bill Moran
On Fri, 9 Oct 2015 14:32:44 +0800
Victor Blomqvist <v...@viblo.se> wrote:

> I have a heavily used PostgreSQL 9.3.5 database on CentOS 6. Sometimes I
> need to add/remove columns, preferably without any service interruptions,
> but I get temporary errors.
> 
> I follow the safe operations list from
> https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql
> but many operations cause troubles anyway when the more busy tables are
> updated.

I seriously doubt if Paul did enough research to be sure that "safe" is an
absolute term for that list.

> Typically I have user defined functions for all operations, and my table
> and functions follow this pattern:
> 
> CREATE TABLE users (
>   id integer PRIMARY KEY,
>   name varchar NOT NULL,
>   to_be_removed integer NOT NULL
> );
> 
> CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
> $$
> BEGIN
>   RETURN QUERY SELECT * FROM users WHERE id = id_;
> END;
> $$ LANGUAGE plpgsql;
> 
> Then the actual queries are run by our application as
> 
> SELECT id, name FROM select_users(18);
> 
> As you can see the column to_be_removed is not selected.

Sure it is ... the function does SELECT *, which absolutely includes the
to_be_removed column. The fact that you ignore that column in a
subsequent outer SELECT doesn't mean that the query in the function knows
to do so.
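
One way to shrink the window (a sketch only, and I haven't verified it
removes the race completely) is to stop returning the table's row type and
list exactly the columns the callers use:

CREATE OR REPLACE FUNCTION select_users(id_ integer)
RETURNS TABLE (id integer, name varchar) AS
$$
BEGIN
  -- only the columns the application reads, so dropping to_be_removed
  -- no longer changes the function's declared result type
  RETURN QUERY SELECT u.id, u.name FROM users u WHERE u.id = id_;
END;
$$ LANGUAGE plpgsql;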

> Then to remove the
> column I use:
> 
> ALTER TABLE users DROP COLUMN to_be_removed;
> 
> However, while the system is under load sometimes (more frequently and
> persistent the more load the system is experiencing) I get errors like
> these:
> 
> ERROR #42804 structure of query does not match function result type:
> Number of returned columns (2) does not match expected column count (3).

This is a bit surprising to me. I would expect Postgres to have one or the
other definition of that row within a single transaction, but what seems to
be happening is that the ALTER causes the row definition to be changed in
the middle of the transaction, thus the function may return 3 columns,
but when the outer query checks the type, it sees that it should only
have 2.

> The same error can happen when columns are added. Can this be avoided
> somehow, or do I need to take the system offline during these kind of
> changes?
> 
> For reference, there was a similar but not same issue posted to psql-bugs a
> long time ago:
> http://www.postgresql.org/message-id/8254631e-61a5-4c03-899f-22fdcf369...@e23g2000vbe.googlegroups.com
> 
> I posted this same question at dba.stackexchange and got the advice to
> repost here:
> http://dba.stackexchange.com/questions/117511/postgresql-drop-column-under-load-give-wrong-number-of-columns-errors

This has a lot to do with internals. You should wait a bit to see if you
get a good answer, but if not you might need to post to the hackers list
so the developers can chime in. My opinion is that this is a bug, but it's
an obscure enough bug that it's not surprising that it's gone unfixed for
a while.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] using postgresql for session

2015-10-07 Thread Bill Moran
On Wed, 7 Oct 2015 09:58:04 -0600
"john.tiger" <john.tigernas...@gmail.com> wrote:

> has anyone used postgres jsonb for holding session ?  Since server side 
> session is really just a piece of data, why bother with special 
> "session" plugins and just use postgres to hold the data and retrieve it 
> with psycopg2 ?  Maybe use some trigger if session changes?We are 
> using python Bottle with psycopg2 (super simple, powerful combo) - are 
> we missing something magical about session plugins ?

Nothing that I'm aware of. I've worked on large projects that keep the
session data in a Postgres table with great success.
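
For what it's worth, the shape of it is pretty simple; a minimal sketch
(names made up for the example):

CREATE TABLE sessions (
    sid        text PRIMARY KEY,
    data       jsonb NOT NULL DEFAULT '{}',
    expires_at timestamptz NOT NULL
);

-- fetch a live session by id
SELECT data FROM sessions WHERE sid = 'abc123' AND expires_at > now();

-- periodic cleanup of expired sessions
DELETE FROM sessions WHERE expires_at <= now();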

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to sync table DML between databases

2015-10-05 Thread Bill Moran
On Mon, 5 Oct 2015 06:20:28 -0700 (MST)
jimbosworth <jimbos5...@mail.com> wrote:

> Hi All,
> 
> I have two servers each running pg9.4.4 database instances.  
> I need to determine the best way to keep a large 20gb table on server A
> synchronised onto server B...
> 
> At the moment, I use pg_dump to periodically dump the table on server A,
> then psql to reload into server B.  This is fine, but means I have to pull
> 100% of the table each time rather than just the changes.  This option does
> not offer real time accuracy on server B.
> 
> I have considered using a table trigger on row (update, insert or delete)
> and then using db_link or postgres_fdw to sync the changes, but am concerned
> that a table trigger is synchronous... so a db_link or fdw could incur a
> lengthy delay.
> 
> I have also considered using table OIDs to track changes, then just
> periodically sync the difference.
> 
> I have considered using postgre_fdw and then 'refresh concurrently
> materialized view' on server B.
> 
> I have considered using logical decoding to read the wal files, then extract
> the changes.
> 
> Can anyone explain the best way to synchronise JUST the changes on a table
> between servers please?

Sounds like a problem custom-made to be solved by Slony:
http://slony.info/

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Can somebody explain what is the meaning for HashAggregate?

2015-09-20 Thread Bill Moran
On Sun, 20 Sep 2015 21:56:39 +0800 (CST)
lin <jlu...@163.com> wrote:

> Can somebody explain what does the postgres done for the explain of sql shows 
> HashAggregate( what is the meaning for  HashAggregate  )?
> for example: 
> 
> 
> postgres=# explain verbose select oid,relname  from pg_class group by 
> oid,relname;
>  QUERY PLAN  
> -
>  HashAggregate  (cost=12.42..15.38 rows=295 width=68)
>Output: oid, relname
>Group Key: pg_class.oid, pg_class.relname
>->  Seq Scan on pg_catalog.pg_class  (cost=0.00..10.95 rows=295 width=68)
>  Output: oid, relname
> (5 rows)
> 
> 
> ::  first, seq scan pg_class and return (oid,relname);
>second,  make group (oid,relname),  is it fisrt sort by oid then in the 
> oid group sort by relname?
> Can somebody explain what does the database done for hashAggregate?

It combines the values for oid and relname for each returned row, generates a
hashkey for them, then uses that hashkey to aggregate (compute the GROUP BY,
essentially, in this case).
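
If you want to see the sort-based alternative for comparison, you can disable
hash aggregation for the session (just an experiment, not a recommendation):

SET enable_hashagg = off;
EXPLAIN VERBOSE SELECT oid, relname FROM pg_class GROUP BY oid, relname;
-- the plan now shows GroupAggregate over a Sort: rows are sorted by
-- (oid, relname) and equal groups are collapsed in order, instead of
-- being bucketed by a hash key
RESET enable_hashagg;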

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Anyone interested in a Pittsburgh-area Postgres users'

2015-09-08 Thread Bill Moran

I will be in Pittsburgh full-time starting the middle of Oct, and I
would be interested in helping coordinate a UG. I have a feeling that
I could convince a number of people I've worked with to get involved
as well.

On Tue, 8 Sep 2015 11:10:34 -0400
James Keener <j...@jimkeener.com> wrote:

> Is there a user group in Pittsburgh?  This email was the first that
> showed up in a Google Search.
> 
> Jim
> 
> On 2004-05-02 05:43:26, Tom Lane wrote:
> 
> > I've gotten a couple of inquiries lately about a Postgres users' group
> > in my home town of Pittsburgh PA.  There is not one (unless it's very
> > well camouflaged) but perhaps there is critical mass to create one.
> > If you think you might come to meetings of such a group, let me know
> > off-list.  If I get enough responses I'll set up an initial meeting
> > and we'll see where it goes ...
> > 
> > regards, tom lane
> 
> 
> 


-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why does splitting $PGDATA and xlog yield a performance benefit?

2015-08-25 Thread Bill Moran
On Tue, 25 Aug 2015 10:08:48 -0700
David Kerr d...@mr-paradox.net wrote:

 Howdy All,
 
 For a very long time I've held the belief that splitting PGDATA and xlog on 
 linux systems fairly universally gives a decent performance benefit for many 
 common workloads.
 (i've seen up to 20% personally).
 
 I was under the impression that this had to do with regular fsync()'s from 
 the WAL 
 interfearing with and over-reaching writing out the filesystem buffers. 
 
 Basically, I think i was conflating fsync() with sync(). 
 
 So if it's not that, then that just leaves bandwith (ignoring all of the 
 other best practice reasons for reliablity, etc.). So, in theory if you're 
 not swamping your disk I/O then you won't really benefit from relocating your 
 XLOGs.

Disk performance can be a bit more complicated than just swamping. Even if
you're not maxing out the IO bandwidth, you could be getting enough that some
writes are waiting on other writes before they can be processed. Consider the
fact that old-style ethernet was only able to hit ~80% of its theoretical
capacity in the real world, because the chance of collisions increased with
the amount of data, and each collision slowed down the overall transfer speed.
Contrast that with modern ethernet, which doesn't do collisions: you can get much
closer to 100% of the rated bandwidth because the communications are effectively
partitioned from each other.

In the worst-case scenario, if two processes (due to horrible luck) _always_
try to write at the same time, the overall responsiveness will be lousy, even
if the bandwidth usage is only a small percent of the available. Of course,
that worst case doesn't happen in actual practice, but as the usage goes up,
the chance of hitting that interference increases, and the effective response
goes down, even when there's bandwidth still available.

Separate the competing processes, and the chance of conflict is 0. So your
responsiveness is pretty much at best-case all the time.

 However, I know from experience that's not entirely true, (although it's not 
 always easy to measure all aspects of your I/O bandwidth).
 
 Am I missing something?

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql jsonb

2015-08-14 Thread Bill Moran
On Fri, 14 Aug 2015 17:39:49 +0530
Deepak Balasubramanyam deepak.b...@gmail.com wrote:
 
 I have a table (20 million rows) in Postgresql 9.4 that contains a bigint
 id as the primary key and another column that contains jsonb data. Queries
 run on this table look like so...
 
 
 ## Query
 
 select ... from table
 WHERE table.column ->'item'->> 'name' = 'value'
 
 
 I'd like to make an effort to get Postgresql to keep all data available in
 this table and any index on this table in memory. This would ensure that
 sequence or index scans made on the data are fairly fast.
 
 Research into this problem indicates that there is no reliable way to get
 Postgresql to run off of RAM memory completely (
 http://stackoverflow.com/a/24235439/830964). Assuming the table and its
 indexes amount to 15 gb of data  on the disk and the machine contains 64GB
 of RAM with shared buffers placed at anywhere from 16-24 GB, here are my
 questions...
 
 1. When postgresql returns data from this query, how can I tell how much of
 the data was cached in memory?

I'm not aware of any way to do that on a per-query basis.

 2. I'm aware that I can tweak the shared buffer so that more data is
 cached. Is there a way to monitor this value for its effectiveness?

Install the pg_buffercache extension and read up on what it provides. It
gives a pretty good view into what PostgreSQL is keeping in memory.
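
Something along these lines, for example (a sketch; 8192 assumes the default
block size):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- how much of each relation is currently sitting in shared_buffers
SELECT c.relname,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY count(*) DESC
LIMIT 10;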

 3. Is there a reliable way / calculation (or close to it), to determine a
 point after which Postgresql will ask the disk for data Vs the caches?

It will ask the disk for data if the data is not in memory. As long as the
data it needs is in memory, it will never talk to the disk unless it needs
to write data back.

The cache is a cache. So there are only 2 reasons your data wouldn't all be
in memory all the time:

1) It doesn't all fit
2) Some of that memory is needed by other tables/indexes/etc

As far as when things get evicted from memory, you'll have to look at the
source code, but it's your typical keep the most commonly needed data in
memory algorithms.

What problem are you seeing? What is your performance requirement, and what
is the observed performance? I ask because it's unlikely that you really
need to dig into these details like you are, and most people who ask
questions like this are misguided in some way.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: PostgreSQL VMWare

2015-07-03 Thread Bill Moran
On Fri, 3 Jul 2015 12:35:07 +0200
Jean-Gérard Pailloncy jg.paillo...@adnow.fr wrote:
 
 I work on a project that collects geolocalized data.
 All data will be in PostgreSQL / PostGIS.
 
 The small PostgreSQL databases will be on Linux guests on VMWare hosts.
 
 The size of the main database will grow by 50 TB / year, 500 M row / day.
 For the largest one, we plan to test different options.
 One of them is to stay with Linux on WMWare.
 Outside the questions about schema, sharding, I would appreciate if some of 
 you have informations, benchmarks, stories about big PostgreSQL databases on 
 Linux guests on VMWare hosts.

The place I'm working now did a feasibility study about installing
their primary app on vmware instead of directly onto the hardware.
Their conclusion was that the app would be about 25% slower running
on VMWare. The app is very database-centric. However, I wasn't
involved in the tests, can't vouch for the quality of the testing,
and there _are_ other pieces involved than the database.

That being said, I've used PostgreSQL on VMs quite a bit. It does
seem slower, but I've never actually benchmarked it. And it's never
seemed slower enough for me to complain much.

The concern I have about running a large database on a VM (especially
since you're asking about performance) is not the VM itself, but all
the baggage that inevitably comes with it ... oversubscribed hosts,
terrible, cheap SANs, poor administration leading to bad configuration,
and yet another layer of obscurity preventing you from figuring out
why things are slow. In my experience, you _will_ get all of these,
because once you're on a VM, the admins will be pressured to host
more and more VMs on the existing hardware and/or add capacity at
minimal cost.

There's nothing like a VM where you never know what the performance
will be because you never know when some other VMs (completely unrelated
to you and/or your work) will saturate the IO with some ridiculous
recursive grep command or something.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Backup Method

2015-07-03 Thread Bill Moran
On Fri, 03 Jul 2015 13:16:02 +0200
Jan Lentfer jan.lent...@web.de wrote:

 On 2015-07-03 13:00, howardn...@selestial.com wrote:
  On 03/07/2015 11:39, Guillaume Lelarge wrote:
 
   In that case is there any recommendation for how often to make 
  base backups in relation to the size of the cluster and the size of 
  the WAL?
  
 
  Nope, not really. That depends on a lot of things. Our customers 
  usually do one per day.
 
 
  Excuse my ignorance... Is the base backup, in general, faster than 
  pg_dump?
 
 It is a different approach. With the base backup you are actually 
 backing up files from the filesystem ($PGDATA directory), whereas with 
 pg_dump you're saving the SQL commands to reload and rebuild the database.
 Usually a file based backup will be faster, both on backup and 
 restore, but it is - as mentioned - a different approach and it might 
 also not serve all your purposes.

One of the things that makes a lot of difference is the amount of
redundant data in the database. For example, indexes are completely
redundant. They sure do speed things up, but they're storing the same
data 2x for each index you have. When you do a base backup, you have
to copy all that redundancy, but when you do a pg_dump, all that
redundant data is reduced to a single CREATE INDEX command. The
result being that if your database has a lot of indexes, the pg_dump
might actually be faster.

But the only way to know is to try it out on your particular system.
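
If you want a rough idea of how much of the cluster is index data before
running that test, something like this gives a ballpark (a sketch, run in the
database in question):

SELECT pg_size_pretty(sum(pg_table_size(c.oid))::bigint)   AS table_bytes,
       pg_size_pretty(sum(pg_indexes_size(c.oid))::bigint) AS index_bytes
FROM pg_class c
WHERE c.relkind = 'r';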

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] very slow queries and ineffective vacuum

2015-07-02 Thread Bill Moran
On Thu, 2 Jul 2015 12:58:18 +0200
Lukasz Wrobel lukasz.wro...@motorolasolutions.com wrote:

 Hello again.
 
 Thank you for all your responses. I will try to clarify more and attempt to
 answer the questions you raised.
 
 I'm attaching the postgresql.conf this time. I cannot supply you guys with
 a proper database schema, so I will try to supply you with some obfuscated
 logs and queries. Sorry for the complication.
 
 First of all I seem to have misdirected you guys about the pg_stat* tables.
 I have a virtual machine with the database from our test team, which was
 running for a month. When I deploy it, our java application is not running,
 so no queries are being executed. The pg_stat* tables contain no data
 (which is surprising). When I launch the application and queries start
 going, the stats are collected normally and autovacuums are being performed.
 
 I attached the output of vacuum verbose command.
 
 As for the pg_stat_activity, I have no idle in transaction records there,
 but I do have some in idle state, that don't disappear. Perhaps this
 means some sessions are not closed? I attached the query result as
 activity.txt.
 
 I also have a few sending cancel to blocking autovacuum and canceling
 autovacuum task messages in syslog.
 
 Sample query explain analyze. This was ran after vacuum analyze of the
 entire database.

The analyze doesn't seem to be working terribly well. Looking at the
explain, it expects 337963 rows in table57, but there are only 6789.
There are similar discrepancies with table19 and table84.

I don't know if indexes are your problem. Those three tables are pretty
small, so the sequential scans should be pretty quick (probably faster
than index scans, since it looks like most of the rows are returned from
all the tables).

I'm somewhat confused by your description of the situation. Is the performance
problem happening on the virtual machine? Because VMs are notorious for
being on oversubscribed hosts and exhibiting performance far below what
is expected. It would be worthwhile to do some disk speed and CPU speed tests
on the VM to see what kind of performance it's actually capable of ... if
the VM is performing poorly, there's not much you can do with PostgreSQL
to improve things.
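
A quick way to see how stale the estimates are (a template only; table57 and
col7 are the obfuscated names from the plan quoted below):

-- the planner's idea of the row counts vs. reality
SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname IN ('table57', 'table19', 'table84');

SELECT count(*) FROM table57;   -- repeat for the other two

-- then take a bigger sample on the join column and compare again
ALTER TABLE table57 ALTER COLUMN col7 SET STATISTICS 500;
ANALYZE VERBOSE table57;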

 explain analyze SELECT col1, col2, col3, col4, col5 FROM ( table84 table84
 LEFT JOIN table57 table57 ON table84.col7 = table57.col7 ) LEFT JOIN
 table19 table19 ON table84.col7 = table19.col7;
  QUERY
 PLAN
 -
  Hash Right Join  (cost=46435.43..108382.29 rows=189496 width=79) (actual
 time=4461.686..13457.233 rows=5749 loops=1)
Hash Cond: (table57.col7 = table84.col7)
-  Seq Scan on table57 table57  (cost=0.00..49196.63 rows=337963
 width=57) (actual time=0.040..8981.438 rows=6789 loops=1)
-  Hash  (cost=42585.73..42585.73 rows=189496 width=38) (actual
 time=4447.731..4447.731 rows=5749 loops=1)
  Buckets: 16384  Batches: 2  Memory Usage: 203kB
  -  Hash Right Join  (cost=18080.66..42585.73 rows=189496
 width=38) (actual time=1675.223..4442.046 rows=5749 loops=1)
Hash Cond: (table19.col7 = table84.col7)
-  Seq Scan on table19 table19  (cost=0.00..17788.17
 rows=187317 width=26) (actual time=0.007..2756.501 rows=5003 loops=1)
-  Hash  (cost=14600.96..14600.96 rows=189496 width=20)
 (actual time=1674.940..1674.940 rows=5749 loops=1)
  Buckets: 32768  Batches: 2  Memory Usage: 159kB
  -  Seq Scan on table84 table84  (cost=0.00..14600.96
 rows=189496 width=20) (actual time=0.059..1661.482 rows=5749 loops=1)
  Total runtime: 13458.301 ms
 (12 rows)
 
 Thank you again for your advice and I hope that with your help I'll be able
 to solve this issue.
 
 Best regards.
 Lukasz


-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] INSERT a number in a column based on other columns OLD INSERTs

2015-06-20 Thread Bill Moran
On Sat, 20 Jun 2015 10:44:21 -0700 (MST)
litu16 litumelen...@gmail.com wrote:

 In PostgreSQL I have this table... (there is a primary key in the most left
 side timestamp02 which is not shown in this image)
 
 in the table above, all columns are entered via querrys, except the
 time_index which I would like to be filled automatically via a trigger
 each time each row is filled.
 
 This is the code to create the same table (without any value) so everyone
 could create it using the Postgre SQL query panel.
 
 *CREATE TABLE table_ebscb_spa_log02
 (
   pcnum smallint,
   timestamp02 timestamp with time zone NOT NULL DEFAULT now(),
   fn_name character varying,
   time time without time zone,
   time_elapse character varying,
   time_type character varying,
   time_index real,
   CONSTRAINT table_ebscb_spa_log02_pkey PRIMARY KEY (timestamp02)
 )
 WITH (
   OIDS=FALSE
 );
 ALTER TABLE table_ebscb_spa_log02
   OWNER TO postgres;*
 
 What I would like the trigger to do is:
 
 INSERT a number in the time_index column based on the INSERTed values of
 the fn_name and time_type columns in each row.
 
 If both (fn_name and time_type) do a combination (eg. Check Mails -
 Start) that doesn't exist in any row before (above), then INSERT 1 in the
 time_index column,
 
 Elif both (fn_name and time_type) do a combination that does exist in
 some row before (above), then INSERT the number following the one
 before(above) in the time_index column.
 
 (pls look at the example table image, this trigger will produce every red
 highlighted square on it)
 
 
 I have tried so far this to create the function:
 
 CREATE OR REPLACE FUNCTION on_ai_myTable() RETURNS TRIGGER AS $$
 DECLARE
 t_ix real;
 n int;
 
 BEGIN
 IF NEW.time_type = 'Start' THEN
 SELECT t.time_index FROM table_ebscb_spa_log02 t WHERE t.fn_name =
 NEW.fn_name AND t.time_type = 'Start' ORDER BY t.timestamp02 DESC LIMIT 1
 INTO t_ix;
   GET DIAGNOSTICS n = ROW_COUNT;
 IF (n = 0) THEN 
 t_ix = 1;
 ELSE 
 t_ix = t_ix + 1;
 END IF;
 END IF;
 NEW.time_index = t_ix;
 return NEW;
 END
 $$
 LANGUAGE plpgsql;
 
 
 But when I manually insert the values in the table, nothing change (no error
 message) time_index column just remain empty, what am I doing wrong???
 
 Please some good PostgreSQL fellow programmer could give me a hand, I really
 have come to a death point in this task, I have any more ideas.

Couple things.

First off, you don't show your statement for creating the trigger. This is
important. The trigger has to be a BEFORE trigger FOR EACH ROW. Otherwise, the
returned value won't do anything. It should read like this:

CREATE TRIGGER trigger_name
 BEFORE INSERT ON table_ebscb_spa_log02
 FOR EACH ROW EXECUTE PROCEDURE on_ai_myTable();

If you created it with AFTER INSERT or FOR EACH STATEMENT, then the trigger
won't work as desired.

The other thing about assignment being := was already mentioned.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Select query regarding info

2015-06-18 Thread Bill Moran
On Thu, 18 Jun 2015 07:29:37 +
Yogesh. Sharma yogesh1.sha...@nectechnologies.in wrote:

 HI Everyone,
 
 Below DB query is showing below error on postgresql9.3.
 SELECT '\'' || t2.name || '\'', '\'' || t1.phone_number || '\'', '\'' || 
 t1.details || '\'', '\'' || t1.description || '\'', '\'' || (CASE WHEN 
 t1.s_id IS NULL THEN 'N/A' ELSE t3.s_type END) || '\'', '\'' || t1.s_id || 
 '\'' FROM abc_tble AS t1 LEFT JOIN pqrtable AS t2 ON t1.s_id = nid LEFT JOIN 
 te AS t3 ON t1.s_id = t3.s_id;
 Invalid command \''. Try \? for help.
 But Above query is working fine in postgresql8.3.
 Solution is provided by someone:-
 The SQL standard defines two single quotes to escape one inside a literal: 
 
 Postgres 8.3 defaulted to a non-standard behavior where it was allowed to 
 escape a single quote using a backslash: '\''
 This deviation from the SQL standard was always discouraged and can be 
 controlled through the configuration parameter 
 standard_conforming_stringshttp://www.postgresql.org/docs/current/static/runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS
 With version 9.1 the default for this parameter was changed from off to on. 
 Version 8.1 and later would emit a warning when you used the non-standard way 
 of escaping single quotes (unless you explicitly turned that off)
 
 
 Could you please provide below information.
  How to change standard_conforming_strings value of postgresql.conf? I have 
 checked but this option is not found in postgresql.conf.

Add it to the file.

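For example (the parameter name is real; whether you want the old behaviour
back globally is your call):

SHOW standard_conforming_strings;      -- check the current value

-- postgresql.conf line to restore the 8.3 behaviour:
--   standard_conforming_strings = off
-- or leave the default alone and write the literal the standard way,
-- doubling the quote instead of backslash-escaping it:
SELECT '''' || t2.name || '''' FROM pqrtable AS t2 LIMIT 1;
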
Also, don't reply to unrelated threads with new questions, a lot of
people won't see your question if you do that, and if nobody sees
your question you won't get an answer.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Momentary Delay

2015-06-17 Thread Bill Moran
On Wed, 17 Jun 2015 10:33:37 -0300
Anderson Valadares anderva...@gmail.com wrote:

 2015-06-15 18:19 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
  On Mon, 15 Jun 2015 17:48:54 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   2015-06-12 19:56 GMT-03:00 Bill Moran wmo...@potentialtech.com:
  
Please do not remove the mailing list from replies. See below.
   
On Fri, 12 Jun 2015 09:21:19 -0300
Anderson Valadares anderva...@gmail.com wrote:
   
 2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:

  On Mon, 8 Jun 2015 11:59:31 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   Hi
We are experiencing an intermittent problem in a GIS database
  from a
   client. Some times of the day is as if the PostgreSQL executed
  the
  slowest
   operations see below an example. The query was performed three
  times,
  twice
   I canceled and the third time I left it to end. The consultation
took 10
   seconds to finish, but the measured execution time is 20 ms. As
  we
have
   several queries that run every 2 seconds when this momentary
  delay
occurs
   queries accumulate and burst the limit of 203 connections
  allowed.
The
   interval of momentary stop are approximately 2 seconds
  occurring at
   random times and during these stoppages occur no locks and no
increased
   load on the server is identified. There is a job 2/2 seconds
  locks
   collection of information, running queries, etc., nmon also
  collects
   information every 2 seconds. My client asks what causes these
momentary
   stops? because it affects all operations of the database? etc.
  How
do I
   identify what is causing these delays in executions of
  operations in
the
   database?
 
  Number of points to consider:
  * Check the contents of pg_locks and see if something is waiting
  on a
lock when the query is slow.

  There is nothing waiting when the query is slow, see:

  snap_datetime | waiting | count
 ---+-+---
  2015-06-05 09:25:00.954731-03 | f   |74
  2015-06-05 09:26:00.249187-03 | f   |   205
  2015-06-05 09:27:00.826874-03 | f   |   207
   
I don't know what that means. Since you don't show the query that
generated that output, I have no idea if your statement is valid, or
if you're running a query that will inherently produce incorrect
results.
  
   Sorry the information was incomplete. Below is the query that was
   performed to extract information from tables that used to monitor the
   database.
   The snap_datetime column indicates the time that occurred monitoring,
   the waiting column tells you whether any connection was on hold and the
   column
   count tells how many connections existed at the time.
  
   select
s.snap_datetime
,a.waiting
, count(*)
   from stat_snapshot s, stat_activity a
   where s.snap_id = a.snap_id
  and s.snap_datetime >= '2015-06-05 09:25:00'
  and s.snap_datetime <= '2015-06-05 09:28:00'
   group by
 s.snap_datetime
 ,a.waiting
   order by s.snap_datetime
   ;
snap_datetime | waiting | count
   ---+-+---
2015-06-05 09:25:00.954731-03 | f   |74
2015-06-05 09:26:00.249187-03 | f   |   205
2015-06-05 09:27:00.826874-03 | f   |   207
   (3 rows)
 
  Given what I'm seeing above, there's still a lot of information
  missing. Those tables are not standard PostgreSQL tables, and I'm
  not aware of any tool that creates them, thus I have to assume
  it's a cron job or something similar created in-house. Given the
  query you ran and the resultant data, my first guess is that the
  data in stat_snapshot and stat_activity is worthless: i.e. it's
  captured once per minute, and therefore there are 59+ seconds
  worth of detail that aren't captured, thus the actual liklihood
  that the those tables will contain any indication of the problem
  is very low.
 
 Yes it is a cron, created at home, running every 1 hour gathering
 information from PostgreSQL tables (pg_stat_activity, pg_locks, etc).
 How to improve the collection of information PostgreSQL?

Look at the log file, details of what is going on are being recorded
as they happen.

However, given your statement below (claiming that nothing happened
during the observed slowdown between 9:26 and 9:27) you may want to
find someone who is actually capable of interpreting the log output,
as there are dozens of indicators of what's wrong.  Immediately after
the problem disappears, a number of queries finish after spending a
lot of time executing, in addition to a number of complaints of long
lock waits. So all those queries were fighting each other and when
they finally untangled themselves and got their work done

Re: [GENERAL] serialization failure why?

2015-06-16 Thread Bill Moran
On Tue, 16 Jun 2015 13:33:12 +0001
Filipe Pina filipe.p...@impactzero.pt wrote:

 I have these 2 tables:
 
 CREATE TABLE stuff (number integer NOT NULL PRIMARY KEY, title 
 varchar(40) NOT NULL);
 CREATE TABLE stuff_ext (stuff_ptr_id integer NOT NULL PRIMARY KEY, 
 extra integer NOT NULL);
 ALTER TABLE stuff_ext ADD CONSTRAINT 
 stuff_ext_stuff_ptr_id_5a4ee8edae53404b FOREIGN KEY (stuff_ptr_id) 
 REFERENCES stuff (number) DEFERRABLE INITIALLY DEFERRED;
 CREATE SEQUENCE stuff_seq;
 
 And then the function:
 
 CREATE OR REPLACE FUNCTION create_stuff(number integer, title text)
 RETURNS integer AS $$
 DECLARE
 a1 stuff;
 a2 stuff_ext;
 BEGIN
 IF number IS NULL THEN
 number := nextval('stuff_seq');
 END IF;
 
 a1.number := number;
 a1.title := title;
 
 a2.stuff_ptr_id := a1.number;
 
 INSERT INTO stuff VALUES (a1.*);
 INSERT INTO stuff_ext VALUES (a2.*);
 
 RETURN number;
 END
 $$
 LANGUAGE plpgsql;
 
 
 The DB is configured for SERIALIZABLE transaction mode.
 
 Now, if I can the function without passing number, such as:
 
 select create_stuff(NULL,'title');
 
 in 10 forked processes in a loop with a few iterations in each, I get 
 quite a few SERIALIZATON FAILURE (sqlstate 40001).
 
 If I comment out the INSERT INTO stuff_ext line, I don't get any.
 
 How is the second insert causing serialize dependencies...?

I'm not sure this is correct, but I have a theory.

Essentially, PostgreSQL can't be sure that the foreign key will be
valid if the other transaction rolls back. i.e., what if the foreign
key is valid because the other transaction created the matching row,
and that other transaction then rolls back?

In other isolation modes, it can just wait for the appropriate lock to
free up, then see what happens. But in serializable mode it hits a
condition where it can't ensure serializability.

 The specific error messages vary between
 
 ERROR: could not serialize access due to read/write dependencies among 
 transactions
 DETAIL: Reason code: Canceled on identification as a pivot, during 
 commit attempt.
 HINT: The transaction might succeed if retried.
 
 and
 
 ERROR: could not serialize access due to read/write dependencies among 
 transactions
 DETAIL: Reason code: Canceled on commit attempt with conflict in from 
 prepared pivot.
 HINT: The transaction might succeed if retried.
 
 Thanks!


-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Compression function

2015-06-16 Thread Bill Moran
On Tue, 16 Jun 2015 04:45:52 -0300
Leonardo M. Ramé l.r...@griensu.com wrote:

 Hi, does anyone know if there's a compression function to let me store 
 in gzipped/deflate format TEXT or Bytea fields.
 
 Please correct me if I'm wrong, but I also wonder if this function is 
 really needed since I've read large objects are stored with TOAST, hence 
 compression is already there.

The TOAST system does do compression, but depending on your expectation,
you may be disappointed.

The big thing that might let you down is that the TOAST code doesn't run
at all unless the tuple is larger than 2K. As a result, you could have
fairly large rows of almost 2000 bytes long, that _could_ compress to
significantly less than that, but PostgreSQL never tries to compress.
Additionally, PostgreSQL stops trying to compress fields once the row size
is smaller than 2K, so if you have multiple fields that could benefit from
compression, they might not all be compressed.

As a result, if you understand your data well, you need to take this into
account, as you might see better results if you do your own compression.
Unfortunately, I don't know of any in-database function that can be used
to compress data; you'd have to write your own or do it at the application
level.
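
If you want to check whether TOAST compression is kicking in for your data at
all, a quick comparison along these lines works (table and column names are
placeholders):

-- pg_column_size() reports the stored (possibly compressed) size of a value,
-- octet_length() its raw size; a large gap means TOAST compressed it
SELECT octet_length(payload)   AS raw_bytes,
       pg_column_size(payload) AS stored_bytes
FROM my_table
ORDER BY raw_bytes DESC
LIMIT 5;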

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Momentary Delay

2015-06-15 Thread Bill Moran
On Mon, 15 Jun 2015 17:48:54 -0300
Anderson Valadares anderva...@gmail.com wrote:

 2015-06-12 19:56 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
  Please do not remove the mailing list from replies. See below.
 
  On Fri, 12 Jun 2015 09:21:19 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:
  
On Mon, 8 Jun 2015 11:59:31 -0300
Anderson Valadares anderva...@gmail.com wrote:
   
 Hi
  We are experiencing an intermittent problem in a GIS database from a
 client. Some times of the day is as if the PostgreSQL executed the
slowest
 operations see below an example. The query was performed three times,
twice
 I canceled and the third time I left it to end. The consultation
  took 10
 seconds to finish, but the measured execution time is 20 ms. As we
  have
 several queries that run every 2 seconds when this momentary delay
  occurs
 queries accumulate and burst the limit of 203 connections allowed.
  The
 interval of momentary stop are approximately 2 seconds occurring at
 random times and during these stoppages occur no locks and no
  increased
 load on the server is identified. There is a job 2/2 seconds locks
 collection of information, running queries, etc., nmon also collects
 information every 2 seconds. My client asks what causes these
  momentary
 stops? because it affects all operations of the database? etc. How
  do I
 identify what is causing these delays in executions of operations in
  the
 database?
   
Number of points to consider:
* Check the contents of pg_locks and see if something is waiting on a
  lock when the query is slow.
  
There is nothing waiting when the query is slow, see:
  
snap_datetime | waiting | count
   ---+-+---
2015-06-05 09:25:00.954731-03 | f   |74
2015-06-05 09:26:00.249187-03 | f   |   205
2015-06-05 09:27:00.826874-03 | f   |   207
 
  I don't know what that means. Since you don't show the query that
  generated that output, I have no idea if your statement is valid, or
  if you're running a query that will inherently produce incorrect
  results.
 
 Sorry the information was incomplete. Below is the query that was
 performed to extract information from tables that used to monitor the
 database.
 The snap_datetime column indicates the time that occurred monitoring,
 the waiting column tells you whether any connection was on hold and the
 column
 count tells how many connections existed at the time.
 
 select
  s.snap_datetime
  ,a.waiting
  , count(*)
 from stat_snapshot s, stat_activity a
 where s.snap_id = a.snap_id
   and s.snap_datetime >= '2015-06-05 09:25:00'
   and s.snap_datetime <= '2015-06-05 09:28:00'
 group by
   s.snap_datetime
   ,a.waiting
 order by s.snap_datetime
 ;
  snap_datetime | waiting | count
 ---+-+---
  2015-06-05 09:25:00.954731-03 | f   |74
  2015-06-05 09:26:00.249187-03 | f   |   205
  2015-06-05 09:27:00.826874-03 | f   |   207
 (3 rows)

Given what I'm seeing above, there's still a lot of information
missing. Those tables are not standard PostgreSQL tables, and I'm
not aware of any tool that creates them, thus I have to assume
it's a cron job or something similar created in-house. Given the
query you ran and the resultant data, my first guess is that the
data in stat_snapshot and stat_activity is worthless: i.e. it's
captured once per minute, and therefore there are 59+ seconds
worth of detail that aren't captured, thus the actual likelihood
that those tables will contain any indication of the problem
is very low.

   * Also, ensure log_lock_waits is turned on for the server and check
  PostgreSQL's logs for messages about long lock waits.
  
   There is also nothing
 
  This is even worse than the previous statement. What did you _do_ to
  determine this?
 
  Since you're unable to diagnose and fix the problem on your own, the
  possibility exists that the problem is with your diagnostic steps. If
  you don't share the details of those steps, I have no way to help you
  with them.
 
 See the log in
 https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_00.log

Apologies, my above comment was intended to be in regards to the IO load.

However, that log shows that you have massive lock contention. In the early
part of the log a process waits over 5 minutes for a share lock:
2015-06-05 05:51:32 BRT [44236822] - [10.57.1.26] [14-1] db=citgis,user=opegis 
LOG:  process 44236822 acquired ShareLock on transaction 3090377758 after 
374146.817 ms
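
While one of these stalls is happening, a query along these lines shows who is
waiting on whom (a rough sketch against the 9.2 catalogs; pairing waiter and
holder on transactionid/relation/page/tuple is approximate):

SELECT w.pid AS waiting_pid,  wa.query AS waiting_query,
       h.pid AS holding_pid,  ha.query AS holding_query
FROM pg_locks w
JOIN pg_locks h
  ON  h.granted
 AND NOT w.granted
 AND w.pid <> h.pid
 AND w.locktype = h.locktype
 AND w.transactionid IS NOT DISTINCT FROM h.transactionid
 AND w.relation      IS NOT DISTINCT FROM h.relation
 AND w.page          IS NOT DISTINCT FROM h.page
 AND w.tuple         IS NOT DISTINCT FROM h.tuple
JOIN pg_stat_activity wa ON wa.pid = w.pid
JOIN pg_stat_activity ha ON ha.pid = h.pid;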

Without seeing the source of the PL/pgSQL functions that seem to be
causing the problem, I can only suggest that you review those functions
for excessive locking and inefficient access. I'm going to guess

Re: [GENERAL] Momentary Delay

2015-06-12 Thread Bill Moran

Please do not remove the mailing list from replies. See below.

On Fri, 12 Jun 2015 09:21:19 -0300
Anderson Valadares anderva...@gmail.com wrote:

 2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
  On Mon, 8 Jun 2015 11:59:31 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   Hi
We are experiencing an intermittent problem in a GIS database from a
   client. Some times of the day is as if the PostgreSQL executed the
  slowest
   operations see below an example. The query was performed three times,
  twice
   I canceled and the third time I left it to end. The consultation took 10
   seconds to finish, but the measured execution time is 20 ms. As we have
   several queries that run every 2 seconds when this momentary delay occurs
   queries accumulate and burst the limit of 203 connections allowed. The
   interval of momentary stop are approximately 2 seconds occurring at
   random times and during these stoppages occur no locks and no increased
   load on the server is identified. There is a job 2/2 seconds locks
   collection of information, running queries, etc., nmon also collects
   information every 2 seconds. My client asks what causes these momentary
   stops? because it affects all operations of the database? etc. How do I
   identify what is causing these delays in executions of operations in the
   database?
 
  Number of points to consider:
  * Check the contents of pg_locks and see if something is waiting on a
lock when the query is slow.
 
  There is nothing waiting when the query is slow, see:
 
  snap_datetime | waiting | count
 ---+-+---
  2015-06-05 09:25:00.954731-03 | f   |74
  2015-06-05 09:26:00.249187-03 | f   |   205
  2015-06-05 09:27:00.826874-03 | f   |   207

I don't know what that means. Since you don't show the query that
generated that output, I have no idea if your statement is valid, or
if you're running a query that will inherently produce incorrect
results.

 * Also, ensure log_lock_waits is turned on for the server and check
PostgreSQL's logs for messages about long lock waits.
 
 There is also nothing

This is even worse than the previous statement. What did you _do_ to
determine this?

Since you're unable to diagnose and fix the problem on your own, the
possibility exists that the problem is with your diagnostic steps. If
you don't share the details of those steps, I have no way to help you
with them.

  * Based on the connection behavior you describe, I'm guessing it's a
Tomcat app using some sort of conection pool. Whatever it is, tune
your connection pool settings so that the max size of the pool doesn't
exceed the available PostgreSQL connections. At least that will prevent
errors from happening when the problem occurs.
 
 Yes it is a tomcat application using connection pooling and will be set to
 not exceed the limit of PostgreSQL connections, but the central point is
 that the number of connections has increased due to what I am calling
 momentary stoppages

I understand that. I made the suggestion as a temporary fix to reduce
the impact until you are able to fix the actual cause.

  * The query you you keep cancelling below, run just EXPLAIN on it (not
EXPLAIN ANALYZE) which will certainly finish and give you a plan
that can be reviewed to help determine what the problem is.
 
 Explain plan in http://explain.depesz.com/s/bWw

I'm confused. The query at that URL only took 10ms to complete, which
doesn't seem at all unreasonable.

 * On the OS, monitor iostat or something else to see if you're
saturating disk capacity when the problem happens.
 
 The OS is also being monitored and there occurs no saturation.

Again, without any details, it's possible that there is a problem
in your monitoring.

 
  The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
   powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
  
   Evolution of the number of connections for a period
  
   snap_datetime|  #connections
   --+-
   2015-06-05 09:25:00.954731-03 | 74
   2015-06-05 09:26:00.249187-03 | 205
   2015-06-05 09:27:00.826874-03 | 207
   2015-06-05 09:28:00.374666-03 | 73
   2015-06-05 09:29:00.690696-03 | 75
  
  
   Occurrence of the problem
  
   citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
  dvi_cod_dados_via_iti,
   dvi_sub.via_cod_viagem
from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
   via_sub, gis_iti_itinerario iti_sub
where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
  and via_sub.via_status_viagem = 'A'
  and via_sub.via_dt_hora_ini > now() - interval '9 hours'
  and iti_sub.lin_cod_linha = 389
  and iti_sub.iti_sentido_itinerario = 'I'
  and iti_sub.iti_cod_itinerario_linha

Re: [GENERAL] Planner cost adjustments

2015-06-11 Thread Bill Moran
On Wed, 10 Jun 2015 17:20:00 -0400
Daniel Begin jfd...@hotmail.com wrote:

 Here is a follow-up on adjusting the planner costs calculation
 
 -Statistics target of problematic columns were increased from 100 to 200.
 -Analyse was ran on all concerned tables (actually ran on the whole DB)
 -Random_page_cost was decreased from 4 to 2.

Since you changed two things, there's no way to be sure which change led
to the improvement. You seem to be assuming that changing the statistics
target was what helped. While that _may_ be right, it might also have
been the change to random_page_cost.
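
It's easy enough to separate the two effects before touching the server config
again; a sketch, using the table/column names from the plan you posted
(nodes and changeset_id):

-- test the cost change by itself: put the old value back for one transaction
BEGIN;
SET LOCAL random_page_cost = 4;
EXPLAIN SELECT cs.user_id, cs.changeset_id, nd.id, nd.version
FROM changesets_selection cs, nodes nd
WHERE nd.changeset_id = cs.changeset_id;
ROLLBACK;

-- and the statistics change by itself, limited to the join column
ALTER TABLE nodes ALTER COLUMN changeset_id SET STATISTICS 500;
ANALYZE nodes;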

 As a result, about 80% of queries are now using what I would consider an
 optimal plan. However, the planner keeps using full table scans for large
 tables... 
 
 For instance, I ran a query that should have used an index scan on two
 similar test tables. The planner had selected an index scan for the smaller
 one and a Seq Scan for larger one. Except for their sizes and for one field
 not used in the exercise, the test tables were identical and indexed on the
 same field. The smaller test table had 3.26E+10 records and the larger one
 3.78E+11 records.  
 
 The query looked like...
 SELECT cs.user_id, cs.changeset_id, nd.id, nd.version  
 FROM changesets_selection cs, a_test_table nd
 WHERE nd.changeset_id=cs.changeset_id;
 
 In order to understand why the planner selected the Seq Scan instead of an
 Index Scan on the large table (nodes), I ran an EXPLAIN ANALYSE (on warm
 cache) using enable_seqscan set to OFF/ON.
 
 -- QUERY PLAN on larger table (nodes) with SET enable_seqscan=OFF
 -;
  Nested Loop  (cost=100.58..10210480648.03 rows=194204592 width=40)
 (actual time=74.088..278008.579 rows=140112 loops=1)
-  Seq Scan on changesets_selection cs
 (cost=100.00..1000110.44 rows=6644 width=24) (actual
 time=0.015..4.904 rows=6903 loops=1)
-  Index Scan using nodes_changesetidndx on nodes nd
 (cost=0.58..31387.49 rows=29230 width=24) (actual time=12.655..40.259
 rows=20 loops=6903)
  Index Cond: (changeset_id = cs.changeset_id)
  Total runtime: 278026.196 ms
 (5 rows) 
 
 -Completed after less than 5 minutes processing 
 -I assume that cost=100.00..1000110.44 for the Seq Scan of
 changesets_selection is an artefact of setting enable_seqscan=OFF.
 -From what I see, the evaluation of rows number is still way off (1400X) for
 the large table, even if the statistics target was doubled to 200.
 
 -- QUERY PLAN on larger table (nodes) with SET enable_seqscan=ON
 --;
  Hash Join  (cost=156171782.28..185673195.13 rows=194204592 width=40)
Hash Cond: (cs.changeset_id = n.changeset_id)
-  Seq Scan on changesets_selection cs  (cost=0.00..110.44 rows=6644
 width=24)
-  Hash  (cost=84959952.68..84959952.68 rows=3878771968 width=24)
  -  Seq Scan on nodes nd  (cost=0.00..84959952.68 rows=3878771968
 width=24)
 (5 rows) 
 
 -Still running after 2:30 hours processing! That is why I did not provided
 the actual time and rows (however, actual rows are provided on first query
 plan)
 -Not surprisingly, the evaluation of rows number is way off again for the
 large table - same stats, same results...
 
 It seems there is a problem with my large table statistics, even after
 increase them to 200. Should I increase the statistic target to 500, or even
 to 1000?
 Is there something else I can trigger to get the appropriate plan?
 
 Comments/explanations would be appreciated
 Daniel
 
 
 -Original Message-
 From: pgsql-general-ow...@postgresql.org
 [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Daniel Begin
 Sent: June-03-15 06:32
 To: 'Bill Moran'
 Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
 Subject: Re: [GENERAL] Planner cost adjustments
 
 Thank Bill,
 
 About disks performance, all drives are identical and connected using USB3
 connections and yes, I can tweak values and restart Postgres without any
 hardship!-)
 About seq_page_cost and random_page_cost, I am about to test different lower
 values as you and Thomas propose.
 Raising the statistics target is a good idea. Since most of the data have a
 power law distribution it might affect the statistics.
 
 I will do as suggested and get back to the community for further comments. I
 wished to follow the procedure proposed by PT, just in case I eventually had
 to on step 4 (contact PostgreSQL developers so they can improve the
 planner).
 
 And I am still open to other proposal
 Daniel
 
 -Original Message-
 From: pgsql-general-ow...@postgresql.org
 [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bill Moran
 Sent: June-02-15 23:31
 To: Daniel Begin
 Cc: pgsql-general@postgresql.org; 'Tomas Vondra'; 'Melvin Davidson'
 Subject: Re: [GENERAL] Planner cost adjustments
 
 On Tue, 2 Jun 2015 14:01:35 -0400
 Daniel Begin jfd...@hotmail.com wrote:
 
  Here is a follow-up on the step-by-step procedure proposed by PT
  
  #1 - setup

Re: [GENERAL] Momentary Delay

2015-06-08 Thread Bill Moran
'
and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
 Cancel request sent
 ERROR:  canceling statement due to user request
 Time: 10164.015 ms
 citgis=# select now();
   now
 ---
  2015-06-05 09:27:22.006072-03
 (1 row)
 
 Time: 0.152 ms
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
 via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
  dvi_cod_dados_via_iti | via_cod_viagem
 ---+
 1059964443 |7989813
 1060072723 |7990876
 (2 rows)
 
 Time: 5565.175 ms
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
 via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
  dvi_cod_dados_via_iti | via_cod_viagem
 ---+
 1059964445 |7989813
 1060072725 |7990876
 (2 rows)
 
 Time: 27.944 ms
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
 via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
  dvi_cod_dados_via_iti | via_cod_viagem
 ---+
 1059964445 |7989813
 1060072727 |7990876
 (2 rows)
 
 Time: 24.428 ms
 
 Greetings
 
   Anderson


-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Automatic Client routing

2015-06-04 Thread Bill Moran
On Thu, 4 Jun 2015 08:53:15 -0400
Ravi Krishna sravikrish...@gmail.com wrote:

 Are there any plans to introduce the concept of automatic client
 routing to the principal server in a cluster of N machines. For
 example, if there is a four node replication cluster N1 .. N4, at any
 time only one can be principal (the one which does the writing). In
 Oracle and DB2, client side libraries provide a way for the clients to
 connect to the principal writer regardless of where it is running on
 N1 .. N4. This way client need to be aware of only one connection
 string.
 
 EnterpriseDb is a failover manager which relies on virtual IP
 management, not the one I described above.

pgpool has this capacity.

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Planner cost adjustments

2015-06-02 Thread Bill Moran
)
 (actual time=0.027..4620.691 rows=25133929 loops=1)
-  Hash  (cost=14.00..14.00 rows=600 width=8) (actual time=0.300..0.300
 rows=600 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 24kB
  -  Seq Scan on users  (cost=0.00..14.00 rows=600 width=8) (actual
 time=0.022..0.187 rows=600 loops=1)
  Total runtime: 7519.254 ms
 (7 rows)
 
 osmdump=# SET enable_seqscan = OFF;
 osmdump=# EXPLAIN ANALYSE SELECT user_id, id AS changeset_id,closed FROM
 changesets WHERE changesets.user_id IN(SELECT id FROM sample.users);
 
 
  Nested Loop  (cost=115.94..10001090810.49 rows=729133 width=24)
 (actual time=0.268..1466.248 rows=338568 loops=1)
-  HashAggregate  (cost=115.50..121.50 rows=600 width=8)
 (actual time=0.205..0.530 rows=600 loops=1)
  -  Seq Scan on users  (cost=100.00..114.00
 rows=600 width=8) (actual time=0.003..0.035 rows=600 loops=1)
-  Index Scan using changesets_useridndx on changesets
 (cost=0.44..1805.83 rows=1215 width=24) (actual time=0.036..2.314 rows=564
 loops=600)
  Index Cond: (user_id = users.id)
  Total runtime: 1677.447 ms
 (6 rows)
 
 #3 - Run EXPLAIN ANALYZE and look for discrepancies between the estimated
 and actual times
 
 --
 Looking at above results, there are obvious discrepancies between
 expected/actual rows and time!
 I dug a bit by exploring/trying to understand the different concepts
 explained in...
 
 http://www.postgresql.org/docs/9.4/static/planner-stats.html
 http://www.postgresql.org/docs/8.1/static/planner-stats-details.html
 http://www.postgresql.org/docs/9.2/static/view-pg-stats.html
 
 Concerning discrepancies between the actual number of rows and predicted
 value, I looked at what pg_stats was saying about user_id in table
 changesets.
 Here are the values provided to the planner...
 Average_width=8
 histogram_bounds: the size of the bins varies between 50 and 15, which
 make sense because if I had divided the column's values into groups of
 approximately equal population, I would have produced bins between 1 and
 10 (if sorted by frequency)
 n_distinct= 20686 (there is actually 464858 distinct values for user_id in
 the table)
 most_common_vals: values make sense (I checked the frequency count of a
 couple most common users_id)
 correlation=0.617782 (?)
 most_common_elems, most_common_elem_freqs and elem_count_histogram were
 empty
 
 At this point, I wonder if the assumptions behind the planner's statistics
 may produce such problems since the distribution of my data is not uniform
 but follows a power law (some user_id would return millions of records while
 others only one).  
 This is the farthest I can go at this point. Maybe someone can provide me
 with more explanations regarding planner's behavior and ways to go further
 to make it work properly?

You may also benefit from increasing the statistics targets and
running ANALYZE again. It certainly looks like some of those stats are
pretty far off. Raising the statistics target will cause ANALYZE to
investigate more rows (which takes longer but might produce more
accurate results).

I suggest experimenting with the cost settings first, though.
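
A sketch of how that experiment might look (the column name comes from the
plans above; the numbers are only starting points):

-- take a larger sample on the skewed column, then re-analyze
ALTER TABLE changesets ALTER COLUMN user_id SET STATISTICS 500;
ANALYZE changesets;

-- then compare plans with different cost settings in the same session
SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT user_id, id AS changeset_id, closed
FROM changesets
WHERE changesets.user_id IN (SELECT id FROM sample.users);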

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Efficient sorting the results of a join, without denormalization

2015-05-31 Thread Bill Moran
On Sun, 31 May 2015 04:50:00 -0500
Glen M. Witherington g...@fea.st wrote:
 
 On Sun, May 31, 2015, at 12:53 AM, Tom Lane wrote:
  Glen M. Witherington g...@fea.st writes:
   And here's the query I want to do, efficiently:
  
   SELECT * FROM c
 JOIN b ON b.id = c.b_id
 JOIN a ON a.id = b.a_id
   WHERE a.id = 3
   ORDER BY b.created_at DESC
   LIMIT 10
  
  At least for that dummy data, this seems sufficient:
  
  regression=# create index on b (a_id, created_at);
  CREATE INDEX
  regression=# explain analyze SELECT * FROM c
JOIN b ON b.id = c.b_id
JOIN a ON a.id = b.a_id
  WHERE a.id = 3
  ORDER BY b.created_at DESC
  LIMIT 10;
QUERY
PLAN 
  --
   Limit  (cost=0.14..21.95 rows=10 width=64) (actual time=0.064..1.176
   rows=10 loops=1)
 -  Nested Loop  (cost=0.14..436079.81 rows=20 width=64) (actual
 time=0.063..1.173 rows=10 loops=1)
   Join Filter: (b.id = c.b_id)
   Rows Removed by Join Filter: 1218
   -  Nested Loop  (cost=0.14..9.81 rows=20 width=40) (actual
   time=0.035..0.035 rows=1 loops=1)
 -  Index Scan Backward using b_a_id_created_at_idx on b 
 (cost=0.14..8.49 rows=20 width=24) (actual
 time=0.019..0.019 rows=1 loops=1)
   Index Cond: (a_id = 3)
 -  Materialize  (cost=0.00..1.07 rows=1 width=16) (actual
 time=0.013..0.013 rows=1 loops=1)
   -  Seq Scan on a  (cost=0.00..1.06 rows=1 width=16)
   (actual time=0.009..0.009 rows=1 loops=1)
 Filter: (id = 3)
 Rows Removed by Filter: 2
   -  Materialize  (cost=0.00..27230.00 rows=100 width=24)
   (actual time=0.008..0.811 rows=1228 loops=1)
 -  Seq Scan on c  (cost=0.00..16370.00 rows=100
 width=24) (actual time=0.007..0.310 rows=1228 loops=1)
   Planning time: 0.796 ms
   Execution time: 1.390 ms
  (15 rows)
  
  regards, tom lane
 
 Wow, sorry I screwed up the query. It should be:
 
 ORDER BY c.created_at DESC
 
 Not b, or as you noted its trivial to index. Sorry!

Creating an index on c.created_at sped things up by a factor of over
1000, which caused the case you defined to run in ~0.5ms for me.
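
For the archives, the index in question was nothing fancier than something
like this (column name from your example schema):

CREATE INDEX ON c (created_at);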

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Bill Moran
On Sat, 23 May 2015 18:16:43 -0400
Daniel Begin jfd...@hotmail.com wrote:

 Hello Bill, 
 You wrote that my testing methodology is flawed - I hope you are right!  
 
 However, I am a bit confused about your comments. Yes, I did edited the name
 of the tables for clarity but if I miss the point I, I will do it again as I
 am writing without modifying anything. Here is the procedure I follow and
 results...
 
 I use pgadmin_III sql window. I write the following query (I have changed
 the id to make sure it does not use previous results still in memory)...

I didn't realize you were using PGAdmin ... that explains some of it ...
see below:

 Select * from nodes where id=345678912; -- nodes is the real partitioned
 table name
 
 Now I select explain query from the menu and I get the following result...
 Append  (cost=0.00..384.08 rows=99 width=66)
   -  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66)
 Filter: (id = 345678912)
   -  Index Scan using nodes19_idversion_pk on nodes_19  (cost=0.56..384.08
 rows=98 width=66)
 Index Cond: (id = 345678912)
 
 Now, I select run and I get one record as a result and the following
 message in history tab...
 -- Executing query:
 Select * from nodes where id=345678912; 
 Total query runtime: 62 ms.
 1 row retrieved.
 
 Now, if I use the same query on the original table using the same procedure,
 here is what I get...
 Select * from old_nodes where id=345678912; -- old_nodes is the real
 original table name
 
 Explain gives me the following
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4437.15
 rows=1682 width=66)
   Index Cond: (id = 345678912)
 
 Running the query gives me the same record with the following message in
 history tab...
 -- Executing query:
 select * from old_nodes where id=345678912; 
 Total query runtime: 62 ms.
 1 row retrieved.
 
 This time, the history tab shows that both took the same time to run (an
 improvement!?)

If your environment is providing such wildly variant results, then
you need to start running multiple tests instead of assuming that a single
run of a query is indicative of a pattern.
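
Even something as simple as repeating each query a handful of times with
timing turned on in psql will show you how much the numbers move around
(illustrative only; the id is the one from your example):

\timing on
SELECT * FROM nodes WHERE id = 345678912;
SELECT * FROM nodes WHERE id = 345678912;
SELECT * FROM old_nodes WHERE id = 345678912;
SELECT * FROM old_nodes WHERE id = 345678912;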

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Bill Moran
..17739.18 
 rows=6726 width=66)
   Index Cond: (id = ANY 
 ('{1000,10,20,30}'::bigint[]))
 --Total query runtime: 187 ms. 4 rows retrieved
 
 select * from newtable where id 
 IN(1000,10,20,30); 
 Append  (cost=0.00..933.40 rows=223 width=66)
   -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
 Filter: (id = ANY 
 ('{1000,10,20,30}'::bigint[]))
   -  Index Scan using newtable01_idversion_pk on newtable_01  
 (cost=0.57..622.78 rows=156 width=66)
 Index Cond: (id = ANY 
 ('{1000,10,20,30}'::bigint[]))
 ...
   -  Index Scan using newtable85_idversion_pk on newtable_85  
 (cost=0.57..53.37 rows=9 width=66)
 Index Cond: (id = ANY 
 ('{1000,10,20,30}'::bigint[]))
 --Total query runtime: 421 ms. 4 rows retrieved
 
 
 range of ids 
 ---
 select * from oldtable where id between 152249 and 152349;
 Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..383.51 
 rows=144 width=66)
   Index Cond: ((id >= 152249) AND (id <= 152349))
 Total query runtime: 47 ms. 53 rows retrieved.
 
 select * from newtable where id between 152249 and 152349;
 Append  (cost=0.00..408.16 rows=104 width=66)
   -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
 Filter: ((id >= 152249) AND (id <= 152349))
   -  Index Scan using newtable51_idversion_pk on newtable_51  
 (cost=0.56..183.52 rows=46 width=66)
 Index Cond: ((id >= 152249) AND (id <= 152349))
   -  Index Scan using newtable52_idversion_pk on newtable_52  
 (cost=0.56..224.64 rows=57 width=66)
 Index Cond: ((id >= 152249) AND (id <= 152349))
 Total query runtime: 78 ms. 53 rows retrieved.
 
 
 list of ids from a select clause 
 ---
 --Subset provides 4 ids similar but not identical to the previous query
 select * from oldtable where id IN (select * from subset); 
 Nested Loop  (cost=37.45..886298.00 rows=2028512050 width=66)
   -  HashAggregate  (cost=36.75..38.75 rows=200 width=8)
 -  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)
   -  Index Scan using oldtable_idversion_pk on oldtable  
 (cost=0.70..4414.37 rows=1693 width=66)
 Index Cond: (id = subset.id)
 Total query runtime: 171 ms. 4 rows retrieved.
 
 select * from newtable where id IN (select * from subset)
 Nested Loop  (cost=36.75..1407672.76 rows=1935067087 width=66)
   -  HashAggregate  (cost=36.75..38.75 rows=200 width=8)
 -  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)
   -  Append  (cost=0.00..7020.68 rows=1749 width=66)
 -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
   Filter: (subset.id = id)
 -  Index Scan using newtable01_idversion_pk on newtable_01  
 (cost=0.56..151.97 rows=39 width=66)
   Index Cond: (id = subset.id)
 ...
 -  Index Scan using newtable86_idversion_pk on newtable_86  
 (cost=0.56..12.42 rows=2 width=66)
   Index Cond: (id = subset.id)
 Total query runtime: 140 ms. 4 rows retrieved.
 
 
 Using an index, not the primary key 
 --
 --Subset provides 58 group_id pointing to 5978 records in the concerned tables
 select * from oldtable where group_id IN (select * from subset)
 Nested Loop  (cost=37.33..21575715.89 rows=2028512050 width=66)
   -  HashAggregate  (cost=36.75..38.75 rows=200 width=8)
 -  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)
   -  Index Scan using oldtable_groupid_idx on oldtable  
 (cost=0.58..107364.99 rows=51340 width=66)
 Index Cond: (group_id = subset.id)
 Total query runtime: 3986 ms. 5978 rows retrieved.
 
 
 select * from newtable where group_id IN (select * from subset)
 Hash Join  (cost=41.25..138092255.85 rows=1935067087 width=66)
   Hash Cond: (newtable.group_id = subset.id)
   -  Append  (cost=0.00..84877869.72 rows=3870134173 width=66)
 -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
 -  Seq Scan on newtable_01  (cost=0.00..946235.96 rows=46526896 
 width=66)
 ...
 -  Seq Scan on newtable_86  (cost=0.00..986527.64 rows=44269664 
 width=66)
   -  Hash  (cost=38.75..38.75 rows=200 width=8)
 -  HashAggregate  (cost=36.75..38.75 rows=200 width=8)
   -  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)
 Execution Cancelled after 766702 ms !
 
 I tried the same with SET enable_seqscan = OFF and got an index scan of all 
 tables;
 
 
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Bill Moran


-- 
Sent via pgsql-general mailing list

Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?

2015-05-22 Thread Bill Moran
On Fri, 22 May 2015 11:02:47 -0400
Tom Lane t...@sss.pgh.pa.us wrote:

 Alban Hertroys haram...@gmail.com writes:
  On 22 May 2015 at 04:46, Bill Moran wmo...@potentialtech.com wrote:
  With all that being said, if I were to build a patch, would it be likely
  to be accepted into core?
 
  Wouldn't you also need to support similar syntax for octal numbers for
  the patch to be complete? Or are those already supported (ISTR that's
  '077' for decimal 63)?
 
 A patch that made 077 be interpreted as octal would certainly get rejected
 out of hand, because that's valid syntax right now and it doesn't mean 63.

You'll get no objection from me on that point.

 A similar objection might be raised to 0x..., as that is also valid
 syntax; it's read as 0 followed by an identifier:
 
 regression=# select 0xff;
  xff 
 -
0
 (1 row)
 
 (Yet another place where the fact that AS is optional yields surprising
 results...)  So there would be a backwards compatibility break here,
 and no you can't fix it with a GUC.  Still, it might be a small enough
 break that we could get away with it.

I hadn't even considered that issue. I really hate the fact that AS is
optional, and I'm irritated by code that omits it ... but nobody's
asking me ...

 I'm not personally very excited
 but other people might be.
 
 Other questions you'd have to think about: what is the data type of
 0x; what do you do with 0x (too big
 even for int8).  And it'd likely behoove you to check how Microsoft
 answers those questions, if you want to point to SQL Server as what's
 going to keep you out of standards-compatibility problems.  (IOW,
 if 0x ever did get standardized, the text might well match what
 SQL Server does.)

MSSQL seems to use it specifically for the equivalent of BYTEA types,
and it seems to me that should be how it works in PostgreSQL. Does
anyone watching this thread have access to a MSSQL server to verify?
If an implicit cast from a 4-byte BYTEA to int works now, then it
should work ... otherwise an explicit cast would be needed, with the
same behavior if you tried to specify a number that overflows an int
in any other way.

MySQL is more liberal in that you can use it to specify ints as
well, but I don't think MySQL is a good example of proper behavior.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?

2015-05-22 Thread Bill Moran
On Fri, 22 May 2015 12:44:40 -0400
Tom Lane t...@sss.pgh.pa.us wrote:

 Bill Moran wmo...@potentialtech.com writes:
  Tom Lane t...@sss.pgh.pa.us wrote:
  Other questions you'd have to think about: what is the data type of
  0x; what do you do with 0x (too big
  even for int8).  And it'd likely behoove you to check how Microsoft
  answers those questions, if you want to point to SQL Server as what's
  going to keep you out of standards-compatibility problems.  (IOW,
  if 0x ever did get standardized, the text might well match what
  SQL Server does.)
 
  MSSQL seems to use it specifically for the equivalent of BYTEA types,
  and it seems to me that should be how it works in PostgreSQL.
 
 Oh really?  Wow, I'd just assumed you wanted this as a way to write
 integers.  That's certainly the use-case I would have personally.
 I'm not even sure I like the idea of being able to write byteas without
 quotes --- they seem like strings to me, not numbers.

Arrgh ... it's good that you're bringing this up, but you're making me
realize that there's more to figure out than I originally thought ...
My focus had been on it being used for BYTEA columns, but there _are_
plenty of places in the code that do things like:

WHERE int_col & 0x04 = 0x04

Which means that Sybase will implicitly cast that to an int, which
probably means that MSSQL will as well.

Once I take that into consideration, I start thinking that int_col
should actually be a bit string, which means that:

WHERE bit_varying_col & 0x04 = 0x04

should probably work without explicit casts as well.
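
For comparison, the closest spelling that works today uses the bit-string
literal with an explicit cast (just a sketch, not code from the actual
application):

WHERE int_col & x'04'::int = x'04'::int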

  If an implicit cast from a 4-byte BYTEA to int works now, then it
  should work ... otherwise an explicit cast would be needed, with the
  same behavior if you tried to specify a number that overflows an int
  in any other way.
 
 There's no cast at all from bytea to int.  For one thing, it's quite
 unclear what endianness should be assumed for such a cast.  (To get
 unsurprising behavior from what you're describing, I think we'd have
 to use a big-endian interpretation of the bytea; but that would be
 a pain for a lot of other scenarios, or even for this case if you'd
 written a bytea of length other than 4 or 8 bytes.)

As an implicit cast, obviously anything too large to fit in the
target data type should be an error. But the subject of endianness
becomes damn important.

It's more complex than I original thought, but it still seems like it
can be done without creating idiocy in the way things are cast. I'll
think about it some more and try to come up with some more specific
use scenarios to see what behavior seems the most POLA to me.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?

2015-05-22 Thread Bill Moran
On Fri, 22 May 2015 11:27:49 -0500
Dennis Jenkins dennis.jenkins...@gmail.com wrote:

 On Fri, May 22, 2015 at 10:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Alban Hertroys haram...@gmail.com writes:
   On 22 May 2015 at 04:46, Bill Moran wmo...@potentialtech.com wrote:
   With all that being said, if I were to build a patch, would it be likely
   to be accepted into core?

 How feasible would it be to write a network proxy, like pg_bouncer, to
 handle converting the values on the fly, so that you need to change neither
 your original code base (with respect to this issue), nor change PostgreSQL
 itself?

Certainly feasible, but absolutely undesirable. The system I'm working on
is needlessly complex as it is ... I'd rather convince my bosses to let
me rewrite 80,000 lines of code than add another compatibility shim to
the mess.

I brought it up because I saw an opportunity to benefit my employer and
the PostgreSQL community at the same time. I have about 4 fallback plans
if there's a reason not to do this one. Quite frankly, adding a compatibility
shim isn't even on that list.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?

2015-05-21 Thread Bill Moran

I'm working on a project converting a bunch of code from another database
system to PostgreSQL.

One of the issues is that the previous system accepted integers and binary
data in the same hex format as C ... i.e. 0xff

I understand that the proper way to handle this in postgres is x'ff', but
the problem is that a large number of #define values use the 0xff syntax
and these macros are used both in C code, as well as in SQL. The simple
fact is that a LOT of code does this, and correcting it all and ensuring
that the modified code is correct is a BIG job. Just to clarify some of
the complexity: there is about 80,000 lines of PL/PGSQL code that contains
these macros, then is run through the C preprocessor to substitute actual
values for them before being loaded into Postgres.
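
To make the difference concrete, here's a throwaway psql illustration (not
code from the project itself):

SELECT x'ff'::int;   -- 255: the syntax PostgreSQL does accept
SELECT 0xff;         -- currently parses as the constant 0 aliased to a column named xff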

Obviously, there are many options for fixing this. One of those options is
modifying PostgreSQL to accept the 0xff syntax ... and evaluating that
option is the reason for my post.

So, one of my questions is: does anyone have an existing simple answer on
how to fix this?

My other question: is there a specific reason why PostgreSQL doesn't support
this syntax, aside from nobody has bothered to add such support? Because
I'm considering writing a patch to Postgres and submitting it, but I'm not
going to go down that path if there's a specific reason why supporting this
syntax would be _bad_. Personally, I feel like it would be a good thing, as
it seems like a lot of other database systems support it, and even though
it's not ANSI, it's pretty much the de-facto standard.

-- 
Bill wmo...@potentialtech.com



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?

2015-05-21 Thread Bill Moran
On Thu, 21 May 2015 13:57:24 -0400
Tom Lane t...@sss.pgh.pa.us wrote:

 Bill Moran wmo...@potentialtech.com writes:
  My other question: is there a specific reason why PostgreSQL doesn't support
  this syntax, aside from nobody has bothered to add such support? Because
  I'm considering writing a patch to Postgres and submitting it, but I'm not
  going to go down that path if there's a specific reason why supporting this
  syntax would be _bad_. Personally, I feel like it would be a good thing, as
  it seems like a lot of other database systems support it, and even though
  it's not ANSI, it's pretty much the de-facto standard.
 
 How many is a lot, and do any of the responsible vendors sit on the SQL
 standards committee?

Well, I've personally worked with (in addition to PostgreSQL) Microsoft
SQL Server, MySQL, and Sybase -- PostgreSQL is the only one of those 4 that
doesn't support the 0xff syntax.

I did a little research and it appears that neither Oracle nor DB2 supports
the 0xff syntax ... so not _quite_ as common as it seemed to me.

 One large concern about doing anything like this is whether future
 versions of the SQL standard might blindside us with some
 not-terribly-compatible interpretation of that syntax.  If we do something
 that is also in Oracle or DB2 or one of the other big boys, then we can
 probably rely on the assumption that they'll block anything really
 incompatible from becoming standardized ;-).

I assume that Microsoft is big enough to prevent anything that would
hurt SQL Server's compatibility from becomming a standard?

 OTOH, if the actual meaning of a lot is MySQL, I'd be pretty worried
 about this scenario.

Well, MySQL _does_ support that syntax ... but I couldn't care less. MySQL
also throws away your data instead of giving you errors and I would never
ask PostgreSQL to start behaving like that.

With all that being said, if I were to build a patch, would it be likely
to be accepted into core?

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql and GlassFish - cannot commit when autoCommit is enabled

2015-05-09 Thread Bill Moran
On Thu, 07 May 2015 10:07:44 +0300
?  ooo_satu...@mail.ru wrote:

 
 I have postgresql 9.4 and glassfish 4.1. Besides I use MyBatis inside EJB. 
 Now I try to make  select from table and this is what I get:
 javax.resource.spi.LocalTransactionException:Cannot commit when autoCommit is 
 enabled.
 at 
 com.sun.gjc.spi.LocalTransactionImpl.commit(LocalTransactionImpl.java:112)
 at 
 com.sun.enterprise.resource.ConnectorXAResource.commit(ConnectorXAResource.java:124)
 at 
 com.sun.enterprise.transaction.JavaEETransactionImpl.commit(JavaEETransactionImpl.java:518)
 at 
 com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.commit(JavaEETransactionManagerSimplified.java:854)
 at 
 com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:719)
 at 
 com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:503)
 at 
 com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4566)
 at 
 com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2074)
 at 
 com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2044)
 at 
 com.sun.ejb.containers.EJBObjectInvocationHandler.invoke(EJBObjectInvocationHandler.java:212)
 at 
 com.sun.ejb.containers.EJBObjectInvocationHandlerDelegate.invoke(EJBObjectInvocationHandlerDelegate.java:79)
 at com.sun.proxy.$Proxy312.getLsist(UnknownSource)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethod)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at 
 com.sun.corba.ee.impl.presentation.rmi.ReflectiveTie.dispatchToMethod(ReflectiveTie.java:143)
 at 
 com.sun.corba.ee.impl.presentation.rmi.ReflectiveTie._invoke(ReflectiveTie.java:173)
 at 
 com.sun.corba.ee.impl.protocol.ServerRequestDispatcherImpl.dispatchToServant(ServerRequestDispatcherImpl.java:528)
 at 
 com.sun.corba.ee.impl.protocol.ServerRequestDispatcherImpl.dispatch(ServerRequestDispatcherImpl.java:199)
 at 
 com.sun.corba.ee.impl.protocol.MessageMediatorImpl.handleRequestRequest(MessageMediatorImpl.java:1549)
 at 
 com.sun.corba.ee.impl.protocol.MessageMediatorImpl.handleRequest(MessageMediatorImpl.java:1425)
 at 
 com.sun.corba.ee.impl.protocol.MessageMediatorImpl.handleInput(MessageMediatorImpl.java:930)
 at 
 com.sun.corba.ee.impl.protocol.giopmsgheaders.RequestMessage_1_2.callback(RequestMessage_1_2.java:213)
 at 
 com.sun.corba.ee.impl.protocol.MessageMediatorImpl.handleRequest(MessageMediatorImpl.java:694)
 at 
 com.sun.corba.ee.impl.protocol.MessageMediatorImpl.dispatch(MessageMediatorImpl.java:496)
 at 
 com.sun.corba.ee.impl.transport.ConnectionImpl$1.dispatch(ConnectionImpl.java:195)
 at 
 com.sun.corba.ee.impl.transport.ConnectionImpl.read(ConnectionImpl.java:328)
 at 
 com.sun.corba.ee.impl.transport.ReaderThreadImpl.doWork(ReaderThreadImpl.java:112)
 at 
 com.sun.corba.ee.impl.threadpool.ThreadPoolImpl$WorkerThread.performWork(ThreadPoolImpl.java:497)
 at 
 com.sun.corba.ee.impl.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:540)Causedby:
  org.postgresql.util.PSQLException:Cannot commit when autoCommit is enabled.
 at 
 org.postgresql.jdbc2.AbstractJdbc2Connection.commit(AbstractJdbc2Connection.java:811)
 at 
 com.sun.gjc.spi.LocalTransactionImpl.commit(LocalTransactionImpl.java:106)...30
  more
 I even tried the following code:
 SqlSession session = ConnectionFactory.getSession().openSession(false); // AUTOCOMMIT
 List list = null;
 try {
     session.getConnection().setAutoCommit(false);
     TempMapper mapper = (TempMapper) session.getMapper(TempMapper.class);
     list = mapper.readAll();
 } catch (SQLException ex) {
     Logger.getLogger(TempBean.class.getName()).log(Level.SEVERE, null, ex);
 } finally {
     session.close();
 }
 However the result is the same. When I used gf+mysql+mybatis I changed 
 relaxautocommit but for postgresql driver there is no such attribute. How to 
 solve it?

Best guess, based on experience, is that somehow you're disabling autocommit on 
a different
connection from the one where you're doing the commit. I've seen this sort of 
thing happen
with connection poolers where the developers don't fully understand how the 
connection pool
works.

You could confirm this by turning on full query logging in Postgres and see 
which connection
does what. If it turns out to be the case, then you'll have sort out how your 
code is
getting confused.

If it's not the case, then I don't have any other ideas, but the full query 
logging will
also tell you whether the command to disable autocommit really is being issued 
to
PostgreSQL as well ... so still might be useful.
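
If you haven't enabled that before, the blunt-instrument version on 9.4 is
roughly the following (run as a superuser, and remember to turn it back off
when you're done):

ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();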

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general

[GENERAL] RPM building tools and info missing?

2015-05-09 Thread Bill Moran

I might need to roll a custom PostgreSQL server RPM for my current
job.

Looking here:
https://wiki.postgresql.org/wiki/RPM_Packaging

The link to the specfiles and other data at
http://svn.pgrpms.org/repo/ gives a 404. I found a few other
pieces on the internet suggesting the same URL, and can't find
any information on why that sit is down or where it might have
gone to.

Is this a temporary outage? Or has the RPM data moved somewhere
else and isn't documented yet? Any help is appreciated.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpgsql functions organisation

2015-05-02 Thread Bill Moran
On Sat, 2 May 2015 14:07:31 -0700
Jeff Janes jeff.ja...@gmail.com wrote:

 On Sat, May 2, 2015 at 1:05 PM, Adrian Klaver adrian.kla...@aklaver.com
 wrote:
 
  On 05/02/2015 10:12 AM, Melvin Davidson wrote:
 
  AFAIK, you cannot package functions in  PostgreSQL, but it is possible
  to
  call a function from within a function.
 
  That being said, I would seriously look at how and why you are writing
  your functions
  as functions that call other functions are not very efficient.
 
 
  I am not following. That is what packaging is about, separating out 'units
  of work' so they can be combined as needed. Part of that is using existing
  functions in new functions/classes. In fact in the Postgres source I see
  this in many places. Now it is entirely possible I missed a memo, so I am
  open to a more detailed explanation of the inefficiencies involved.
 
 
 The Postgres source is written in C, not in plpgsql.  C has a good
 optimizing compiler and plpgsql doesn't.

Maybe that's a roundabout way of saying that if your functions are
complex enough to require calling sub-functions, they might justify
being written in C?

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plpgsql functions organisation

2015-05-02 Thread Bill Moran
On Sat, 02 May 2015 15:06:24 -0700
Adrian Klaver adrian.kla...@aklaver.com wrote:

 On 05/02/2015 02:07 PM, Jeff Janes wrote:
  On Sat, May 2, 2015 at 1:05 PM, Adrian Klaver adrian.kla...@aklaver.com
  mailto:adrian.kla...@aklaver.com wrote:
 
  On 05/02/2015 10:12 AM, Melvin Davidson wrote:
 
  AFAIK, you cannot package functions in  PostgreSQL, but it is
  possible to
  call a function from within a function.
 
  That being said, I would seriously look at how and why you are
  writing
  your functions
  as functions that call other functions are not very efficient.
 
 
  I am not following. That is what packaging is about, separating out
  'units of work' so they can be combined as needed. Part of that is
  using existing functions in new functions/classes. In fact in the
  Postgres source I see this in many places. Now it is entirely
  possible I missed a memo, so I am open to a more detailed
  explanation of the inefficiencies involved.
 
 
  The Postgres source is written in C, not in plpgsql.  C has a good
  optimizing compiler and plpgsql doesn't.
 
 Does this actually matter?  I am a biologist that backed into computing, 
 so I realize I am weak on the fundamentals. Still the scientist in me 
 wants data backing assertions. As I understand it plpgsql works close to 
 the server and is optimized to do so.  I know writing in C would be a 
 better solution. Still is calling plpgsql functions inside plpgsql 
 really a bad thing when just considering plpgsql?

The answer to that is the same answer to so many other things: it depends.

plpgsql functions are slower than C. They also lack a lot of language
features that C has. That being said, if they're meeting your needs, then
don't worry about it. plpgsql is around because for most people, it works
well enough. There are certainly cases when you want to create very complex
logic in the database and plpgsql is liable to make that difficult. But
there are a lot of cases where having to manage pointers and a build
environment and all the things that go with C aren't justified, because
plpgsql has none of that complexity. There are advantages both ways.

The beauty of PostgreSQL is that you have both available and you
can choose whichever is best for your situation.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Regarding bytea column in Posgresql

2015-04-09 Thread Bill Moran
On Thu, 9 Apr 2015 11:03:30 +
Deole, Pushkar (Pushkar) pde...@avaya.com wrote:
 
 I have been assigned to a product that uses Postgresql 9.3 as backend 
 database. I am new to postgresql.
 The product provides chat functionality between the uses and the completed 
 chats are stored in the database table in a 'bytea' column in the form of 
 xml. When I query the data from this column I see xml file with text data. I 
 have couple of queries:
 
 1.   Is 'bytea' column intended for storing text data?

No, it's intended for storing binary data.

 2.   Typically a chat can have text data with several special characters 
 (which can be represented in multi bytes), how these characters can be stored 
 in a bytea column and retrieved back properly?

bytea won't help you here. You'll have to manage the special
characters entirely in your code. bytea gives you back the
exact same types you put in, with no changes or interpretation.

A better choice would be to use a text field with a proper
text encoding (such as utf-8).

Probably an even better choice would be to use the XML datatype
in PostgreSQL, since you say that you're storing XML anyway.
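
If the stored bytes really are UTF-8 XML already, converting the column in
place is usually straightforward; a rough sketch with made-up table and
column names:

ALTER TABLE chat_log
    ALTER COLUMN chat_data TYPE xml
    USING convert_from(chat_data, 'UTF8')::xml;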

The place where people tend to get tripped up with TEXT and
XML datatypes is that they're strict. If you try to store
text in a TEXT data type that isn't valid (i.e., multi-byte
characters that aren't correct) you'll get an error and the
data won't be accepted. The same thing happens if you try to
store invalid XML in an XML field (such as XML without proper
closing tags, etc). It seems that this strictness causes a lot
of people to avoid those data types, as there seem to be a lot
of people who would rather have garbage data in their database
than actually go to the work of fixing their application.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Serializable transaction restart/re-execute

2015-04-06 Thread Bill Moran
On Mon, 6 Apr 2015 10:41:25 +0100
Filipe Pina fop...@impactzero.pt wrote:

 Hi Bill, thanks for the quick reply.
 
 I had read about SAVEPOINTs but I also read I cannot use them in PLPGSQL and
 should use BEGIN/END blocks and EXCEPTIONs.
 
 Did you check the URL I mentioned?

Yes, I did:
http://stackoverflow.com/questions/29372202/postgresql-generic-handler-for-serialization-failure

...

 But it doesn't work.. Every iteration fails with serialization_failure
 probably because the outer transaction is not rolled back and I'm not sure
 how to write this in a way I can roll it back and still have control of the
 LOOP..

Probably one of your issues is that there is no such thing as an
outer transaction. There's just a transaction. There is no nesting
of transactions, so the belief that there is an outer transaction
that can somehow be manipulated independently of some other transaction
is leading you to try things that will never work.

I wasn't aware that SAVEPOINTs didn't work in pl/pgsql, thanks for
educating me on that point.

 I find it hard to believe that PGSQL has this amazing serializable
 isolation method but not a standard way to take advantage of it to
 automatically restart the failed transactions...

I've been over this ground before. You're thinking in such a
micro case that you haven't realized the inherent difficulty of
restarting large transactions with lots of data modification.
An RDBMS may have many tables updated within a transaction, and
transactions may do data processing completely outside of the
database, which means the only way to ensure consistency is to
notify the controlling process of the problem so it can decide
how best to respond.

So ... I dug into your problem a little more, and I think the
problem is that you're trying too hard to replicate GTM design
paradigms instead of learning the way that PostgreSQL is designed
to work.

If I were creating the functions you describe, I would ditch the
second one and simply have this:

CREATE OR REPLACE FUNCTION mytest() RETURNS integer AS $$
update account set balance = balance+10 where id=1 RETURNING balance;
$$
LANGUAGE SQL;

of course, it's unlikely that you'll ever want to wrap such a
simple query in a function, so I'm supposing that you'd want
to do something else with the old value of balance before
updating it, in which case:

CREATE OR REPLACE FUNCTION mytest() RETURNS integer AS $$
DECLARE
cc integer;
BEGIN
SELECT INTO cc balance FROM account WHERE id = 1 FOR UPDATE;

RAISE NOTICE 'Balance: %', cc;
perform pg_sleep(3);

update account set balance = cc+10 where id=1 RETURNING balance INTO cc;

return cc;
END
$$
LANGUAGE plpgsql;

The FOR UPDATE ensures that no other process can modify the
row while this one is sleeping.

Now, I understand that you want to don't want to do row locking,
but this is (again) an insistance on your part of trying to
force PostgreSQL to do things the way GTM did instead of
understanding the RDBMS way of doing things. 

Unlearn.

Keep in mind that mytest() might be called as part of a much
larger transaction that does many other things, and you can't
simply roll that back and restart it within mytest() since
mytest() doesn't know everything else that happened.

In your case, you're trying to look at mytest() as something
that will always be used in a specific way where the
aforementioned problem won't be encountered, but you cannot
guarantee that, and it doesn't hold true for all functions.

In general, it's inappropriate for a function to be able to manipulate
a transaction beyond aborting it. And the abort has to bubble up so
that other statements involved in the transaction are also notified.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Serializable transaction restart/re-execute

2015-04-03 Thread Bill Moran
On Fri, 3 Apr 2015 15:35:14 +0100
Filipe Pina filipe.p...@impactzero.pt wrote:

 Hello,
 
 I come from a GTM background and one of the transactional features there is 
 the "Transaction Restart".
 
 Transaction restart is when we have two concurrent processes reading/writing 
 to the same region/table of the database: the last process to commit will 
 "see" that the database is not the same as it was when the transaction 
 started, and goes back to the beginning of the transactional code and 
 re-executes it.
 
 The closest I found to this in PGSQL is the Serializable transaction 
 isolation mode, and it does seem to work well except it simply throws an error 
 (serialization_failure) instead of restarting.
 
 I'm trying to make use of this exception to implement restartable functions 
 and I have all the examples and conditions mentioned here in a question on SO 
 (without any answer so far...):
 
 http://stackoverflow.com/questions/29372202/postgresql-generic-handler-for-serialization-failure
 
 So basically I have two questions:
 - the restartable "wrapper" function never gets its "DB view" refreshed once 
 it restarts; I assume it's because of the outer transaction (at function 
 level), so it never re-reads the new values and keeps failing with 
 serialization_failure. Any way to solve this?
 - the ideal would be to be able to define this at database level so I 
 wouldn't have to implement wrappers for all functions: implementing a 
 "serialization_failure" generic handler that would simply re-call the 
 function that threw that exception (up to a number of tries). Is this 
 possible without going into pgsql source code?

I suspect that savepoints will accomplish what you want:
http://www.postgresql.org/docs/9.4/static/sql-savepoint.html

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Would like to know how analyze works technically

2015-04-01 Thread Bill Moran
On Wed, 1 Apr 2015 04:33:07 -0700 (MST)
TonyS t...@exquisiteimages.com wrote:

 On Wed, April 1, 2015 12:18 am, Tom Lane-2 [via PostgreSQL] wrote:
 
  TonyS t...@exquisiteimages.com writes:
 
  Running analyze verbose; and watching top, the system starts out
  using no swap data and about 4GB of cached memory and about 1GB of used
  memory. As it runs, the amount of used RAM climbs, and eventually the
  used swap memory increases to 100% and after being at that level for a
  couple of minutes, the analyze function crashes and indicates server
  closed the connection unexpectedly.
 
  ANALYZE is supposed to work in a constrained amount of memory, though
  that amount depends quite a bit on what the data is and what you've got the
  statistics target set to.
 
  We've heard reports that there are memory leaks in (some versions of?)
  PostGIS's analyze support.  Whether that would apply to you would depend
  on whether you're using PostGIS.
 
  Hard to say more without a lot more concrete info about what your
  data is, what PG version you're using, etc.
 
  regards, tom lane
 
 
 Thanks for the response Tom.
 
 I am not using PostGIS. The data in my system is mostly along the lines of
 what you would see in an accounts payable, accounts receivable, and
 billing type situation. Names and addresses of individuals, information
 about billing, payments received, payments sent etc.
 
 All of my indexes are b-tree indexes.
 
 Currently, the largest individual table is 1.8GB.
 
 select version() returns:
 PostgreSQL 9.3.6 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
 4.8.2-19ubuntu1) 4.8.2, 64-bit
 
 OS: Ubuntu 14.04.1 LTS
 
 Physical memory: 8GB
 
 The postgresql log has these entries at the crash point:
 2015-04-01 06:24:37 EDT LOG:  server process (PID 1384) was terminated by
 signal 9: Killed
 2015-04-01 06:24:38 EDT DETAIL:  Failed process was running: analyze verbose;
 2015-04-01 06:24:38 EDT LOG:  terminating any other active server processes
 
 I started this process at 11PM, so it ran for about 7.5 hours before
 crashing.
 
 Is there anything else that would be helpful?

Don't know if I'm on the right track with this, but what is
maintenance_work_mem set to on this system?

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Would like to know how analyze works technically

2015-04-01 Thread Bill Moran
On Wed, 1 Apr 2015 06:26:36 -0700 (MST)
TonyS t...@exquisiteimages.com wrote:

 On Wed, April 1, 2015 8:48 am, Bill Moran [via PostgreSQL] wrote:
 
 
  Running analyze verbose; and watching top, the system starts out
  using no swap data and about 4GB of cached memory and about 1GB of
  used memory. As it runs, the amount of used RAM climbs, and
  eventually the used swap memory increases to 100% and after being at
  that level for a couple of minutes, the analyze function crashes and
  indicates server closed the connection unexpectedly.
 
  ANALYZE is supposed to work in a constrained amount of memory, though
   that amount depends quite a bit on what the data is and what you've
  got the statistics target set to.
 
  We've heard reports that there are memory leaks in (some versions
  of?) PostGIS's analyze support.  Whether that would apply to you would
  depend on whether you're using PostGIS.
 
  Hard to say more without a lot more concrete info about what your
  data is, what PG version you're using, etc.
 
  Don't know if I'm on the right track with this, but what is
  maintenance_work_mem set to on this system?
 
 
 Hello Bill,
 
 maintenance_work_mem is set to 480MB. I haven't changed that from what
 pgtune suggested.

Doesn't seem unreasonable, so my guess isn't right.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Would like to know how analyze works technically

2015-04-01 Thread Bill Moran
On Wed, 1 Apr 2015 10:47:32 -0700 (MST)
TonyS t...@exquisiteimages.com wrote:

 On Wed, April 1, 2015 12:30 pm, Igor Neyman [via PostgreSQL] wrote:
 
   TonyS wrote:
 
 
 
   name,current_setting,source
   autovacuum,off,configuration file
   synchronous_commit,off,configuration file
   TimeZone,localtime,configuration file
   unix_socket_directories,/var/run/postgresql,configuration file
   wal_buffers,8MB,configuration file
   work_mem,1536MB,configuration file
 
 
  ---
 
 
   work_mem,1536MB,configuration file
 
 
 
  IIRC, your RAM is 8GB.  Your work_mem is too high.  Actual memory used
  for sorting, etc... could be multiples of work_mem setting.
 
  That could be the reason for your memory problems.  I'd suggest to set it
  to 16MB, and see if you can avoid on disk sorting.  If not - gradually
  increase work_mem.
 
  Regards,
 
 
  Igor Neyman
 
 
 
 Thanks Igor,
 
 I will try changing that. I pretty much just let pgtune set all of those
 values for me.

If pgtune set 1.5G of work_mem, then someone should file a bug report.
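
In the meantime, Igor's suggestion can be tried per-session before touching
the config file, something like:

SET work_mem = '16MB';  -- session-level override; rerun the slow query and
                        -- check EXPLAIN ANALYZE for "Sort Method: external merge"

and only made permanent once the plans look sane.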

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Autovacuum query

2015-03-26 Thread Bill Moran
On Thu, 26 Mar 2015 03:58:59 +
Mitu Verma mitu.ve...@ericsson.com wrote:
 
 We have a customer complaining about the time taken by one of the application 
 scripts while deleting older data from the log tables.
 During the deletion, customer reported that he often sees the below error and 
 because of which table size doesn't reduce.
 
 ERROR: canceling autovacuum task
 Date: 2015-03-14 04:29:19
 Context: automatic analyze of table fm_db_Server3.mmsuper.audittraillogentry
 
 We have the following queries in this regard:
 
 -  How often is the autovacuum task invoked by postgres

As needed. Read:
http://www.postgresql.org/docs/9.4/static/routine-vacuuming.html

 -  If the task gets cancelled (while we were deleting data from the 
 database) would this task be re-invoked at a later time?

Yes. Read the above

 -  If insertion of data into a table also impact this task?

No. Vacuum operations are not triggered by INSERTs.

 -  If we can manually schedule this task to a particular time (like 
 off peak hours)?

Yes, but given the questions you're asking, you probably do not
have a good enough understanding of the situation to schedule it
correctly and will make the problem worse. You can run it manually
any time you want, but I don't recommend that you disable
autovacuum unless you have a good understanding of what you're
doing.

Let me take a guess at the problem: The table gets LOTs of inserts,
constantly, and somewhere there's a job that runs out of cron or
some similar scheduler that DELETEs a lot of those rows in a big
chunk. The DELETE process probably runs infrequently, like once
a day or even once a week because the designers thought it would
be best to get everything taken care of all at once during some
real or perceived slow period on the database.

One solution to this is to run the DELETE process more frequently,
such as every 15 minutes. In such a case, the process will run
much faster, make less changes, and require less work on the part
of autovacuum to clean up after. People frequently complain that
this will impact performance if run during normal use hours,
but in every case I've seen, nobody had actually tested to see
if that statement was true, and running smaller purges more
frequently actually solved the problem.

Another option would be to manually run vacuum after the big
DELETE runs.
See http://www.postgresql.org/docs/9.4/static/sql-vacuum.html
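
With the table from your log message, that would look something like this
(schema-qualified name taken from the error context; adjust as needed):

VACUUM (VERBOSE, ANALYZE) mmsuper.audittraillogentry;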

Don't fall into the trap of running VACUUM FULL. This is usually
a bad idea. If the client is complaining about reclaiming disk
space, start asking some hard questions: How much space is too
much? Why are you convinced that the space is wasted?
Usually the correct answer is to add more disk space, since
Postgres tends to fall into a groove with a particular table
whereby the unused space is actually being used and reclaimed
by data tuples as the data in the table changes. It's not
unusual for a heavily updated table to be 2x the size of the
actual data.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Column does not exists?

2015-03-26 Thread Bill Moran
On Thu, 26 Mar 2015 14:12:36 -0300
Leonardo M. Ramé l.r...@griensu.com wrote:

 Ok, I have this table:
 
 CREATE TABLE sessions
 (
   "SESSIONID" integer NOT NULL,
   "SESSIONTIMESTAMP" character varying(45) NOT NULL,
   "SESSIONDATA" character varying(200) DEFAULT NULL::character varying,
   CONSTRAINT sessions_pkey PRIMARY KEY ("SESSIONID")
 )
 
 Now, when I do:
 
 DELETE From sessions WHERE SESSIONTIMESTAMP < '2010-01-01 10:02:02'
 
 I get:
 
 ERROR:  column sessiontimestamp does not exist
 LINE 1: DELETE From sessions WHERE SESSIONTIMESTAMP < '2010-01-01 10...
 ^
 ** Error **
 
 ERROR: column sessiontimestamp does not exist
 SQL state: 42703
 Character: 28
 
 But if I do:
 
 DELETE From sessions WHERE "SESSIONTIMESTAMP" < '2010-01-01 10:02:02'
 
 It DOES work.
 
 Why doesn't the db recognize the name of the column without quotes?

See:
http://www.postgresql.org/docs/9.4/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
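
The short version: unquoted identifiers are folded to lower case, while
quoted ones keep their exact spelling. Assuming the columns really were
created with double quotes, the difference is easy to demonstrate:

SELECT sessiontimestamp FROM sessions;     -- fails: no lower-case column by that name
SELECT "SESSIONTIMESTAMP" FROM sessions;   -- works: matches the quoted, upper-case name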

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL-related legal question

2015-03-11 Thread Bill Moran
On Wed, 11 Mar 2015 13:50:02 +0100
Dorian Hoxha dorian.ho...@gmail.com wrote:

 I don't see how it could have negative impact on the postgresql project?
 It's not like your job will be to find vulnerabilities and not disclose
 them ?

I don't think I should discuss the particulars of the situation on
the list. That's why I'm just looking for a lawyer who understands
the situation and can advise me.

 
 On Wed, Mar 11, 2015 at 1:28 PM, Bill Moran wmo...@potentialtech.com
 wrote:
 
 
  I've been asked to sign a legal document related to a PostgreSQL-
  related job opening. I have concerns about the document and that
  signing it could have a negative impact on the PostgreSQL project
  (in addition to personal concerns).
 
  I'm guessing I'm not the first person to go through this. I'm
  hoping someone on this list can refer me to a lawyer who is
  familiar with the challenges of NDAs and open source projects.
 
  I'm not asking for pro-bono, I'm willing to pay for services,
  but I just figured that I might get better results getting a
  referral than by contacting $random_legal_service.
 
  --
  Bill Moran
 
 
  --
  Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-general
 


-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PostgreSQL-related legal question

2015-03-11 Thread Bill Moran

I've been asked to sign a legal document related to a PostgreSQL-
related job opening. I have concerns about the document and that
signing it could have a negative impact on the PostgreSQL project
(in addition to personal concerns).

I'm guessing I'm not the first person to go through this. I'm
hoping someone on this list can refer me to a lawyer who is
familiar with the challenges of NDAs and open source projects.

I'm not asking for pro-bono, I'm willing to pay for services,
but I just figured that I might get better results getting a
referral than by contacting $random_legal_service.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] VACUUM FULL doesn't reduce table size

2015-03-09 Thread Bill Moran
On Mon, 9 Mar 2015 09:05:07 -0700 (MST)
pinker pin...@onet.eu wrote:

 
  So there are no longer any dead rows being left behind, right?
  
  Why are we still discussing this?  Do you have some other question?
 
 There are no dead rows, but postgres still cannot reuse the space because of
 3043947 nonremovable row versions ..
 
 INFO:  vacuuming my_table
 INFO:  my_table: found 0 removable, 3043947 nonremovable row versions in
 37580 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 CPU 2.67s/1.59u sec elapsed 7.71 sec.
 Query returned successfully with no result in 8319 ms.

Given your weird description of the snapshot I wouldn't be surprised if
that instance of PostgreSQL had subtle corruption.

pg_dump that database, re-init it and reload the dump. Then recreate the
situation and see if the rows are sill nonremovable. I bet you $5.34 that
everything works fine after that, which would indicate that the folks
who made the snapshot didn't do it correctly.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

