Re: [GENERAL] explain plan visibility
高健 wrote:
> Please let me dig into it a little more. I think that when a session
> is accessing a PostgreSQL table, it is influenced by the following:
> really old data (needing to be vacuumed, e.g. an old image from a day
> ago) and recent data (committed and uncommitted), because they are all
> in the same data blocks. Isn't it strange that I have to access my
> data among somebody else's undecided data? What about a busy system
> with a table accessed by many sessions at the same time? They will all
> be slowed down by uncommitted but flushed-out data, I think. I hope
> that in the future the architecture of PostgreSQL can keep committed
> and uncommitted data apart, or even put them on separate physical
> disks. That would help performance, I think.

I don't think that this design choice will ever be changed. You'll always be influenced by other people's data, no matter how you handle MVCC.

Let's imagine a design where yet uncommitted data are stored somewhere else. How do you handle the following problems?

1) The space for uncommitted data runs out. That will keep you from doing any transaction exceeding the size of this storage space. Note that Oracle suffers from that problem (though they do the opposite of what you propose: they store uncommitted data in the table and move old data to the undo space).

2) Every COMMIT will require that everything is moved from the uncommitted space to the regular table. So you'd double the I/O cost for the normal case (I guess that's the reason why Oracle does it the other way round). And you wouldn't even get rid of the most unpleasant side effect of the way that PostgreSQL manages MVCC, the need to VACUUM away old tuples.

As for Oracle's approach to MVCC, it has its pros and cons, but I don't think it is demonstrably better than PostgreSQL's. No matter how you implement MVCC, you'll have to pay the price for the benefits somehow.

Yours,
Laurenz Albe
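To make the visibility point concrete, here is a minimal illustration (the accounts table and its columns are hypothetical) of how one session's uncommitted row versions sit in the same heap pages another session has to scan:

    -- session 1: writes an uncommitted row version into the table's heap
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- (no COMMIT yet)

    -- session 2: still sees the old version, but its scan must step over
    -- the uncommitted one, which lives in the same data block
    SELECT xmin, xmax, ctid, balance FROM accounts WHERE id = 1;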
Re: [GENERAL] Understanding streaming replication
I'll try to answer the questions I can.

Pawel Veselov wrote:
> I've been struggling with understanding all the necessary pieces for
> streaming replication. So I put down the pieces as I did understand
> them, and would appreciate it if you guys could point out any of the
> stuff I understood or have done wrong.
>
> The set up is pgpool + streaming replication + hot standby. No load
> balancing; stand-by nodes will not receive any application queries (I
> don't have that big of a query load, and I don't want to risk
> inconsistent reads). There are no shared file systems, but there is a
> way to rsync/scp files between nodes. Fail-over is automatic, and
> should kick in within a reasonably small period after master failure.
>
> 1. Archiving. Should be turned on on all the nodes. The archive
> command should copy the archive file to the local archive directory,
> and rsync the archive directory between all the nodes. My
> understanding is that archiving is necessary if a stand-by node ever
> missed enough WAL updates to need an old enough WAL that might have
> been removed from pg_xlog.

You don't give details about how the rsync is triggered, but I'd advise against having rsync as part of archive_command. First, it is slow, and if there is a lot of activity, the archiver will not be able to keep up. Second, if rsync fails, the WAL file will not be considered archived. Both these things will keep the WAL files from being deleted from pg_xlog. I'd schedule rsync as a cron job or similar.

> QUESTION: After the failover, the new master will start archiving its
> WAL files. These archived WALs will not collide in any way with the
> archived WALs generated by previous master(s)?

They will not, because the standby starts a new time line when it is promoted to primary, which will result in new WAL file names.

> QUESTION: What is a good policy for archive clean up? From the
> perspective of only removing archive files that are guaranteed to
> never be required by any node.

You cannot tell from the primary's side. Since you also need the archives to restore an online backup, I'd keep them as long as your backup policy dictates. I hope you don't rely on standby databases for backup (just imagine an accidental DROP TABLE that gets propagated to all standbys within seconds).

> 2. Failover. On master failure, pgpool will automatically select a new
> master, and degenerate all other nodes. The cluster is now in the
> emergency state and requires manual intervention for reconfiguration
> and recovery. pgpool executes a script to promote a node; that script
> will create a trigger file on the newly selected master node, and
> postgres will exit stand-by mode.
>
> QUESTION: If multiple pgpools are running, and if there are no network
> problems, and configuration files are identical, is there any
> guarantee that the same stand-by node will be selected for promotion?
> The concern here is that with a configuration of (M-SB0-SB1) one
> pgpool decides to promote SB0 and another SB1, causing both of them to
> enter master mode and splitting the cluster. It does look like pgpool
> will always select the next alive node for promotion, but I couldn't
> find a definitive statement on that.

I don't know about pgpool and its abilities to handle cluster failover, but I wouldn't go this way at all. Even if the answer were that in the circumstances you describe things would work, you can depend on it that things will go wrong in ways different from what you expect, e.g. a broken network card. The consequences would be worse than I'd like to imagine.

If you want reliable automatic failover, consider cluster software.

> 3. Recovery. That part is a bit confusing. The majority of the
> documentation says that in this case, the node should be re-loaded
> from the base backup, obtained from the master. I'm not sure why this
> is necessary, if there are enough archived WALs.

Because of the new time line; streaming replication cannot (yet) recover across a time line change.

> QUESTION: Is there any metric to understand whether hauling the base
> backup will be slower/faster than replaying missed WALs? Anyway,
> pgpool only has one recovery mechanism, and it does invoke a base
> restore from whatever the current master is.
>
> PROBLEM: This I see as a problem. The only way that I see to re-attach
> a node to pgpool, short of restarting it, is to call
> pcp_recovery_node. This will make the master take a base backup, push
> it to the stand-by that needs recovery, and re-start the stand-by
> node. I am not sure if there is a good way to check if that node has
> already been recovered. That's because if there are more than 2
> pgpools, they will both attempt to recover the same stand-by, and this
> will probably get ugly.

Yours,
Laurenz Albe
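A minimal sketch of that separation, with hypothetical paths and host names: the archiver only copies locally, and a cron job ships the directory out of band, so a failed transfer never blocks WAL recycling:

    # postgresql.conf on each node: archive to a local directory only
    archive_mode = on
    archive_command = 'cp %p /var/lib/pgsql/wal_archive/%f'

    # crontab entry: push the archive to the other nodes separately
    * * * * *  rsync -a /var/lib/pgsql/wal_archive/ standby1:/var/lib/pgsql/wal_archive/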
[GENERAL] PostgreSQL and a clustered file system
Hello,

Is anyone running PostgreSQL on a clustered file system on Linux? By "clustered" I actually mean "shared", such that the same storage is mounted by different servers at the same time (of course, only one instance of PostgreSQL on only one server can be running on such a setup, and there are a lot of other precautions that need to be satisfied).
Re: [GENERAL] integer instead of 'double precision'?
On Fri, Sep 9, 2011 at 5:09 PM, Guillaume Lelarge <guilla...@lelarge.info> wrote:
> You divide an integer with an integer, that should give you an integer.

Can you tell me the reasoning behind that idea? Is it a rule that the output type of an operator must equal the input type? In this case that doesn't seem logical. I think that the / operator should return something that allows fractions, since the operator creates fractions so frequently. If you need it to be an integer, e.g. when you update an integer column, casting should be done just in time. But I don't know much about the internals and the reasoning behind these matters; I would be grateful if you could explain.

Cheers,

WBL
--
"Quality comes from focus and clarity of purpose" -- Mark Shuttleworth
Re: [GENERAL] integer instead of 'double precision'?
On Tue, Nov 13, 2012 at 12:16 AM, Willy-Bas Loos <willy...@gmail.com> wrote:
> On Fri, Sep 9, 2011 at 5:09 PM, Guillaume Lelarge <guilla...@lelarge.info> wrote:
>> You divide an integer with an integer, that should give you an integer.
>
> Can you tell me the reasoning behind that idea? Is it a rule that the
> output type of an operator must equal the input type? In this case
> that doesn't seem logical. I think that the / operator should return
> something that allows fractions, since the operator creates fractions
> so frequently.

This is an argument that comes up regularly on the Python list, partly because version 2 had int/int -> int, but version 3 declared that int/int -> float made more sense. One of the problems of going to floating point is that it's not a superset of integers - especially not when your integer type supports arbitrary precision. It might seem obvious that 7/2 should yield 3.5 and not 3, but what about when the numbers are so large that you lose precision by going float? Or are there to be some cases where int/int makes float and some where it makes int? That would be nicely confusing.

I'm generally happy with either behaviour, as long as it's consistent, and as long as it can be overridden with an explicit type cast when the other is needed.

ChrisA
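For reference, the behaviour under discussion is easy to see in psql:

    SELECT 7 / 2;            -- integer / integer: yields 3
    SELECT 7::numeric / 2;   -- casting one operand yields 3.5
    SELECT 7 / 2.0;          -- a fractional literal also forces numeric division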
Re: [GENERAL] Can dml realize the partition table's rule and make good execution plan?
Craig Ringer <cr...@2ndquadrant.com> writes:
> It'd be really interesting to enhance the query planner to be smarter
> about this particular case.

At least for this particular example, it's not at all clear to me that it'd be worth the cost. Getting rid of accesses to tables altogether is worthwhile, but avoiding one very-low-cost arm of a BitmapOr is less so.

regards, tom lane
[GENERAL] PSA: XFS and Linux Cache Poisoning
Hey everyone,

We recently got bit by this, and I wanted to make sure it was known to the general community.

In new(er) Linux kernels, including late versions of the 2.6 tree, XFS has introduced dynamic speculative preallocation. What does this do? It was added to prevent filesystem fragmentation by preallocating a large chunk of space to files, so that extensions to those files can go on the same allocation. The "dynamic" part just means it adjusts the size of this preallocation based on internal heuristics.

Unfortunately, they also changed the logic in how this extra space is tracked. At least in previous kernels, this space would eventually be deallocated. Now, it survives as long as there are any in-memory references to a file, such as in a busy PG database. The filesystem itself sees this space as used, and it will be reported as such by tools such as df or du.

How do you check if this is affecting you?

du -sm --apparent-size /your/pg/dir; du -sm /your/pg/dir

If you're using XFS and there is a large difference in these numbers, you've been bitten by the speculative preallocation system. But where does the space go while allocated? Why, to your OS system cache, of course. Systems with several GB of RAM may experience extreme phantom database bloat because of the dynamic aspect of the preallocation system. So there are actually two problems:

1. Data files are reported as larger than their actual size and have extra space around "just in case". Since PG has a maximum file size of 1GB, this is basically pointless.

2. Blocks that could be used for inode caching to improve query performance are reserved instead for caching empty segments for XFS.

The first can theoretically exhaust the free space on a file system. We were seeing 45GB(!) of bloat on one of our databases caused directly by this. The second, due to the new and improved PG planner, can result in terrible query performance and high system load, since the OS cache does not match its assumptions.

So how is this fixed? Luckily, the dynamic allocator can be disabled by choosing a fixed allocation size: add allocsize to your mount options. We used a size of 1m (for 1 megabyte) to retain some of the defragmentation benefits while still blocking the dynamic allocator. The minimum size is 64k, so some experimentation is probably warranted. This mount option *is not compatible* with the remount mount option, so you'll need to completely shut everything down and unmount the filesystem to apply it.

We spent days trying to track down the reason our systems were reporting a load of 20-30 after a recent OS upgrade. I figured it was only fair to share this to save others the same effort. Good luck!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com
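As a concrete sketch of the check and the fix (paths, device, and mount point are examples only):

    # compare apparent size against allocated size; a large gap means
    # speculative preallocation is holding space
    du -sm --apparent-size /var/lib/pgsql/9.1/data
    du -sm /var/lib/pgsql/9.1/data

    # /etc/fstab entry capping preallocation at 1 MB; needs a full
    # umount/mount, not "mount -o remount"
    /dev/sdb1  /var/lib/pgsql  xfs  noatime,allocsize=1m  0 0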
Re: [GENERAL] PSA: XFS and Linux Cache Poisoning
On Nov 12, 2012, at 7:37 AM, Shaun Thomas wrote:
> We recently got bit by this, and I wanted to make sure it was known to
> the general community. In new(er) Linux kernels, including late
> versions of the 2.6 tree, XFS has introduced dynamic speculative
> preallocation.
> [snip]

Oh hey, I've been wondering for a while why our master dbs seem to be using so much more space than their slaves. This appears to be the reason. Thanks for the work in tracking it down!
Re: [GENERAL] Understanding streaming replication
Hello all,

I read this thread with interest, but I still have some questions about cascading replication as you describe it.

On 12/11/2012 10:36, Albe Laurenz wrote:
>> 3. Recovery. That part is a bit confusing. The majority of the
>> documentation says that in this case, the node should be re-loaded
>> from the base backup, obtained from the master. I'm not sure why this
>> is necessary, if there are enough archived WALs.
>
> Because of the new time line; streaming replication cannot (yet)
> recover across a time line change.

I'm setting up a 3-node cluster, and after some tests I just discovered that the cascading slave does not recover. As far as I can see in the 9.2 documentation, it should work after an automatic reconnect to the new master. Is there any chance to get this fixed in 9.2.x?

In case of disaster on the master and on the standby, can I just restart the cascading slave after removing recovery.conf? Would it be better to copy all archived logs from the master into pg_xlog on the third node and then restart it? What is the best way to get this node back with minimal loss?

Thanks for your advice.

Regards
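For context, a 9.2 cascading standby is typically configured along these lines (host names and paths are hypothetical); recovery_target_timeline = 'latest' is what allows a standby to follow a timeline switch, provided the new timeline's history and WAL files are reachable via restore_command:

    # recovery.conf on the cascading standby
    standby_mode = 'on'
    primary_conninfo = 'host=standby1 port=5432 user=replicator'
    restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
    recovery_target_timeline = 'latest'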
Re: [GENERAL] PostgreSQL and a clustered file system
Hello Ivan,

this sounds so mainframe-ish. I recall that in IBM MVS (circa 1990+) we used to attach two systems to the same DASD storage, and then employ disk serialization provided by the OS to achieve some integrity for the data. (Don't get me wrong, I had adequate Unix/SunOS/Ultrix experience before I had to go through all that MVS stuff, which made it even more painful!)

On Mon 12 Nov 2012 11:03:14 Ivan Voras wrote:
> Hello,
> Is anyone running PostgreSQL on a clustered file system on Linux? By
> "clustered" I actually mean "shared", such that the same storage is
> mounted by different servers at the same time (of course, only one
> instance of PostgreSQL on only one server can be running on such a
> setup, and there are a lot of other precautions that need to be
> satisfied).

-
Achilleas Mantzios
IT DEPT
[GENERAL] FATAL: sorry, too many clients already when I start the replication
I have a strange problem. I am trying to achieve streaming replication between 2 PostgreSQL servers, version 9.2.1. The replication worked just fine when the servers were without load. The problem is that now that the servers are loaded, I cannot start the replication without receiving this error:

FATAL: sorry, too many clients already

The slave server is out of the picture and is stopped. I am changing only the master configuration, starting from this setup (I will paste only the end of postgresql.conf, because the other settings have default values except the listen configuration):

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------
# Add settings for extensions here

# Postgres replication
#archive_command = 'cp -i %p /data/PostgreSQL/PostgresArchive/%f'
#archive_mode = on
#max_wal_senders = 3        # maximum number of wal_sender processes, responsible
                            # for managing a connection with a standby server
#wal_keep_segments = 32     # how many WAL segments (= files) should be kept on the
                            # primary, whatever may happen (you won't have to copy
                            # them manually to the standby if it gets too far behind)
#wal_level = hot_standby
#synchronous_standby_names = 'sync_replication'
# max_locks_per_transaction = 500

#------------------------------------------------------------------------------
# pgtune wizard run on 2012-10-04
# Based on 65958512 KB RAM in the server
#------------------------------------------------------------------------------
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 44GB
work_mem = 384MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 15GB
max_connections = 80

With this setup everything is OK and no errors are observed. The command

netstat -aonlp | grep 5432 | wc -l

returns on average 30 connections to PostgreSQL. When I change the configuration to:

#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------
# Add settings for extensions here

# Postgres replication
archive_command = 'cp -i %p /data/PostgreSQL/PostgresArchive/%f'
archive_mode = on
max_wal_senders = 3
wal_keep_segments = 32
wal_level = hot_standby
synchronous_standby_names = 'sync_replication'
# max_locks_per_transaction = 500

#------------------------------------------------------------------------------
# pgtune wizard run on 2012-10-04
# Based on 65958512 KB RAM in the server
#------------------------------------------------------------------------------
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 44GB
work_mem = 384MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 15GB
max_connections = 80

the log is spammed non-stop with the error "FATAL: sorry, too many clients already", and netstat -aonlp | grep 5432 | wc -l returns 250 connections to PostgreSQL. I tried to change max_connections = 80 to max_connections = 300, but this does not help, because within a short time netstat -aonlp | grep 5432 | wc -l returns 700 connections to PostgreSQL. The slave server is stopped the whole time.

What is causing this problem, and why does the server work just fine without the replication configuration, but freak out when I add it?
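A quick way to see what the piled-up connections are actually doing on 9.2 is to group pg_stat_activity by state:

    SELECT state, waiting, count(*)
    FROM pg_stat_activity
    GROUP BY state, waiting
    ORDER BY count(*) DESC;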
Re: [GENERAL] Understanding streaming replication
On Mon, Nov 12, 2012 at 1:36 AM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> I'll try to answer the questions I can.

Thank you!

> You don't give details about how the rsync is triggered, but I'd
> advise against having rsync as part of archive_command. [...] I'd
> schedule rsync as a cron job or similar.

From your later comments, it's also apparent that these archived WALs will be useless after failover (for the purpose of recovery), so there is no reason to send them to all the nodes after all.

> They will not, because the standby starts a new time line when it is
> promoted to primary, which will result in new WAL file names.
>
> You cannot tell from the primary's side. Since you also need the
> archives to restore an online backup, I'd keep them as long as your
> backup policy dictates. I hope you don't rely on standby databases for
> backup (just imagine an accidental DROP TABLE that gets propagated to
> all standbys within seconds).

I don't rely on stand-bys for backup. But that time line establishment business is a key piece that I didn't realize.

> I don't know about pgpool and its abilities to handle cluster
> failover, but I wouldn't go this way at all. Even if the answer were
> that in the circumstances you describe things would work, you can
> depend on it that things will go wrong in ways different from what you
> expect, e.g. a broken network card. The consequences would be worse
> than I'd like to imagine.

I would imagine this situation will happen in any case; I don't logically see how it's avoidable. If you only have one agent that has the power to promote a node to be a new master, you have a single point of failure. If you have multiple agents that can do the promotion, there is always a risk that they fall out of sync.

> If you want reliable automatic failover, consider cluster software.

Anything you could please recommend?

[skipped]
Re: [GENERAL] Understanding streaming replication
On Mon, Nov 12, 2012 at 10:11 AM, Pawel Veselov <pawel.vese...@gmail.com> wrote:
> From your later comments, it's also apparent that these archived WALs
> will be useless after failover (for the purpose of recovery), so there
> is no reason to send them to all the nodes after all.

I obviously lost it here. The archives do need to be synchronized, for the purpose of recovering slaves. If a slave dies and I want to recover it, it may need the archived WALs, and for this the archives should be available on the node. So rsync (or something like that) is necessary; but it's a bad idea to run the rsync from the archive command itself.
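For the recovery case described here, the standby side would use the synced archive roughly like this (host and paths are hypothetical):

    # recovery.conf on a slave that may have fallen behind pg_xlog retention
    standby_mode = 'on'
    primary_conninfo = 'host=master port=5432 user=replicator'
    restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'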
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/10/2012 02:21 PM, Jeff Janes wrote:
> On Fri, Nov 9, 2012 at 4:28 PM, Lists <li...@benjamindsmith.com> wrote:
>> [...]
>> 3) For each of the tables from #2, run the commands
>>      REINDEX TABLE $table;
>>      VACUUM FULL ANALYZE $table;
>> The end result is a squeaky-clean database server with expected disk
>> usage.
>>
>> NOTES: [...]
>> 2) It was sheer chance that I discovered the need to reindex prior to
>> vacuum in order to get the disk space back.
>
> As of 9.0, a vacuum full inherently does a reindex, so doing an
> explicit one is neither necessary nor beneficial. I don't know if your
> discovery is based on a non-full vacuum, or on an older server.

I can only state that merely doing a "vacuum full" or "vacuum full $tables" sequentially did not free the space, whereas the sequential "reindex $table", each followed immediately by a "vacuum full $table", did. If you'd like, I can easily recreate the scenario by simply not cleaning up one of the DB servers until it bloats up, and make available (limited distribution) a binary copy of the database (e.g. rsync the /var/lib/pgsql/ filesystem late at night) in order to help identify why it didn't work as expected.

>> 5) I don't yet know if the "full" option for the vacuum is necessary
>> to free up all space. I will experiment with this and post results if
>> useful.
>
> The answer to this is mostly non-deterministic. Non-full vacuum can
> only free space from the end of the table. If all of your long-lived
> objects were created before pg_attribute got bloated, and so the bloat
> was due only to short-lived objects, then non-full vacuum (if run
> often enough) should eventually be able to return that space as the
> short-lived objects near the end start to go away. However, if even a
> single long-lived object finds itself at the end of the table, then
> only a vacuum full will ever be able to reclaim that space.

Since the time period involved (weeks/months) would have included both a large number of created/destroyed temp tables and occasionally altered persistent objects, it would appear that the "full" option is a very good idea, at least periodically.

-Ben
[GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
The good news is that we have now resolved our critical problem (disk space overuse) with a somewhat hackish, slow answer that is nonetheless good enough for now. Now I'd like to work out how to get autovacuum to work smoothly within our cluster. I'm happy to try to clarify my notes and post them either here or on the PG wiki as I get them resolved.

There's a wealth of "how to tune PG" instruction that's old and (based on this thread alone) often stale enough to be classified as disinformative. For example, as near as I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL. In contrast to the advice to avoid Vacuum Full (http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) comes the insight that vacuum full is necessary to clean up stale data that is not at the end of the table. (See Jeff Janes' 11/10/2012 email:)

> non-full vacuum can only free space from the end of the table.

This would imply that a full vacuum is a good idea, at least periodically (weekly/monthly/quarterly), in a database that combines the use of temp tables and periodic changes to persistent objects. Does autovacuum ever do a full vacuum? What about autovacuum and the reindex question at the end of this email?

On 11/10/2012 02:49 PM, Adrian Klaver wrote:
> Seems to have changed in 8.3:
> http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html
> "Beginning in PostgreSQL 8.3, autovacuum has a multiprocess
> architecture: There is a daemon process, called the autovacuum
> launcher, which is in charge of starting autovacuum worker processes
> for all databases. The launcher will distribute the work across time,
> but attempt to start one worker on each database every
> autovacuum_naptime seconds. One worker will be launched for each
> database, with a maximum of autovacuum_max_workers processes running
> at the same time..."

Sadly, this change means that I can no longer be certain of the utility of the otherwise excellent-sounding advice originally offered by Scott, quoted below. It appears that naptime is (as of 9.x) almost irrelevant, since it's defined per database, and dropping it from 1 minute to 5 seconds would have very little overall impact.

> These two can be killers. Long running transactions can cause
> autovacuum processes to stall out or be autocancelled.

"Long running transactions" - how long is long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with autovacuum?

> As well, since the default nap time is 1 minute, it will take at least
> 50 minutes to vacuum each db, as nap time is how long autovac waits
> between databases. Reducing autovacuum nap time to 5 or 10 seconds
> would be a good move here, also possibly making it more aggressive by
> increasing max worker threads, decreasing cost delay (possibly to zero
> or close to it) and/or increasing cost limit. After making such a
> change, watch iostat when vacuum is running to see how hard it's
> hitting your IO subsystem. I'm guessing that with SSDs it isn't gonna
> be a big problem. As Greg Smith has pointed out in the past, usually
> the answer to an autovacuum problem is making it more, not less,
> aggressive. Unless you're flooding your IO this is almost always the
> right answer. Keep in mind that autovacuum by default is set up to be
> VERY unaggressive because it may be running on a netbook for all it
> knows.
>
> To tune autovacuum with 50 databases, start by dropping nap time to
> something much lower, like 10s. Then if you need to, drop cost delay
> until you get to 0. If you get to 0 and it's still not hitting your IO
> too hard, but not keeping up, then increase cost limit. If you get to
> something in the 5000 to 1 range, and it's still not keeping up, then
> start bumping the thread count.

Should I increase the max_workers field from the default of 3 to (perhaps) 10? Noting that my solution to the disk space problem is effectively a max_workers of 1, since it's all done sequentially, I wonder if reducing max_workers would actually be better? Also, what's "the thread count"? Is that max_workers?

Why would I want to reduce the cost delay to 0, and how does this relate to cost_limit? Careful reading of the docs (http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html) makes me believe that, given my substantial I/O subsystem, I'd want to drop cost_delay to near zero and set the cost_limit really high, which is a rough restatement of the last quoted paragraph above. (I think)

Assuming that I make these suggestions and notice a subsequent system load problem, what information should I be gathering in order to provide better post-incident forensics? We have statistics turned on, and aren't using replication. (yet)

Lastly, there's the question of reindexing before full vacuum. I've observed that not doing a manual reindex prior to
[GENERAL] plpgsql cursor reuse
Came across this problem when trying to assign to a variable a field from a record that could come from multiple cursors. PG throws an error - "ERROR: type of parameter 7 (bigint) does not match that when preparing the plan (unknown)". If I make the null column in c1 null::bigint to match cursor c2, it works fine.

Where is this plan coming from? Why would it match c1 to a plan coming from c2? In reality, the two cursors in question are wildly different - a join of about 10 completely different tables. When I saw the text of the error I was a bit concerned that it was being overly flexible in matching the current cursor to another. It errors out on the assignment to i, not the fetch. (Maybe the fetch isn't actually being done until the data in r is used.)

CREATE OR REPLACE FUNCTION demo.test_cursor_bug(a IN integer)
RETURNS void AS
$BODY$
DECLARE
    c1 cursor FOR SELECT 1 as shipmentid, null as olmid;
    c2 cursor FOR SELECT 2 as shipmentid, 32::bigint as olmid;
    r record;
    i bigint;
BEGIN
    IF ( a = 0 ) THEN
        open c1;
        fetch c1 INTO r;
        close c1;
    END IF;

    IF ( a = 1 ) THEN
        open c2;
        fetch c2 INTO r;
        close c2;
    END IF;

    i := r.olmid;
END;
$BODY$ LANGUAGE plpgsql;

select demo.test_cursor_bug(0);
select demo.test_cursor_bug(1);
Re: [GENERAL] plpgsql cursor reuse
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David Greco
Sent: Monday, November 12, 2012 3:35 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] plpgsql cursor reuse

> Came across this problem when trying to assign to a variable a field
> from a record that could come from multiple cursors. PG throws an
> error - "ERROR: type of parameter 7 (bigint) does not match that when
> preparing the plan (unknown)".
> [snip]
> i := r.olmid;

I'm guessing that the system evaluates the fact that since i is a bigint, r.olmid must be a bigint as well. When it goes to find and retrieve olmid at parameter position 7, it expects to find a bigint but instead finds an undefined type with a NULL, and so it throws an error.

The only solution in your example is to cast the explicit NULL to a bigint. In your real-life situation there are possibly other options, but generally speaking you need to remove the ambiguity explicitly. The exact mechanics behind why this specific message is being thrown are beyond me, but regardless, the fundamental reason and solution remain constant. You would get a different but similar message if you avoided cursors and used direct SQL instead. The use of cursors just causes a slightly different ordering of operations to occur. But it is not "match[ing] c1 to a plan coming from c2"; rather, it is matching the plan for c1 to the declared type for i.

David J.
Re: [GENERAL] PostgreSQL and a clustered file system
On 12.11.2012 11:03, Ivan Voras wrote:
> Hello,
> Is anyone running PostgreSQL on a clustered file system on Linux? By
> "clustered" I actually mean "shared", such that the same storage is
> mounted by different servers at the same time (of course, only one
> instance of PostgreSQL on only one server can be running on such a
> setup, and there are a lot of other precautions that need to be
> satisfied).

TBTH, I don't see the potential benefit. Clustered filesystems have benefits for special use cases (e.g. redundant fileservers, applications that actually can work in parallel relying on file locking, DB clusters that coordinate writes themselves, ...), but PG certainly is not one of these. I'm not even sure if PG will lock any files at all (I'd guess not, as SHM is used for process coordination), so you'd actually have to roll your own safety measures.

OTOH, you pay a huge price for the clustering part (google for DLM), which will slow your I/O down significantly.

If you're just looking for shared storage (e.g. because SAN space is fricking expensive and you don't want to allocate it twice to your primary and secondary server ;-), it is probably safer and better performing to implement the volumes and filesystems as underlying cluster resources. That's something your cluster software should be able to handle... ;-) A newly imported filesystem will also not carry any leftover file locks around, should your primary server fail. The journal replay should not hurt you too much, as the relevant I/O PG does is sync anyway.

Just my 2p,
--
Gunnar "Nick" Bluth
RHCE/SCLA
Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de

"In 1984 mainstream users were choosing VMS over UNIX. Ten years later they are choosing Windows over UNIX. What part of that message aren't you getting?" - Tom Payne
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On Mon, Nov 12, 2012 at 10:38 AM, Lists <li...@benjamindsmith.com> wrote:
> On 11/10/2012 02:21 PM, Jeff Janes wrote:
>> As of 9.0, a vacuum full inherently does a reindex, so doing an
>> explicit one is neither necessary nor beneficial. I don't know if
>> your discovery is based on a non-full vacuum, or on an older server.
>
> I can only state that merely doing a "vacuum full" or "vacuum full
> $tables" sequentially did not free the space, whereas the sequential
> "reindex $table", each followed immediately by a "vacuum full $table",
> did.

With what version?

> If you'd like, I can easily recreate the scenario by simply not
> cleaning up one of the DB servers until it bloats up, and make
> available (limited distribution) a binary copy of the database (e.g.
> rsync the /var/lib/pgsql/ filesystem late at night) in order to help
> identify why it didn't work as expected.

Do you think you can make an easily script-able way to re-create the resistant bloat? That would be better than trying to disseminate binary files, I think. What I did was just create and drop temp tables in a tight loop, with autovacuum off, and then once pg_attribute got good and bloated, did a vacuum full as the database owner or superuser.

> Since the time period involved (weeks/months) would have included both
> a large number of created/destroyed temp tables and occasionally
> altered persistent objects, it would appear that the "full" option is
> a very good idea, at least periodically.

If you can prevent the extreme bloat from occurring in the first place, then the end of the table will not be so far away from its desired size that it needs to get reset by a vacuum full. If you find yourself in need of a vacuum full, then you should do one. But you should ask yourself what went wrong that you got into that situation in the first place.

Cheers,

Jeff
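A minimal sketch of the reproduction Jeff describes (the loop count and table name are arbitrary; autovacuum is assumed to be turned off for the test):

    DO $$
    BEGIN
        -- each CREATE/DROP cycle leaves dead rows in pg_attribute
        FOR n IN 1..100000 LOOP
            CREATE TEMP TABLE bloat_me (id int, payload text);
            DROP TABLE bloat_me;
        END LOOP;
    END;
    $$;

    -- afterwards, as the database owner or a superuser:
    VACUUM FULL pg_catalog.pg_attribute;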
Re: [GENERAL] plpgsql cursor reuse
Hello David,

Well, I think this is normal: you can't assign NULL to a variable without a proper cast. In your example you can do something like this:

c1 cursor FOR SELECT 1 as shipmentid, null::bigint as olmid;

Regards

From: David Greco <david_gr...@harte-hanks.com>
To: pgsql-general@postgresql.org
Sent: Monday, November 12, 2012 9:34 PM
Subject: [GENERAL] plpgsql cursor reuse

> Came across this problem when trying to assign to a variable a field
> from a record that could come from multiple cursors. PG throws an
> error - "ERROR: type of parameter 7 (bigint) does not match that when
> preparing the plan (unknown)". If I make the null column in c1
> null::bigint to match cursor c2, it works fine.
> [snip]
Re: [GENERAL] PostgreSQL and a clustered file system
On 11/12/2012 1:52 PM, Gunnar Nick Bluth wrote:
> On 12.11.2012 11:03, Ivan Voras wrote:
>> Is anyone running PostgreSQL on a clustered file system on Linux? By
>> "clustered" I actually mean "shared", such that the same storage is
>> mounted by different servers at the same time (of course, only one
>> instance of PostgreSQL on only one server can be running on such a
>> setup, and there are a lot of other precautions that need to be
>> satisfied).
>
> TBTH, I don't see the potential benefit. Clustered filesystems have
> benefits for special use cases (e.g. redundant fileservers,
> applications that actually can work in parallel relying on file
> locking, DB clusters that coordinate writes themselves, ...), but PG
> certainly is not one of these.

Although I'm also a non-fan of databases over clustered filesystems, I wanted to speak up here, since I have a hunch the OP wasn't looking for the solution you're thinking of. I think he's asking if folk have run an HA setup with PG where the database files are stored in a dual-ported/clustered filesystem, the idea being that you can have a failure in the hardware for one DB server and make the other one take over using the same underlying files.

I've seen this done. You need some plumbing to ensure that only one PG instance can run at the same time. From memory, there are HA monitor tools available that do this in a generic way. As far as PG is concerned it has no idea there is a shared-access filesystem, and so it has no need to perform file-level locking.
Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)
On 11/12/2012 01:31 PM, Jeff Janes wrote:
> With what version?

[root@alpha ~]# rpm -qi postgresql91-server
Name        : postgresql91-server         Relocations: (not relocatable)
Version     : 9.1.5                       Vendor: (none)
Release     : 3PGDG.rhel6                 Build Date: Sun 02 Sep 2012 12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:41 AM UTC    Build Host: koji-sl6-x86-64-pg91
Group       : Applications/Databases      Source RPM: postgresql91-9.1.5-3PGDG.rhel6.src.rpm
Size        : 15191132                    License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 1f16d2e1442df0f8
URL         : http://www.postgresql.org/

> Do you think you can make an easily script-able way to re-create the
> resistant bloat? That would be better than trying to disseminate
> binary files, I think.

It would only be better if it actually created the situation that caused the space to not be freed. Until you know the actual cause of a problem, I've found that it's often not productive to create simulations that may or may not actually be related to the problem.

> What I did was just create and drop temp tables in a tight loop, with
> autovacuum off, and then once pg_attribute got good and bloated, did a
> vacuum full as the database owner or superuser.

Based on my understanding, if your loop included an intermittent schema change from within a transaction, it might better approximate my actual scenario. Merely creating temp tables and then dropping them would create lots of activity at the end of the table, which would free correctly. This still does not explain why "reindex $table" works when reindex is supposedly implicit in the vacuum.

> If you can prevent the extreme bloat from occurring in the first
> place, then the end of the table will not be so far away from its
> desired size that it needs to get reset by a vacuum full. If you find
> yourself in need of a vacuum full, then you should do one. But you
> should ask yourself what went wrong that you got into that situation
> in the first place.

I agree; this is why I asked my questions about enabling autovacuum in a related thread.
Re: [GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
Lists wrote:
> There's a wealth of "how to tune PG" instruction that's old and (based
> on this thread alone) often stale enough to be classified as
> disinformative. For example, as near as I can tell, the entirety of
> this page is just wrong and/or irrelevant for 9.x and up:
> http://wiki.postgresql.org/wiki/VACUUM_FULL. In contrast to the advice
> to avoid Vacuum Full [...] comes the insight that vacuum full is
> necessary to clean up stale data that is not at the end of the table.
> (See Jeff Janes' 11/10/2012 email)

I thought about posting a clarification to what Jeff said there -- it is only necessary to run VACUUM FULL to free up space in the sense of having the database give disk space back to the OS, so that the next time you need space in the table it must be re-allocated from the OS. This can be a bad thing in terms of slower allocation of space for new tuples and allocation of non-contiguous disk space. You are much better off running autovacuum aggressively enough that you don't need to run VACUUM FULL than to run it periodically. I have gone for years at a time on large databases (up to several TB) without ever running VACUUM FULL on a table. It is a valuable tool to recover from a bad state due to failure to properly maintain the database; it is not a substitute for doing things right.

>> Long running transactions can cause autovacuum processes to stall out
>> or be autocancelled.
>
> "Long running transactions" - how long is long? In our system it's
> rare to have a transaction (even a prepared transaction) last much
> longer than a few minutes. Is that enough time to cause problems with
> autovacuum?

The only situation where I would expect that to be a problem is in a very small table which is updated hundreds of times per second.

> Should I increase the max_workers field from the default of 3 to
> (perhaps) 10? Noting that my solution to the disk space problem is
> effectively a max_workers of 1, since it's all done sequentially, I
> wonder if reducing max_workers would actually be better?

I would increase autovacuum_max_workers.

> Also, what's "the thread count"? Is that max_workers?

I believe that's what was meant.

> Why would I want to reduce the cost delay to 0, and how does this
> relate to cost_limit?

I've never gone all the way to 0. I would be reluctant to go below 10; rather, I would increase the cost limit. Autovacuum adds up costs for what it is doing, and when it hits the limit it sleeps for the cost delay interval. I would think you would want to allow the process to work on other things periodically.

> Given my substantial I/O subsystem, I'd want to drop cost_delay to
> near zero and set the cost_limit really high, which is a rough
> restatement of the last quoted paragraph above. (I think)

I agree.

> Assuming that I make these suggestions and notice a subsequent system
> load problem, what information should I be gathering in order to
> provide better post-incident forensics?

Run a report on bloat periodically, to make sure it is staying under control. You might want to monitor for long-running transactions and prepared transactions which the transaction manager has lost track of. They can be real killers. During any incident of poor performance, it is good to gather iostat or vmstat samples at an interval of 1 to 10 seconds for a few minutes, and to capture the contents of pg_stat_activity and pg_locks. Turning on logging of checkpoint activity, autovacuum activity, and slow queries can provide useful information when you match times from the logging up against the times of slow periods.
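A sketch of that kind of monitoring on 9.1 (the five-minute threshold is arbitrary):

    -- transactions that have been open for more than five minutes
    SELECT procpid, usename, xact_start, current_query
    FROM pg_stat_activity
    WHERE xact_start < now() - interval '5 minutes'
    ORDER BY xact_start;

    -- prepared transactions that were never committed or rolled back
    SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts
    ORDER BY prepared;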
> Lastly, there's the question of reindexing before full vacuum. I've
> observed that not doing a manual reindex prior to vacuum full did not,
> in fact, free up the space, even though I've been told that reindex is
> implicit in the vacuum process. (?!) I'm confident that I can
> reproduce this behavior given a bit of time to allow one of our
> database servers to bloat back up.

You never did say what version that was on. In 9.0 and later, VACUUM FULL will drop and recreate all indexes from scratch after rebuilding the heap, so it's pretty hard to imagine why dropping and recreating all indexes right *before* rebuilding the heap would have much impact. Prior to 9.0, VACUUM FULL would bloat indexes, so it was always a good idea to REINDEX in addition to running VACUUM FULL, although it was much more effective to run it *after* the VACUUM FULL.

-Kevin
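Pulling the thread's suggestions together, a postgresql.conf sketch along these lines would apply them; the numbers are illustrative starting points, not recommendations, and should be validated against your own I/O capacity:

    autovacuum = on
    autovacuum_max_workers = 10            # up from the default of 3
    autovacuum_naptime = 10s               # down from the default of 1min
    autovacuum_vacuum_cost_delay = 10ms    # keep a small delay rather than 0...
    autovacuum_vacuum_cost_limit = 2000    # ...and raise the limit instead
    log_autovacuum_min_duration = 0        # log every autovacuum action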
Re: [GENERAL] PostgreSQL and a clustered file system
Ivan,

* Ivan Voras (ivo...@freebsd.org) wrote:
> Is anyone running PostgreSQL on a clustered file system on Linux? By
> "clustered" I actually mean "shared", such that the same storage is
> mounted by different servers at the same time (of course, only one
> instance of PostgreSQL on only one server can be running on such a
> setup, and there are a lot of other precautions that need to be
> satisfied).

Yes. I've seen PG used across both DRBD and OCFS2. It can be made to work quite well as an HA setup, provided you do appropriate fencing to make sure that two postmasters don't end up trying to access the data directory at the same time, etc, etc.

Thanks,

Stephen
Re: [GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
Kevin --

You wrote:
... running transactions can cause autovacuum processes to stall out or be autocancelled. Long running transactions - how long is long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with AutoVacuum? The only situation where I would expect that to be a problem is in a very small table which is updated hundreds of times per second.

Could you elaborate on this, or point me to a previous thread? I've got precisely such a database; currently we do an hourly reindex on all user tables in addition to some fairly standard autovacuum settings. The tables themselves don't seem to bloat much during ordinary operations, but if we don't reindex, performance tanks; when the site has issues we sometimes see table bloat, but it seems to be dealt with by autovacuum.

This is a PostgreSQL 9.1 instance with SSDs, lots of RAM (24 GB), and relatively small tables (maybe a few thousand rows in most cases; total footprint of the database is 240 MB), updated in the low thousands of times per second.

Thanks!

Greg Williamson
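[For a handful of very hot tables like these, one alternative to the hourly REINDEX is per-table autovacuum storage parameters, which 9.1 supports. A sketch with a hypothetical table name and illustrative values, not tuned recommendations:]

    -- Vacuum this table far more aggressively than the global defaults:
    -- trigger after ~1% of rows are dead, and disable cost-based
    -- throttling for this table only.
    ALTER TABLE hot_table SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold    = 50,
        autovacuum_vacuum_cost_delay   = 0
    );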
Re: [GENERAL] Running out of memory while making a join
Hi,

That is what I got from gdb:

TopMemoryContext: 88992 total in 10 blocks; 10336 free (7 chunks); 78656 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544 used
  MessageContext: 73728 total in 4 blocks; 14752 free (3 chunks); 58976 used
  smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks); 10672 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 744 free (0 chunks); 280 used
      ExecutorState: 11586756656 total in 1391 blocks; 4938408 free (6 chunks); 11581818248 used
        TIDBitmap: 57344 total in 3 blocks; 23936 free (8 chunks); 33408 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 8016 free (0 chunks); 176 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
  CacheMemoryContext: 817392 total in 20 blocks; 145728 free (0 chunks); 671664 used
    uni_codauten: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_nfe: 2048 total in 1 blocks; 632 free (0 chunks); 1416 used
    idx_wm_nfsp_dtemis: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    idx_wm_nfsp_codpre: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    wm_nfsp_pkey: 2048 total in 1 blocks; 440 free (0 chunks); 1608 used
    pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
    pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_auth_members_member_role_index: 3072 total
Re: [GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
On 11/13/2012 04:04 AM, Lists wrote:
There's a wealth of how-to-tune-PG instruction that's old and (based on this thread alone) often stale enough to be classified as disinformative. For example, nearest I can tell, the entirety of this page is just wrong and/or irrelevant for 9.x and up: http://wiki.postgresql.org/wiki/VACUUM_FULL.

Agreed, that needs fixing. I'll sort that out. That information was important before the VACUUM FULL rewrite, but is now severely outdated. It needs to be split into pre-9.1 and 9.1+ sections.

In contrast to the advice to avoid VACUUM FULL (http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) comes the insight that VACUUM FULL is necessary to clean up stale data that is not at the end of the table. (See Jeff Janes's 11/10/2012 email)

That part isn't quite right AFAIK. If you keep table bloat under control, ordinary VACUUM is perfectly sufficient. You only need/want to VACUUM FULL if you wish to truncate a table, reducing its size by compacting rows. In an actively used database that's usually pretty pointless, since new rows will then be added to the end, growing the table. You might as well just keep on re-using the space, rather than going through those compact-and-expand cycles. You'll suffer from less file fragmentation that way and won't be paying the costs of file system allocations.

If you have a table that you've dramatically reduced in size (say, by deleting the vast majority of it) and you won't be adding more rows to replace the old ones, that's when VACUUM FULL makes sense.

It's a bit like those utilities that claim to free or clean or de-fragment memory. They seem good, but they're actually grossly counter-productive, because the system then has to re-read cached data and otherwise fight to get back to its old equilibrium. It's typically the same for Pg: you want to aim for equilibrium, not free space that'll just promptly get re-allocated.

If you do have a bad bloat problem, I'd set a non-default FILLFACTOR before doing a VACUUM FULL, so you still have some free space within the table after the vacuum completes. That way you won't be immediately paying the cost of allocating space for new rows as soon as any UPDATEs or INSERTs come in.

"non-full vacuum can only free space from the end of the table." This would imply that a full analyze is a good idea, at least periodically (weekly/monthly/quarterly) in a database that combines the use of temp tables and periodic changes to persistent objects. Does autovacuum ever do a full analyze? What about autovacuum and the reindex question at the end of this email?

I'm not sure there's any such thing as a "full analyze". VACUUM FULL ANALYZE means "do a VACUUM FULL and an ANALYZE", not "do a full analyze". Autovacuum should be taking care of analyze and table statistics. If it isn't, adjust the autovacuum parameters so that it does.

These two can be killers: Long running transactions can cause autovacuum processes to stall out or be autocancelled.

Long running transactions - how long is long? In our system it's rare to have a transaction (even a prepared transaction) last much longer than a few minutes. Is that enough time to cause problems with AutoVacuum?

Not generally, no. PostgreSQL can't clean up rows that are still visible to a transaction. So if your transactions are three minutes long, that's a three-minute delay before VACUUM can clean up DELETEd rows or dead rows left by UPDATEs. Not a biggie even on a pretty high-load DB.
You should generally be concerned only when transactions are open over user "think time" or are abandoned by buggy applications - cases where the transaction length is many minutes or hours, potentially unbounded. Uncommitted prepared transactions are also a problem for similar reasons.

Lastly, there's the question of reindexing before full vacuum. I've observed that not doing a manual reindex prior to VACUUM FULL did not, in fact, free up the space, even though I've been told that reindex is implicit in the vacuum process. (?!)

VACUUM FULL, post-9.1, should take care of index bloat. Pre-9.1 VACUUM FULL could make index bloat worse. Ordinary VACUUM will not truncate indexes AFAIK, only mark free space within them so it can be re-used. Same deal as with the table itself: this is usually what you want.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
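[A minimal sketch of the FILLFACTOR-before-VACUUM-FULL idea Craig describes, using a hypothetical table name and an illustrative fillfactor of 70; the right value depends on the table's update pattern:]

    -- Leave ~30% of each rewritten heap page free, so subsequent UPDATEs
    -- and INSERTs reuse pre-allocated space instead of growing the file.
    ALTER TABLE bloated_table SET (fillfactor = 70);
    VACUUM FULL bloated_table;
    ANALYZE bloated_table;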
Re: [GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
On 11/13/2012 10:29 AM, Craig Ringer wrote:
Agreed, that needs fixing. I'll sort that out. That information was important before the VACUUM FULL rewrite, but is now severely outdated. It needs to be split into pre-9.1 and 9.1+ sections.

Correction - pre-9.0 and 9.0+.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [GENERAL] integer instead of 'double precision'?
On Mon, Nov 12, 2012 at 02:16:21PM +0100, Willy-Bas Loos wrote:
On Fri, Sep 9, 2011 at 5:09 PM, Guillaume Lelarge guilla...@lelarge.info wrote:
You divide an integer by an integer, that should give you an integer.

Can you tell me the reasoning behind that idea? Is it a rule that the output type of an operator must equal the input type? In this case that doesn't seem logical. I think that the / operator should return something that allows fractions, since the operator creates fractions so frequently.

The thing is, you often do need the version that truncates. It's supported by the underlying system, and if you want a float as output you can cast one of the arguments to float to get that. It's been like this forever (C does it too, for example). For integers it may help if you think of it in combination with the modulus operator (%). Python 3 recently changed to give float output by default, but also provides a // operator to access the truncated version.

Have a nice day,

-- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
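[A quick illustration of the behavior under discussion; results are shown in the comments:]

    SELECT 7 / 2;          -- 3    integer / integer truncates
    SELECT 7 % 2;          -- 1    the matching remainder: (7/2)*2 + 7%2 = 7
    SELECT 7 / 2.0;        -- 3.5  a numeric operand yields a fractional result
    SELECT 7::float8 / 2;  -- 3.5  casting one argument to float also works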