[GENERAL] File system level copy
Hi, I installed PostgreSQL 8.3 on my Linux machine. The cluster directory is /usr/local/data and I created three databases named db1, db2, and db3. db1 is in the default tablespace 'pg_default', db2 is in '/home/tablespace/space1/' and db3 is in '/home/tablespace/space2/'. I want to copy the cluster directory and the db3 tablespace folder ('/home/tablespace/space2/') without stopping the database server. Then I want to use the cluster directory and db3's tablespace on another Linux machine to recover the 'db3' database. Will this work? If not, why not? Regards, Hao
[GENERAL] How do query optimizers affect window functions
Hi, Postgresql, I want to understand how the query optimizer affects the output of window functions. For example, if I set cpu_tuple_cost = 50 in postgresql.conf, start the server, and run the regression tests (make installcheck), the window function test fails. Checking the diff, I found that the output of the window functions is different. For example, for the following query:

  SELECT sum(unique1) over (rows between current row and unbounded following), unique1, four
  FROM tenk1 WHERE unique1 < 10;

The expected results are:

 sum | unique1 | four
-----+---------+------
  45 |       4 |    0
  41 |       2 |    2
  39 |       1 |    1
  38 |       6 |    2
  32 |       9 |    1
  23 |       8 |    0
  15 |       5 |    1
  10 |       3 |    3
   7 |       7 |    3
   0 |       0 |    0

But the real results are:

 sum | unique1 | four
-----+---------+------
  45 |       0 |    0
  45 |       1 |    1
  44 |       2 |    2
  42 |       3 |    3
  39 |       4 |    0
  35 |       5 |    1
  30 |       6 |    2
  24 |       7 |    3
  17 |       8 |    0
   9 |       9 |    1

There are altogether 6 queries in the window test that output different results. I don't understand why the results are different. Intuitively, the queries should return the same results no matter what plan the optimizer chooses. I suspected the previous queries had some side effect on this one (e.g., changing the current row), so I removed all the queries before this one in window.sql, but the result did not change. Could anyone explain this behavior, or point out how to investigate? Thanks a lot! Tianyin
Re: [GENERAL] Using window functions to get the unpaginated count for paginated queries
Clemens Park wrote: Recently, during a performance improvement sweep for an application at my company, one of the hotspots that was discovered was pagination. In order to display the correct pagination links on the page, the pagination library we used (most pagination libraries, for that matter) ran the query with OFFSET and LIMIT to get the paginated results, and then re-ran the query without the OFFSET and LIMIT and wrapped it in a SELECT COUNT(*) FROM main_query to get the total number of rows. In an attempt to optimize this, we used a window function as follows. Given a query that looked as follows:

  SELECT a,b,c FROM table WHERE clauses OFFSET x LIMIT y;

add a total_entries_count column as follows:

  SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c FROM table WHERE clauses OFFSET x LIMIT y;

This calculates the total number of unpaginated rows correctly, without affecting the runtime of the query. At least as far as I can tell.

It can affect the runtime considerably. I created a 10^6 row test table and tried:

test=> EXPLAIN ANALYZE SELECT id, val FROM large OFFSET 100 LIMIT 10;
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.49..1.64 rows=10 width=12) (actual time=0.177..0.195 rows=10 loops=1)
   ->  Seq Scan on large  (cost=0.00..14902.00 rows=100 width=12) (actual time=0.028..0.114 rows=110 loops=1)
 Total runtime: 0.251 ms
(3 rows)

test=> EXPLAIN ANALYZE SELECT id, val, COUNT(*) OVER () AS total_entries_count FROM large OFFSET 100 LIMIT 10;
                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.74..3.01 rows=10 width=12) (actual time=1893.606..1893.625 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..27402.00 rows=100 width=12) (actual time=1893.435..1893.559 rows=110 loops=1)
         ->  Seq Scan on large  (cost=0.00..14902.00 rows=100 width=12) (actual time=0.025..647.182 rows=100 loops=1)
 Total runtime: 1915.255 ms
(4 rows)

That is because the second query will have to scan all rows, while the first one can stop scanning after 110 rows.

The questions I have are: 1) Are there any adverse effects that the above window function can have?

I can only think of the performance degradation mentioned above.

2) Are there any cases where the count would return incorrectly?

No.

3) In general, is this an appropriate use-case for using window functions?

I think it is. Maybe you can do better if you don't retrieve the total count of rows for every set of rows you select.

Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
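As a rough illustration of that last suggestion, the window-function count can be limited to the first page, with the application caching the total for the remaining pages (table, column, and WHERE-clause names here are only placeholders, not taken from the original query):

  -- first page: rows and total in one round trip
  SELECT count(*) OVER () AS total_entries_count, a, b, c
  FROM some_table
  WHERE some_condition
  ORDER BY a
  LIMIT 25 OFFSET 0;

  -- later pages: reuse the cached total and skip the window aggregate
  SELECT a, b, c
  FROM some_table
  WHERE some_condition
  ORDER BY a
  LIMIT 25 OFFSET 25;

That way only one query per search pays the full-scan cost of the window aggregate.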
Re: [GENERAL] Understanding streaming replication
Philippe Amelant wrote: So I was thinking it was just a reconnect to the sender (and I can see the standby trying to reconnect in the log)

Hmmm. I think I was too quick when I said no. If you ship the WAL archives including the history file to the standby, then the standby should be able to recover across the timeline change from the archives (if you have recovery_target_timeline set to 'latest' in recovery.conf) and then reestablish streaming replication. I never tried that, though. (The patch I quoted above would allow the timeline change via streaming replication.)

You're right. I added recovery_target_timeline='latest' in the recovery.conf, then I promoted the standby. The replication on the second standby stopped with a message complaining about the timeline. Then I copied the archived WAL from the new master to the (stopped) standby (into pg_xlog). The standby restarted on the new timeline and the data seems to be OK. I also tried to just copy the last 00X.history into pg_xlog and it worked too. I suppose this could fail if wal_keep_segments is too low. Thank you very much for your help. Could you just point me to where you found this information in the docs?

I didn't consult the documentation, I used what I know of how WAL recovery and streaming replication work... However, I find the following in http://www.postgresql.org/docs/current/static/recovery-target-settings.html : recovery_target_timeline [...] Setting this to latest recovers to the latest timeline found in the archive, which is useful in a standby server.

Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
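For reference, a minimal recovery.conf sketch for a standby that follows timeline changes might look like this (host name and archive path are illustrative, not taken from the thread):

  standby_mode = 'on'
  primary_conninfo = 'host=new-master port=5432 user=replication'
  restore_command = 'cp /path/to/wal_archive/%f "%p"'
  recovery_target_timeline = 'latest'

With restore_command pointing at an archive that also receives the .history files, the standby can replay across the timeline switch and then reconnect via streaming replication, as described above.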
Re: [GENERAL] How do query optimizers affect window functions
On Wed, Nov 14, 2012 at 10:12 AM, Tianyin Xu t...@cs.ucsd.edu wrote: I want to understand how the query optimizers affect the output of the window functions. For example, set cpu_tuple_cost = 50 in postgresql.conf and start the server, I apply the regress test (make installcheck). The test of window function fails. [...] For the following query: SELECT sum(unique1) over (rows between current row and unbounded following), unique1, four FROM tenk1 WHERE unique1 < 10; [...] I don't understand why the results are different. Intuitively, the queries should return the same results no matter what plan the optimizer chooses. [...] Could anyone explain this behavior? Or point out how to investigate? Thanks a lot! Tianyin

Hi. In short: if no explicit ordering is specified for a query, the resulting set can be in any order. It is up to the query optimizer to choose the order in which the resulting tuples are returned. The window function used in this test case relies on the order of the resulting set (it sums from the current row to the last), so it will generate different results for different query plans. I think for these test cases (window functions) explicit ordering should be specified. In normal use, order-dependent window functions are never used without explicit ordering.
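To make such a query deterministic, the ordering has to be spelled out in the window frame (and usually in the outer query as well); the ordering column below is only an illustration, not the fix actually applied to the regression test:

  SELECT sum(unique1) OVER (ORDER BY unique1
                            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
         unique1, four
  FROM tenk1
  WHERE unique1 < 10
  ORDER BY unique1;

With the ORDER BY in place, the running sum no longer depends on which scan order the planner happens to pick.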
Re: [GENERAL] Running out of memory while making a join
Hi Tom, Thank you for the analysis! There is no problem with using select wm_nfsp.*, but as my concern is to prevent this in the future, I think I should apply the fix. Or is there a config parameter to abort the backend if it reaches some kind of storage limit? Thank you! Reimer On Tue, Nov 13, 2012 at 5:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Carlos Henrique Reimer carlos.rei...@opendb.com.br writes: That is what I got from gdb: ExecutorState: 11586756656 total in 1391 blocks; 4938408 free (6 chunks); 11581818248 used So, query-lifespan memory leak. After poking at this for a bit, I think the problem has nothing to do with joins; more likely it's because you are returning a composite column: select wm_nfsp from 5611_isarq.wm_nfsp ... I found out that record_out() leaks sizable amounts of memory, which won't be recovered till end of query. You could work around that by returning select wm_nfsp.* instead, but if you really want the result in composite-column notation, I'd suggest applying this patch: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c027d84c81d5e07e58cd25ea38805d6f1ae4dfcd regards, tom lane -- Reimer 47-3347-1724 47-9183-0547 msn: carlos.rei...@opendb.com.br
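For anyone following along, the difference between the two forms Tom mentions is roughly this (the schema-qualified name is quoted here only so the sketch parses):

  -- returns one composite-typed column, which goes through record_out():
  SELECT wm_nfsp FROM "5611_isarq".wm_nfsp;

  -- returns the individual columns and avoids record_out() entirely:
  SELECT wm_nfsp.* FROM "5611_isarq".wm_nfsp;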
Re: [GENERAL] File system level copy
Hao Wang wrote: I installed PostgresSQL-8.3 on my linux machine. The cluster directory is /usr/local/data and I created three databases named db1, db2, and db3. db1 is in the default tablespace 'pg_default'. db2 is in '/home/tablespace/space1/' and db3 is in '/home/tablespace/space2/'. I want to copy the cluster directory and the db3 tablespace folder('/home/tablespace/space2/') without stopping the database server. Then I want to use the cluster directory and db3's tablespace in another linux machine to recover 'db3' database. Does this way work? If not, why? First, you need a correct backup for recovery. Before copying, run pg_start_backup, and pg_stop_backup afterwards. Then you need to have recovery.conf and WAL archives (or be lucky and all WALs are still in pg_xlog). WAL contains changes to all databases in the cluster, so you cannot recover only one database, you'll have to recover them all. Read http://www.postgresql.org/docs/current/static/continuous-archiving.html for background and details. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
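A minimal sketch of that procedure on 8.3, assuming WAL archiving is already configured (the backup label is arbitrary):

  SELECT pg_start_backup('fs_copy');
  -- file-system copy of the cluster directory and all tablespace directories goes here
  SELECT pg_stop_backup();

The copy plus the archived WAL (and a recovery.conf on the target machine) is what makes a file-level backup taken from a running server consistent.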
[GENERAL] pgBadger 2.2 released : Improvements and benchmarking
Paris, France - November 14th, 2012 DALIBO is proud to announce the release of version 2.2 of pgBadger, the new PostgreSQL log analyzer. pgBadger is built for speed with fully detailed reports from your PostgreSQL log files. It's a single and small Perl script that aims to replace and to outperform the old PHP script pgFouine.

= pgBadger 2.2 comes with new features and more options =

pgBadger is becoming more flexible with some additional user-requested parameters such as:
* --exclude-user option to ignore a specific user
* --select-only option to build the report only on SELECT queries
* --include-query and --include-file to specify with regular expressions the queries/files that must be included in the report
* --nocomment option to remove comments (/* ... */) from queries

This new version also has significant improvements:
* You can now output XML files for Tsung (http://tsung.erlang-projects.org/), a high-performance benchmarking framework. Based on your logged queries, pgBadger can produce a realistic benchmark scenario. For now, the Simple Protocol is supported (connect/authenticate/sql/close).
* pgBadger now uses the XS version of Text::CSV instead of the Pure Perl implementation. It's clearly faster. However, using csvlog is still a bit slower than the syslog or stderr log format.
* The PostgreSQL keywords list is now updated for 9.2
* Shutdown events are reported
* Current totals of queries and events parsed are reported in the progress bar
* The log format detection is improved
* ... and many bugfixes

For the complete list of changes, please check out the release notes at https://github.com/dalibo/pgbadger/blob/master/ChangeLog All pgBadger users should upgrade as soon as possible.

= How to produce a Tsung scenario? =

The purpose of Tsung is to simulate users in order to test the scalability and performance of IP-based client/server applications. You can use it to do load and stress testing of your servers. It can be used to stress HTTP, WebDAV, SOAP, PostgreSQL, MySQL, LDAP and Jabber/XMPP servers. To use Tsung against PostgreSQL, you create an XML file that will define several sessions. These sessions will be used to simulate different types of users. Now, for instance, if we want to create a Tsung scenario based on the recent SELECT queries received by our server, all we need to do is:

  perl pgbadger -S -o sessions.tsung --prefix '%t [%p]: [%l-1] user=%u,db=%d ' /pglog/postgresql-9.1.log

The sessions.tsung file is an XML configuration file for Tsung. We can now run the benchmark with:

  tsung -f sessions.tsung start

Please read the project documentation for more details about Tsung: http://tsung.erlang-projects.org/user_manual.html

= Links & Credits =

DALIBO would like to thank the developers who submitted patches and the users who reported bugs and feature requests, especially Hubert Depesz Lubaczewski, Birta Levente, David Fetter, and Vincent Laborie. pgBadger is an open project. Any contribution to build a better tool is welcome. You just have to send your ideas, feature requests or patches using the GitHub tools or directly on our mailing list. Links:
* Download: https://github.com/dalibo/pgbadger/downloads
* Mailing List: https://listes.dalibo.com/cgi-bin/mailman/listinfo/pgbagder

-- \\ **About pgBadger** : pgBadger is a new-generation log analyzer for PostgreSQL, created by Gilles Darold, also author of the ora2pg migration tool. pgBadger is a fast and easy tool to analyze your SQL traffic and create HTML5 reports with dynamic graphs.
pgBadger is the perfect tool to understand the behavior of your PostgreSQL server and identify which SQL queries need to be optimized. Docs, Download & Demo at http://dalibo.github.com/pgbadger/ -- \\ **About DALIBO** : DALIBO is the leading PostgreSQL company in France, providing support, training and consulting to its customers since 2005. The company contributes to the PostgreSQL community in various ways, including: code, articles, translations, free conferences and workshops. Check out DALIBO's open source projects at http://dalibo.github.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Understanding streaming replication
On 11/13/2012 02:40 AM, Albe Laurenz wrote: The only thing I have seen is RedHat's Cluster Suite, which is commercial. I would recommend to have at least three nodes though, because the two node cluster we had was subject to spurious failovers on short quorum disk hiccups. There's also the Pacemaker + Corosync stack. There are plenty of tutorials on how it works over at Cluster Labs: http://www.clusterlabs.org/ It's totally free and we've been using it for a couple years now to replace Lifekeeper, a commercial cluster offering from SIOS. So there's two more right there. :) -- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Understanding streaming replication
Hi, On Tue, 2012-11-13 at 09:40 +0100, Albe Laurenz wrote: The only thing I have seen is RedHat's Cluster Suite, which is commercial. Depends. It is open source, and all components are also available in CentOS and Scientific Linux, and there are companies out there who support clusters on these two. Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz signature.asc Description: This is a digitally signed message part
Re: [GENERAL] general fear question about move PGDATA from one Disc to another
Dear Craig,

Am 14-11-2012 00:44, schrieb Craig Ringer: On 11/13/2012 11:26 PM, Aleksandar Lazic wrote: Dear list members, I need to move /var/lib/postgresql/8.4/main from the / partition to another disc.

If so, you're probably using `pg_wrapper` for cluster management. Confirm that with `pg_lsclusters`. If the command exists and it shows an 8.4 installation with the data directory you mentioned above, then you're using pg_wrapper to manage Pg.

  pg_lsclusters
  Version Cluster Port Status Owner    Data directory               Log file
  8.4     main    5432 online postgres /var/lib/postgresql/8.4/main custom

pg_wrapper reads /etc/postgresql/[version]/[clustername]/postgresql.conf to locate the cluster. For example, yours will be /etc/postgresql/8.4/main/postgresql.conf. This means you don't have to edit any init script settings on your system to get Pg to start properly at the next boot.

This was also an option for me. I didn't want too many changes in the system, but it looks like this is the best way.

0.) Mount the new pgroot; are there any hints for the mount command?

  mount -t ext4 -o noatime,nodiratime,nodev,noexec,nosuid /dev/sde1 /pgroot

Output from mount:

  /dev/sde1 on /pgroot type ext4 (rw,nosuid,nodev,noexec,noatime,nodiratime,barrier=1,data=ordered)

Unless you add this to fstab as well, the file system won't mount at the next boot and PostgreSQL will fail to start. ;-)

I added this to the fstab, of course.

3.) Copy the current /var/lib/postgresql/8.4/main to the new dir as user postgres:

  cd /var/lib/postgresql/
  tar cvf - . | (cd /pgroot/pgdata && tar xvf -)

What an odd way to do the copy. I'd use `cp -aR`, or at least use the preserve flag (-p) with tar.

Well, it is the old-fashioned way.

If you like, you can have the new file system (assuming it's dedicated to just PostgreSQL) mounted where the old data directory was, so there's no change visible in the system. Edit /etc/fstab and add a line like:

  UUID=b4d54649-a9b5-4a57-aa22-291791ad7a3c /var/lib/postgresql/ ext4 defaults,noatime 0 0

Replace the UUID shown with the UUID of your new file system, determined with the vol_id command on older systems, or blkid on newer ones. Or just use the device node for the partition, like /dev/sdx9.

I have written the following into the fstab:

  /dev/sde1 /var/lib/postgresql ext4 noatime,nodiratime,nodev,noexec,nosuid 0 2

Make a full pg_dumpall backup. Now stop all your PostgreSQL clusters with pg_ctlcluster and:

  mv /var/lib/postgresql /var/lib/postgresql.old
  mkdir /var/lib/postgresql
  mount /var/lib/postgresql
  chown postgres:postgres /var/lib/postgresql
  shopt -s dotglob
  cp -aR /var/lib/postgresql.old/* /var/lib/postgresql/

Start the cluster with pg_ctlcluster.

Thanks for the description and explanation.

Best regards Aleks -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SSDs - SandForce or not?
On 11/14/2012 01:11 AM, Toby Corkindale wrote: I'm wondering which type of SSDs would be better for use with PostgreSQL.

A few things:

1. While the controller may or may not have an impact, the presence of an on-board super-capacitor will have more. SSDs should be considered malignant devices that will go out of their way to destroy your data, unless they have one of these.

2. Workload on a compressible system like PG is generally dependent on your data sets. If you have lots of TOAST data, which is already compressed, you get no benefit. If your use case doesn't show a lot of random writes, optimizing for them is of questionable value.

3. SSDs also exist as effectively raw NVRAM, in the form of PCIe cards. These cards come in several varieties and, these days, can be mounted in external PCIe chassis in hot-swap bays much like more conventional drive enclosures. Some of these use a kernel-level driver over a proprietary controller, using neither Sandforce nor anything else. They are also close to an order of magnitude faster than an SSD because they discard the SATA/SCSI bus entirely.

4. SSDs do have limited write cycles, and whether it's wear leveling or drive compression to reduce writes on the actual NVRAM chips, if you honestly have a high write load, you're better off with whatever card reports the highest longevity of the relatively scarce write cycles per cell.

5. You're more likely to get performance improvements pursuing SLC (single-level cell) versus cheaper MLC (multi-level cell) chips for writing, because the controller doesn't have to mask writes to the proper layer.

Basically, there's way more involved here than Sandforce vs. others, or even compressible vs. not. SSDs are still a pretty Wild West kind of thing, and you've got a lot more variables to consider than with standard spindles.

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com __ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Running out of memory while making a join
Carlos Henrique Reimer carlos.rei...@opendb.com.br writes: No problem, there is no problem to use select wm_nfsp.* but as my concern is to prevent this in the future I think I should apply the fix or is there a config parameter to abend the backend if it reaches some kind of storage limit? You could start the postmaster under a smaller ulimit setting ... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
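A sketch of what that could look like when starting the server manually (the limit and data directory are illustrative; ulimit -v is in kB):

  ulimit -v 4194304          # cap per-process virtual memory at ~4 GB, inherited by each backend
  pg_ctl -D /usr/local/pgsql/data start

Backends that exceed the limit will then fail their allocations with an out-of-memory error instead of driving the whole machine into swap.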
[GENERAL] Access disk from plpython
Greetings all, having a permission issue with writing a file using plpython to a local folder, changed permissions to everyone read and write and even changed the owner to postgres. but no joy, any suggestions? Regards, Rhys
Re: [GENERAL] Access disk from plpython
On 11/14/2012 08:44 AM, Rhys A.D. Stewart wrote: Greetings all, having a permission issue with writing a file using plpython to a local folder, changed permissions to everyone read and write and even changed the owner to postgres. but no joy, any suggestions? What is the actual error message? Regards, Rhys -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Access disk from plpython
On 11/14/2012 08:48 AM, Rhys A.D. Stewart wrote: This is it:

  ERROR:  IOError: [Errno 13] Permission denied: '/root/p1/me.txt'
  CONTEXT:  Traceback (most recent call last):
    PL/Python anonymous code block, line 3, in <module>
      t = open('/root/p1/me.txt','w')
  PL/Python anonymous code block

CCing the list. Does the file already exist? If so, does the file itself have the correct permissions? -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Access disk from plpython
On Wed, Nov 14, 2012 at 8:44 AM, Rhys A.D. Stewart rhys.stew...@gmail.com wrote: Greetings all, having a permission issue with writing a file using plpython to a local folder, changed permissions to everyone read and write and even changed the owner to postgres. but no joy, any suggestions? plpython is a trusted language. That means pgsql trusts it not to allow you to break out of the sandbox and interact directly with the file system or the database server internals. If you need to do those things, you can use plpythonu. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
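A minimal sketch of writing a file from the untrusted language (the function name and path are made up; the directory still has to be writable by the operating-system user the server runs as, typically postgres):

  CREATE OR REPLACE FUNCTION write_text_file(path text, contents text)
  RETURNS void AS $$
      # runs as the server's OS user, outside any sandbox
      f = open(path, 'w')
      f.write(contents)
      f.close()
  $$ LANGUAGE plpythonu;

  SELECT write_text_file('/tmp/me.txt', 'hello');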
Re: [GENERAL] Access disk from plpython
On 11/14/2012 09:03 AM, Jeff Janes wrote: On Wed, Nov 14, 2012 at 8:44 AM, Rhys A.D. Stewart rhys.stew...@gmail.com wrote: Greetings all, having a permission issue with writing a file using plpython to a local folder, changed permissions to everyone read and write and even changed the owner to postgres. but no joy, any suggestions? plpython is a trusted langauge. That means pgsql trusts it not to allow you to break out of the sandbox and interact directly with the file system or the database server internals. If you need to do those things, you can use plpythonu. My understanding is that of Postgres 7.4 plpython does not exist. In other words only the untrusted version is available. Cheers, Jeff -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Using Postgresql 9.2 on windows 7 and windows vista
Hi, I am going to use PostgreSQL 9.2 with my application which runs on Windows 7/WIndows Visa 64 bit OS. Since these platforms are not officially supported by PostgreSQL, can i go ahead and use PostgreSQL on these platform? Regards D T
Re: [GENERAL] Access disk from plpython
On 11/14/2012 08:56 AM, Rhys A.D. Stewart wrote: No it doesn't, I was hoping to create the file. Some testing here confirms it is saving the file with postgres user permissions. I could get it to save by creating a directory owned by the postgres user in my home directory and saving there. My guess is a directory in your path has permissions that are preventing saving. At a guess, /root/. -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
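A quick way to test that theory, assuming root access and a target path that is only an example:

  mkdir /var/lib/postgresql/exports
  chown postgres:postgres /var/lib/postgresql/exports
  chmod 700 /var/lib/postgresql/exports

Writing to that directory from plpython should then succeed, because every component of the path is reachable and writable by the postgres OS user, unlike /root.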
[GENERAL] word/phrase extraction ranking
Hello, From selected rows in a table, how can one extract and rank words/phrases based on how often they occur? Here's an example: http://developer.yahoo.com/search/content/V1/termExtraction.html

INPUT:

  CREATE TABLE phrases (id BIGSERIAL, phrase VARCHAR);
  INSERT INTO phrases (phrase) VALUES ('Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration.');
  INSERT INTO phrases (phrase) VALUES ('Andrea Bolgi was an italian sculptor');

OUTPUT:

 phrase           | weight
------------------+--------
 italian sculptor |      5
 virgin mary      |      2
 painters         |      1
 renaissance      |      1
 inspiration      |      1
 Andrea Bolgi     |      1

Some notes:
* phrases could contain "stop words", e.g. "easy to answer"
* ideally, English language variations and synonyms would be automatically grouped.

I understand one might use PostgreSQL's full text search support, and maybe pg_trgm, but how exactly? Thanks
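Not an answer to the multi-word phrase part, but for single words the built-in ts_stat() already produces a frequency table, which may be a useful starting point (the 'english' configuration is just one choice):

  SELECT word, ndoc, nentry
  FROM ts_stat($$ SELECT to_tsvector('english', phrase) FROM phrases $$)
  ORDER BY nentry DESC, ndoc DESC;

This counts how many rows (ndoc) and how many total occurrences (nentry) each stemmed lexeme has; grouping synonyms or extracting phrases like 'italian sculptor' would need a synonym/thesaurus dictionary or application-side processing on top.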
Re: [GENERAL] Using Postgresql 9.2 on windows 7 and windows vista
On 14/11/2012 17:19, D T wrote: Hi, I am going to use PostgreSQL 9.2 with my application which runs on Windows 7/WIndows Visa 64 bit OS. Since these platforms are not officially supported by PostgreSQL, can i go ahead and use PostgreSQL on these platform? Are they not? I didn't know that. Ignorance evidently is bliss - I have 8.4, 9.1 (64-bit) and 9.2 (64-bit) happily running on my Windows 7 laptop. Ray. -- Raymond O'Donnell :: Galway :: Ireland r...@iol.ie -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Using Postgresql 9.2 on windows 7 and windows vista
On 11/14/2012 11:13 AM, Raymond O'Donnell wrote: On 14/11/2012 17:19, D T wrote: Hi, I am going to use PostgreSQL 9.2 with my application which runs on Windows 7/WIndows Visa 64 bit OS. Since these platforms are not officially supported by PostgreSQL, can i go ahead and use PostgreSQL on these platform? Are they not? I didn't know that. Ignorance evidently is bliss - I have 8.4, 9.1 (64-bit) and 9.2 (64-bit) happily running on my Windows 7 laptop. Well, to be pedantic, I don't think that PostgreSQL is officially supported at all regardless of platform. The open-source community provides the source code and wonderful on-line support from users and developers but pre-compiled binaries are typically contributed by a variety of commercial and non-commercial packagers (BSD ports, EnterpriseDB, Martin Pitt for the Debian/Ubuntu packages, etc.). However the about page does say that PostgreSQL runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How do query optimizers affect window functions
On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu t...@cs.ucsd.edu wrote: Hi, Postgresql, I want to understand how the query optimizers affect the output of the window functions. Use EXPLAIN. One is an index scan, one is a bitmap scan. They return rows in a different order. .. I don't understand why the results are different. Intuitively, the queries show return the same results no matter what plan the optimizer choose. My intuition is that the query should refuse to run at all, because the results are order dependent and you haven't specified an ordering. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
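To see the two plans side by side, prepending EXPLAIN to the regression query (with the comparison operator restored) before and after changing cpu_tuple_cost shows which scan the planner picks:

  EXPLAIN
  SELECT sum(unique1) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING),
         unique1, four
  FROM tenk1
  WHERE unique1 < 10;

Under the default settings this tends to come out as an index or bitmap scan; with cpu_tuple_cost = 50 the planner's choice changes, and with it the physical row order that the unordered window frame sums over.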
[GENERAL] Failed Login Attempts parameter
Hi, I've been searching the web and reviewing documentation, but I cannot find any reference to whether or not a parameter, for example, failed_login_attempts, exists in PostgreSQL that determines the number of attempts a user can make before being locked. In addition, if such a parameter or similar setup exists, is there also some database object that provides a current count of the failed login attempts? Thanks, Frank Frank Cavaliero Database Administrator IBM Infosphere Guardium IBM Software Group, Information Management 978-899-3635 - Direct For Technical Services Support Please Call 978-899-9195.
[GENERAL] High SYS CPU - need advise
Hello everyone, I'm seeking help in diagnosing / figuring out the issue that we have with our DB server: Under some (relatively non-heavy) load of 300...400 TPS, every 10-30 seconds the server drops into high system CPU usage (90%+ SYSTEM across all CPUs; it's pure SYS CPU, i.e. not I/O wait, not IRQ, not user). Postgresql is taking 10-15% at the same time. Those periods last from a few seconds to minutes, or until Postgresql is restarted. Needless to say the system is barely responsive, with load average hitting over 100. We have mostly select statements (joins across a few tables), using indexes and resulting in a small number of records returned. Should the number of requests per second drop a bit, the server does not fall into those HIGH-SYS-CPU periods. It all seems like postgres runs out of some resources or is fighting for some locks, and that causes the kernel to go into la-la land trying to manage it.

So far we've checked:
- disk and NIC delays / errors / utilization
- WAL files (created rarely)
- tables are vacuumed OK; periods of high SYS are not tied to the vacuum process
- kernel resource utilization (sufficient FS handles, shared MEM/SEM, VM)
- increased log level, but nothing suspicious/different (to me) is reported there during periods of high sys-cpu
- ran pgbench (could not reproduce the issue, even though it was producing over 40,000 TPS for a prolonged period of time)

Basically, our symptoms are exactly as was reported here over a year ago (though for postgres 8.3; we run 9.1): http://archives.postgresql.org/pgsql-general/2011-10/msg00998.php I will be grateful for any ideas helping to resolve or diagnose this problem.

Environment background: Postgresql 9.1.6. Postgres usually has 400-500 connected clients, most of them idle. Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk. Linux 3.5.5 (Fedora 17 x64) on 32Gb RAM / 8 cores.

Default configuration changed:
max_connections = 1200
shared_buffers = 3200MB
temp_buffers = 18MB
max_prepared_transactions = 500
work_mem = 16MB
maintenance_work_mem = 64MB
max_files_per_process = 3000
wal_level = hot_standby
fsync = off
checkpoint_segments = 64
checkpoint_timeout = 15min
effective_cache_size = 8GB
default_statistics_target = 500

-- Vlad
Re: [GENERAL] Failed Login Attempts parameter
Frank Cavaliero fcava...@us.ibm.com writes: I've been searching the web and reviewing documentation, but I cannot find any reference to whether or not a parameter, for example, failed_login_attempts, exists in PostgreSQL that determines the number of attempts a user can make before being locked. There is not in the core database code. But if you use PAM authentication, there are plenty of pluggable modules that do that and ninety-nine other specialized things you might wish to have in the authentication code path. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
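A sketch of what the PAM route can look like: point pg_hba.conf at a PAM service and let a module such as pam_tally2 (or pam_faillock on newer systems) do the counting and locking. The service name and address range below are illustrative, and the matching /etc/pam.d/ configuration still has to be created on the host:

  # pg_hba.conf
  host    all    all    0.0.0.0/0    pam    pamservice=postgresql

The failed-attempt counters then live on the operating-system side (e.g. pam_tally2 --user ...), not in a PostgreSQL catalog.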
Re: [GENERAL] High SYS CPU - need advise
On 11/14/12 1:13 PM, Vlad wrote: Postgresql 9.1.6. Postgres usually has 400-500 connected clients, most of them are idle. Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk. That's a really high client connection count for an 8-core system. I'd consider implementing a connection pool (like pgbouncer), and rewriting your client applications to connect, do a transaction, disconnect, so the actual number of postgres connections is much lower, say in the 16-48 range. -- john r pierce   N 37, W 122   santa cruz ca   mid-left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
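A minimal pgbouncer sketch of that setup (database name, addresses and pool sizes are illustrative only):

  [databases]
  mydb = host=127.0.0.1 port=5432 dbname=mydb

  [pgbouncer]
  listen_addr = 127.0.0.1
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction
  default_pool_size = 32
  max_client_conn = 1200

The application keeps its 400-500 connections, but they terminate at the pooler; PostgreSQL itself only sees a few dozen backends.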
Re: [GENERAL] High SYS CPU - need advise
John, thanks for your feedback. While implementing connection pooling would make resources utilization more efficient, I don't think it's the root of my problem. Most of the connected clients are at IDLE. When I do select * from pg_stat_activity where current_query not like '%IDLE%'; I only see several active queries at any given time. -- Vlad On Wed, Nov 14, 2012 at 3:23 PM, John R Pierce pie...@hogranch.com wrote: On 11/14/12 1:13 PM, Vlad wrote: Postgresql 9.1.6. Postgres usually has 400-500 connected clients, most of them are idle. Database is over 1000 tables (across 5 namespaces), taking ~150Gb on disk. thats a really high client connection count for a 8 core system. I'd consider implementing a connection pool (like pgbouncer), and rewriting your client applications to connect, do a transaction, disconnect, so the actual number of postgres connections is much lower, say in the 16-48 range.
Re: [GENERAL] Using window functions to get the unpaginated count for paginated queries
Thanks for the reply everyone. In my case, it looks like there is no real drawback then, since what used to happen is: SELECT a,b,c FROM table WHERE clauses OFFSET x LIMIT y; followed by: SELECT COUNT(*) FROM ( SELECT a,b,c FROM table WHERE clauses ); (notice the lack of OFFSET and LIMIT) and both of them were replaced with: SELECT COUNT(table.id) OVER () AS total_entries_count, a,b,c FROM table WHERE clauses OFFSET x LIMIT y; On Wed, Nov 14, 2012 at 5:11 AM, Igor Romanchenko igor.a.romanche...@gmail.com wrote: 1) Are there any adverse effects that the above window function can have? It can cause severe performance degradation, as mentioned before. 2) Are there any cases where the count would return incorrectly? It could return incorrect result if there are some rows with table.id = NULL . count(table_field) returns the number of rows, where table_field is not NULL. 3) In general, is this an appropriate use-case for using window functions? It does the job = it is an appropriate use-case for using window functions. If this query causes performance degradation and you do not need the exact count of rows, it is better to use something from http://wiki.postgresql.org/wiki/Count_estimate or google for fast postgresql count.
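For the cases where an estimate is good enough, the simplest of the techniques on that wiki page is to read the planner's row estimate straight from the catalog (accurate only as of the last ANALYZE, and only for an unfiltered table; the table name is a placeholder):

  SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
  WHERE relname = 'some_table';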
[GENERAL] FATAL: index contains unexpected zero page at block
Hi list, After a planned reboot of the server, the PostgreSQL 8.4 database server went down. When it starts, it writes to the log:

Nov 14 18:24:01 uno postgres84[24207]: [1-1] user=,db= LOG: could not bind IPv6 socket: Cannot assign requested address
Nov 14 18:24:01 uno postgres84[24207]: [1-2] user=,db= HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
Nov 14 18:24:01 uno postgres84[24208]: [2-1] user=,db= LOG: database system was interrupted while in recovery at 2012-11-14 18:18:42 MSK
Nov 14 18:24:01 uno postgres84[24208]: [2-2] user=,db= HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
Nov 14 18:24:01 uno postgres84[24208]: [3-1] user=,db= LOG: database system was not properly shut down; automatic recovery in progress
Nov 14 18:24:01 uno postgres84[24208]: [4-1] user=,db= LOG: redo starts at 237B/78806EC8
Nov 14 18:24:04 uno postgres84[24208]: [5-1] user=,db= LOG: unexpected pageaddr 237B/50A1E000 in log file 9083, segment 144, offset 10608640
Nov 14 18:24:04 uno postgres84[24208]: [6-1] user=,db= LOG: redo done at 237B/90A1DF98
Nov 14 18:24:04 uno postgres84[24208]: [7-1] user=,db= LOG: last completed transaction was at log time 2012-11-10 10:26:28.484922+04
Nov 14 18:24:04 uno postgres84[24208]: [8-1] user=,db= FATAL: index 316879235 contains unexpected zero page at block 264
Nov 14 18:24:04 uno postgres84[24208]: [8-2] user=,db= HINT: Please REINDEX it.
Nov 14 18:24:04 uno postgres84[24207]: [2-1] user=,db= LOG: startup process (PID 24208) exited with exit code 1
Nov 14 18:24:04 uno postgres84[24207]: [3-1] user=,db= LOG: aborting startup due to startup process failure

When I try to run it in single-user mode:

  postgres --single -D /opt/postgresql/data -P postgres
  user=,db= FATAL: index 316879235 contains unexpected zero page at block 264
  user=,db= HINT: Please REINDEX it.

Please tell me what I can do to recover? -- Cheers, Dmitriy
Re: [GENERAL] High SYS CPU - need advise
On 11/14/12 1:34 PM, Vlad wrote: thanks for your feedback. While implementing connection pooling would make resources utilization more efficient, I don't think it's the root of my problem. Most of the connected clients are at IDLE. When I do select * from pg_stat_activity where current_query not like '%IDLE%'; I only see several active queries at any given time. what about during these spikes? -- john r pierceN 37, W 122 santa cruz ca mid-left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How do query optimizers affect window functions
Thanks a lot, Jeff! On Wed, Nov 14, 2012 at 11:59 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu t...@cs.ucsd.edu wrote: Hi, Postgresql, I want to understand how the query optimizers affect the output of the window functions. Use EXPLAIN. One is an index scan, one is a bitmap scan. They return rows in a different order. .. I don't understand why the results are different. Intuitively, the queries show return the same results no matter what plan the optimizer choose. My intuition is that the query should refuse to run at all, because the results are order dependent and you haven't specified an ordering. What do you mean by refused to run? You mean we have to specify the order when using the window functions? Could you explain more? Thanks! Cheers, Jeff -- Tianyin XU, http://cseweb.ucsd.edu/~tixu/
Re: [GENERAL] FATAL: index contains unexpected zero page at block
user=,db= FATAL: index 316879235 contains unexpected zero page at block 264 user=,db= HINT: Please REINDEX it. Please tell me what I can do to recover? Did you try rebuilding the index? Reindexing or re-creating a new index should resolve this. Regards, VBN
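If the server can be brought up far enough to run queries, something along these lines would locate and rebuild the index; the catalog lookups are a sketch, since the number in the message may be either the index's OID or its relfilenode:

  SELECT relname FROM pg_class WHERE oid = 316879235;
  SELECT relname FROM pg_class WHERE relfilenode = 316879235;
  REINDEX INDEX whatever_name_the_lookup_returned;

In this particular case, though, the error is raised during WAL replay at startup, so the server never gets that far; see the follow-up below.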
Re: [GENERAL] How do query optimizers affect window functions
On Wed, Nov 14, 2012 at 2:16 PM, Tianyin Xu t...@cs.ucsd.edu wrote: Thanks a lot, Jeff! On Wed, Nov 14, 2012 at 11:59 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu t...@cs.ucsd.edu wrote: Hi, Postgresql, I want to understand how the query optimizers affect the output of the window functions. Use EXPLAIN. One is an index scan, one is a bitmap scan. They return rows in a different order. .. I don't understand why the results are different. Intuitively, the queries show return the same results no matter what plan the optimizer choose. My intuition is that the query should refuse to run at all, because the results are order dependent and you haven't specified an ordering. What do you mean by refused to run? I mean that it could throw an error. Kind of like the way this currently throws an error: select b, sum(b) from foo; ERROR: column foo.b must appear in the GROUP BY clause or be used in an aggregate function. To be clear, I am not saying that it does do this (clearly it does not), just that my intuition is that it should do this. You mean we have to specify the order when using the window functions? Could you explain more? Not all uses of window functions have results that depend on the order. If you only use partition by, there would be no reason to force an ordering, for example. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] FATAL: index contains unexpected zero page at block
Dmitriy Tyugaev dtyug...@gmail.com writes: Nov 14 18:24:04 uno postgres84[24208]: [6-1] user=,db= LOG: redo done at 237B/90A1DF98 Nov 14 18:24:04 uno postgres84[24208]: [7-1] user=,db= LOG: last completed transaction was at log time 2012-11-10 10:26:28.484922+04 Nov 14 18:24:04 uno postgres84[24208]: [8-1] user=,db= FATAL: index 316879235 contains unexpected zero page at block 264 Nov 14 18:24:04 uno postgres84[24208]: [8-2] user=,db= HINT: Please REINDEX it. Hm. Apparently it's hitting the zero page while trying to clean up an incomplete index page split after reaching the end of WAL. This is not good --- it means your filesystem failed to retain data that it claimed had been written to disk safely. You should look into fsync-related system settings after you get out of the immediate problem. As far as getting out of the immediate problem is concerned, I think you have little option except to use pg_resetxlog. This will mean the loss of whatever actions it was trying to replay, which may well mean that you end up with data corruption (not just index corruption). I'd suggest a dump and reload after you get the server to start. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How do query optimizers affect window functions
Jeff Janes jeff.ja...@gmail.com writes: On Wed, Nov 14, 2012 at 2:16 PM, Tianyin Xu t...@cs.ucsd.edu wrote: What do you mean by refused to run? I mean that it could throw an error. Kind of like the way this currently throws an error: select b, sum(b) from foo; ERROR: column foo.b must appear in the GROUP BY clause or be used in an aggregate function. To be clear, I am not saying that it does do this (clearly it does not), just that my intuition is that it should do this. The SQL standard says that underspecified window ordering gives you implementation-dependent results, but not an error. (Their use of implementation-dependent basically means unspecified.) I think this is a fairly reasonable definition, since in many practical cases it would be hard for the parser to tell whether the window ordering was nailed down sufficiently to give a unique result, anyway. (Even if we required you to give an ORDER BY for each column, there are examples such as zero/minus-zero in float8 where that doesn't produce a unique ordering. And such a requirement would just be a pain in the rear a lot of the time.) It's also consistent with what you get if, for example, you use LIMIT without an ORDER BY or with an ORDER BY that doesn't constrain the results to a unique row ordering. In practice it's on the user to be sure he's nailed down the row ordering sufficiently to get the results he wants in these cases. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SSDs - SandForce or not?
On 15/11/12 01:42, Shaun Thomas wrote: On 11/14/2012 01:11 AM, Toby Corkindale wrote: I'm wondering which type of SSDs would be better for use with PostgreSQL. Hi Shaun, thanks for your info. I should probably have made it clear that I was curious to know how the compression stuff affected the situation, aside from the other variables. I'm aware of the other issues you've mentioned, but I'm sure it's helpful for other people reading this list to see them. You make a good point about the TOAST tables, I hadn't thought of that. (My data is mostly numeric here though) thanks, Toby 1. While the controller may or may not have an impact, the presence of an on-board super-capacitor will have more. SSDs should be considered malignant devices that will go out of their way to destroy your data, unless they have one of these. 2. Workload on a compressible system like PG is generally dependent on your data sets. If you have lots of TOAST data, which is already compressed, you get no benefit. If your use case doesn't show a lot of random writes, optimizing for them is of questionable value. 3. SSDs also exist as effectively raw NVRAM, in the form of PCIe cards. These cards come in several varieties, and these days, can be mounted in external PCIe chassis in hot-swap bays much like more conventional drive enclosures. Some of these use a kernel-level driver over a proprietary controller, using neither Sandforce or anything else. They are also close to an order of magnitude faster than an SSD because they discard the SATA/SCSI bus entirely. 4. SSDs do have limited write cycles, and whether it's write leveling or drive compression to reduce writes on the actual NVRAM chips, if you honestly have a high write load, you're better off with whatever card reports the highest longevity of the relatively scarce write cycles per cell. 5. You're more likely to get performance improvements pursuing SLC (single layer chips) versus cheaper MLC (multi-layer) for writing, because the controller doesn't have to mask writes to the proper layer. Basically, there's way more involved here than Sandforce vs. Others. Or even Compressible vs. Not. SSDs are still a pretty Wild West kind of thing, and you've got a lot more variables to consider than with standard spindles. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Access disk from plpython
On 11/15/2012 01:08 AM, Adrian Klaver wrote: On 11/14/2012 09:03 AM, Jeff Janes wrote: On Wed, Nov 14, 2012 at 8:44 AM, Rhys A.D. Stewart rhys.stew...@gmail.com wrote: Greetings all, having a permission issue with writing a file using plpython to a local folder, changed permissions to everyone read and write and even changed the owner to postgres. but no joy, any suggestions? plpython is a trusted langauge. That means pgsql trusts it not to allow you to break out of the sandbox and interact directly with the file system or the database server internals. If you need to do those things, you can use plpythonu. My understanding is that of Postgres 7.4 plpython does not exist. In other words only the untrusted version is available. ... and that's because Python's design - in particular, the introspection features - means the the restricted mode wasn't particularly restricted. See http://docs.python.org/2/library/restricted.html , http://wiki.python.org/moin/SandboxedPython . -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [GENERAL] Using Postgresql 9.2 on windows 7 and windows vista
On 11/15/2012 01:19 AM, D T wrote: Hi, I am going to use PostgreSQL 9.2 with my application which runs on Windows 7/WIndows Visa 64 bit OS. Since these platforms are not officially supported by PostgreSQL, can i go ahead and use PostgreSQL on these platform? Not officially supported according to what document? Links? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] File system level copy
This is PITR, right? I don't want to use this way because I'm not allowed to change the configuration parameter of database server. I just want to use some whole DB copy to restore db3 in another machine. And I don't want to use pg_dump because I think db3 is so large that pg_dump will probably have bad performance. -Original Message- From: Albe Laurenz [mailto:laurenz.a...@wien.gv.at] Sent: Wednesday, November 14, 2012 6:49 PM To: Wang, Hao; pgsql-general@postgresql.org Subject: RE: [GENERAL] File system level copy Hao Wang wrote: I installed PostgresSQL-8.3 on my linux machine. The cluster directory is /usr/local/data and I created three databases named db1, db2, and db3. db1 is in the default tablespace 'pg_default'. db2 is in '/home/tablespace/space1/' and db3 is in '/home/tablespace/space2/'. I want to copy the cluster directory and the db3 tablespace folder('/home/tablespace/space2/') without stopping the database server. Then I want to use the cluster directory and db3's tablespace in another linux machine to recover 'db3' database. Does this way work? If not, why? First, you need a correct backup for recovery. Before copying, run pg_start_backup, and pg_stop_backup afterwards. Then you need to have recovery.conf and WAL archives (or be lucky and all WALs are still in pg_xlog). WAL contains changes to all databases in the cluster, so you cannot recover only one database, you'll have to recover them all. Read http://www.postgresql.org/docs/current/static/continuous-archiving.html for background and details. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general