Re: [GENERAL] yet another image: db or filesystem ? question
a) If the images are in the filesystem (and also under the web root), no problem: just use them directly.

b) If the images are in the database: set up URL rewriting in your webserver so that an HTTP request for "images/filename.jpg" becomes "serve_image?fname=filename.jpg", with serve_image being a PHP, JSP, whatever script. This way, when the performance starts to suck too much, you can switch to serving images off the filesystem very easily: just remove the URL rewriting. Please use your primary key (an integer) as the filename; don't let the users name files on your filesystem!!

If you are trying to control user access rights to files, it is much faster to use an authenticator plugin, or lighttpd's mod_secdownload. In both cases the web application is only invoked to decide whether the user can access the image or not; it does not actually handle the (potentially large) file. The webserver does that, and webservers are optimized for this purpose. If it's for an intranet where you don't expect lots of traffic, though, PHP echoing a bytea it got from Postgres works well...

> Do I need to create a temporary file first in order to use the <img> tag? Or is there some other HTML way of doing this?

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
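The rewrite trick described above might look like this in Apache mod_rewrite syntax (a sketch; the script name serve_image.php is hypothetical, and the rule itself is what you delete once the images move to the filesystem):

```apache
# While images live in the database, route image URLs to a script.
# Once they move to the filesystem, delete this rule and let Apache
# serve images/ directly -- the URLs never change.
RewriteEngine On
RewriteRule ^images/(.+)\.jpg$ /serve_image.php?fname=$1.jpg [L,QSA]
```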
Re: [GENERAL] optimizing postgres
> The parsing has turned out to be pretty intense. It takes about 10-20 minutes for any project. When we are parsing data, it really slows down the site's response. I tested serving static webpages from Apache, endless loops in PHP, but the choke point seems to be doing any other query on Postgres when constructing a PHP page during parsing.

Do you do lots of INSERTs without explicitly using transactions? You also need to run EXPLAIN ANALYZE on your most frequent queries. It is very possible the slowdown is just from a forgotten index.

> As an example, the original designer specified separate tables for each project. Since they were all the same structure, I suggested combining them into a single table with a project_id column, but he said it would take too long to query. I was suspicious, but I went with his design anyway.

From the small size of the dataset I don't see a justification for this...

> It turned out he was right for our current setup. When I needed to empty the project table to re-parse data, doing a cascading delete could take up to 10 minutes! I cut re-parsing time in half by just dropping the table and creating a new one. Which was an okay thing to do, since the table only belonged to one project anyway. But I hate to think how long it would have taken to do a delete, cascading to child tables, if all the project data was in a single table.

That's probably because you forgot to create an index on the referencing column. They are not created automatically.

> Since I'm not an expert in Postgres database design, I'm assuming I've done something sub-optimal. Are there some common techniques for tuning Postgres performance? Do we need beefier hardware? Or is it a problem with how PHP or Apache pools connections to the database?

It depends on a lot of stuff, but the first thing is to run EXPLAIN ANALYZE on your queries and post the results here.
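The cascading-delete slowdown usually comes down to exactly this missing index; a minimal sketch (table and column names hypothetical):

```sql
-- The FK constraint indexes the referenced side (the PK), but NOT the
-- referencing side -- Postgres does not create that index for you.
CREATE TABLE projects (
    project_id SERIAL PRIMARY KEY
);
CREATE TABLE measurements (
    measurement_id SERIAL PRIMARY KEY,
    project_id INTEGER NOT NULL
        REFERENCES projects( project_id ) ON DELETE CASCADE,
    value FLOAT
);
-- Without this index, every DELETE FROM projects seq-scans measurements
-- once per deleted row to find the children to cascade to:
CREATE INDEX measurements_project_id ON measurements( project_id );
```

With the index in place, the cascade becomes an index scan per deleted parent row instead of a sequential scan, which is typically the difference between minutes and milliseconds.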
Re: [GENERAL] Polymorphic delete help needed
Here is how I implemented something very similar (in PHP):

- a "Node" class and several derived classes;
- a "nodes" table which contains the fields for the base class, with node_id as a PK and a field which indicates the class;
- "nodes_***" tables which contain the extra fields for the derived class "***", having node_id as a primary key.

As you see, this is very similar to what you got. All the "nodes_***" tables have:

node_id PRIMARY KEY REFERENCES nodes( id ) ON DELETE CASCADE

> Your method is not exactly what I am doing because I do not have the "nodes" table. I have only the "nodes_***" tables. But, I believe your approach has many advantages.

OK, I see. The advantages of my approach are that:

- all common fields that are shared by all derived classes are stored in the "base" nodes table;
- the PK on this table uniquely identifies any instance, whatever its object class;
- the tables holding the derived classes' extra members (ie columns) merely refer to it via a FK, which is clean.

When you want to get a Node instance by its id, the ORM grabs the row in the main table, which contains the type information, then instantiates the right object class, grabs the extra row in the extra tables (which shares the same id), and gives this data to the instantiated object, which then uses it to populate its member variables. Therefore, there can be no duplicates, and you only need to look into one table to know if an object with a specific id exists or not, or to delete it.
Say Nodes can be, for instance, Photos, or Albums, or Articles; say an Album contains Photos and Articles and you want to display its contents:

- SELECT FROM relations r JOIN nodes n ON n.id = r.child_id WHERE r.parent_id = (id of album) => gets the ids of children, along with extra info stored in the relations table (like sort order, etc)
- scan the result set and group it by object class
- for each object class:
  - grab the rows from the extra tables for the ids of objects of this class, using SELECT ... WHERE id IN (...)
  - instantiate and populate

So if you have an Album with 5 Articles and 10 Photos, you do 3 queries:

- get 15 rows from nodes JOIN relations
- get 5 rows from nodes_articles
- get 10 rows from nodes_photos

It's pretty efficient. I believe you can tweak Rails' ORM into doing that. FYI, it's a PHP ORM that I wrote over the course of various projects. Unfortunately PHP is very bad at generic/meta programming, so the implementation is quite convoluted, and I did not opensource it because the documentation would be quite an effort to write ;)

> Rails has an inheritance ability, but they do it differently. They simply have a "nodes" table with all the "nodes_***" tables smashed into it. I did not like that approach at all.

Hm, another reason why I don't really like the Rails approach... do they at least use unique IDs? Please say yes ;)

> But doing what you are doing, I believe I can very nicely fit into Rails and (obviously) PostgreSQL. Plus, your suggestion of moving entries to a "trash" bin seems very wise.

>> The first problem that creates is it makes it hard to do a constraint on the name/id pair. I thought about writing a function that would

>> Is this ClassName / id only found in the "main" table, or is every FK implemented as a ClassName / id pair?

> The ClassName / id pair is found only in the relationships table. There are two instances of it, however: parent and child.

I believe duplicating the ClassName and storing it everywhere to be bad.
With my approach you only need the id in references, since two instances of the same base class cannot have the same id regardless of their class, and you only need the id to instantiate a row, whatever its class. If you want to specify that a class instance can only have children of specific classes (or their subclasses), for example that a FriendList can only contain instances of People, or whatever, you can use a constraint trigger which will check the class of the referenced row against a table describing the allowed parent/child combinations.
Re: [GENERAL] Polymorphic delete help needed
> I am doing a project using Ruby On Rails with PostgreSQL as the database. I have not seen the term polymorphic used with databases except with Rails, so I will quickly describe it. Instead of holding just an id as a foreign key, the record holds a "type" field, which is a string, and an id. The string is the name of the table to which the id applies. (That is slightly simplified.)

Here is how I implemented something very similar (in PHP):

- a "Node" class and several derived classes;
- a "nodes" table which contains the fields for the base class, with node_id as a PK and a field which indicates the class;
- "nodes_***" tables which contain the extra fields for the derived class "***", having node_id as a primary key.

As you see, this is very similar to what you got. All the "nodes_***" tables have:

node_id PRIMARY KEY REFERENCES nodes( id ) ON DELETE CASCADE

So when I delete a Node, the derived class records are automatically deleted in the auxiliary tables. Since there can be only one distinct node_id per node, you can put ON DELETE CASCADE safely.

Now, for your tree-thing, the fact that references are polymorphic isn't important, since they all refer to the same main "nodes" table. However, as soon as you say "when a node no longer has any relations pointing to it", you get to choose between various flavors of garbage collection and reference counting... Personally, I like to move the deleted or orphaned rows to a "trash" folder so they can be recovered if the user deleted the wrong node, for instance. Once in a while I "empty trash".

> The first problem that creates is it makes it hard to do a constraint on the name/id pair. I thought about writing a function that would

Is this ClassName / id pair only found in the "main" table, or is every FK implemented as a ClassName / id pair?
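The schema described above can be sketched in DDL like this (table and column names are illustrative; only one derived-class table is shown):

```sql
-- Base table: one row per object, whatever its class.
CREATE TABLE nodes (
    node_id    SERIAL PRIMARY KEY,
    node_class TEXT NOT NULL,    -- which derived class this row belongs to
    title      TEXT              -- ...plus other fields common to all classes
);

-- One extra table per derived class, sharing the same id.
CREATE TABLE nodes_photo (
    node_id INTEGER PRIMARY KEY
        REFERENCES nodes( node_id ) ON DELETE CASCADE,
    width   INTEGER,             -- extra fields specific to the Photo class
    height  INTEGER
);

-- Deleting the base row cascades into whichever nodes_*** table
-- holds the extra columns:
-- DELETE FROM nodes WHERE node_id = 42;
```

Because every derived-class table hangs off the single nodes PK, a plain integer FK anywhere else in the schema is enough to reference any object, regardless of its class.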
Re: [GENERAL] [pgsql-general] In memory tables/databases
> Why not have a table type that writes no WAL and is truncated whenever postgres starts? Such a table could then be put in a ramdisk tablespace and there would be no transaction atomicity repercussions. Is there something I'm missing?

Isn't this in the TODO (if not already scheduled for the next version)? Check "ALTER TABLE ... SET PERSISTENCE ..."
Re: [GENERAL] a JOIN on same table, but 'slided over'
OK, check...

test=> CREATE TABLE test (id INTEGER PRIMARY KEY);
test=> INSERT INTO test SELECT random()*5 + n*10 FROM generate_series( 1,10 ) AS n;
test=> SELECT * FROM test LIMIT 10;
 id
-----
  11
  23
  31
  41
  52
  63
  70
  85
  94
 103

test=> ANALYZE test;
ANALYZE

-- Self Join 1
test=> EXPLAIN ANALYZE SELECT t1.id AS current_id, t2.id AS next_id
       FROM test t1, test t2
       WHERE t2.id = ( SELECT min(id) FROM test AS t3 WHERE t3.id > t1.id )
       ORDER BY t1.id;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort (cost=26703.19..26953.19 rows=10 width=8) (actual time=5240.392..5271.529 rows=9 loops=1)
   Sort Key: t1.id
   ->  Hash Join (cost=2691.00..18398.37 rows=10 width=8) (actual time=106.588..5179.737 rows=9 loops=1)
         Hash Cond: ((subplan) = t2.id)
         ->  Seq Scan on test t1 (cost=0.00..1441.00 rows=10 width=4) (actual time=0.013..34.782 rows=10 loops=1)
         ->  Hash (cost=1441.00..1441.00 rows=10 width=4) (actual time=106.420..106.420 rows=10 loops=1)
               ->  Seq Scan on test t2 (cost=0.00..1441.00 rows=10 width=4) (actual time=0.007..43.077 rows=10 loops=1)
         SubPlan
           ->  Result (cost=0.03..0.04 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=19)
                 InitPlan
                   ->  Limit (cost=0.00..0.03 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=19)
                         ->  Index Scan using test_pkey on test t3 (cost=0.00..1029.59 rows=3 width=4) (actual time=0.020..0.020 rows=1 loops=19)
                               Index Cond: (id > $0)
                               Filter: (id IS NOT NULL)
 Total runtime: 5295.677 ms

-- Self Join 2
test=> SET enable_hashjoin TO 0;
test=> EXPLAIN ANALYZE SELECT t1.id AS current_id, t2.id AS next_id
       FROM test t1, test t2
       WHERE t2.id = ( SELECT min(id) FROM test AS t3 WHERE t3.id > t1.id )
       ORDER BY t1.id;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort (cost=30806.48..31056.48 rows=10 width=8) (actual time=2876.249..2903.011 rows=9 loops=1)
   Sort Key: t1.id
   ->  Merge Join (cost=9745.82..22501.66 rows=10 width=8) (actual time=2547.830..2820.347 rows=9 loops=1)
         Merge Cond: (t2.id = "inner"."?column2?")
         ->  Index Scan using test_pkey on test t2 (cost=0.00..2828.26 rows=10 width=4) (actual time=0.035..67.747 rows=10 loops=1)
         ->  Sort (cost=9745.82..9995.82 rows=10 width=4) (actual time=2547.779..2582.889 rows=10 loops=1)
               Sort Key: (subplan)
               ->  Seq Scan on test t1 (cost=0.00..1441.00 rows=10 width=4) (actual time=0.060..2487.728 rows=10 loops=1)
               SubPlan
                 ->  Result (cost=0.03..0.04 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=10)
                       InitPlan
                         ->  Limit (cost=0.00..0.03 rows=1 width=4) (actual time=0.021..0.022 rows=1 loops=10)
                               ->  Index Scan using test_pkey on test t3 (cost=0.00..1029.59 rows=3 width=4) (actual time=0.020..0.020 rows=1 loops=10)
                                     Index Cond: (id > $0)
                                     Filter: (id IS NOT NULL)
 Total runtime: 2923.804 ms

-- DISTINCT ON
test=> EXPLAIN SELECT DISTINCT ON (t1.id) t1.id AS current_id, t2.id AS next_id
       FROM test t1 JOIN test t2 ON t2.id > t1.id
       ORDER BY t1.id, t2.id;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Unique (cost=729806679.75..746473346.41 rows=10 width=8)
   ->  Sort (cost=729806679.75..738140013.08 rows=33 width=8)
         Sort Key: t1.id, t2.id
         ->  Nested Loop (cost=0.00..100028973.00 rows=33 width=8)
               ->  Seq Scan on test t1 (cost=0.00..1441.00 rows=10 width=4)
               ->  Index Scan using test_pkey on test t2 (cost=0.00..583.61 rows=3 width=4)
                     Index Cond: (t2.id > t1.id)
(7 rows)

This one takes much longer (I interrupted it).

-- Using a function
CREATE TYPE test_type AS ( current_id INTEGER, next_id INTEGER );

CREATE OR REPLACE FUNCTION testfunc( ) RETURNS SETOF test_type
LANGUAGE plpgsql AS $$
DECLARE
    _row test_type;
BEGIN
    _row.current_id = NULL;
    FOR _row.next_id IN SELECT
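The function approach sketched just above is cut short; a complete version along the same lines might look like the following (a reconstruction, assuming the test table and test_type from the session above; the loop body is my guess at the intended logic):

```sql
-- Walk the ids in order, emitting (previous id, current id) pairs.
CREATE OR REPLACE FUNCTION testfunc( ) RETURNS SETOF test_type
LANGUAGE plpgsql AS $$
DECLARE
    _row test_type;
BEGIN
    _row.current_id := NULL;
    FOR _row.next_id IN SELECT id FROM test ORDER BY id LOOP
        -- skip the very first row, which has no predecessor
        IF _row.current_id IS NOT NULL THEN
            RETURN NEXT _row;
        END IF;
        _row.current_id := _row.next_id;
    END LOOP;
END;
$$;

-- SELECT * FROM testfunc();
```

This does one ordered index scan instead of a subplan per row, which is why the procedural version tends to beat both self-join plans on large tables.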
Re: [GENERAL] Ordering in SELECT statement
On Jun 26, 2007, at 14:41, [EMAIL PROTECTED] wrote:

> and what I need is the following ("old fashioned", that is, the "SPACE" is another character whose ASCII value is before any other LATIN letter's!!)
> AB CD
> AB EF
> ABAB
> ABD E

What you don't want:

peufeu=> SELECT column1 FROM (VALUES ('ABCD'), ('A BCD'), ('abcd'), ('ABcd'), ('AB'), ('AbC d')) AS foo ORDER BY column1;
 column1
---------
 AB
 abcd
 AbC d
 ABcd
 ABCD
 A BCD
(6 rows)

What you want:

peufeu=> SELECT column1 FROM (VALUES ('ABCD'), ('A BCD'), ('abcd'), ('ABcd'), ('AB'), ('AbC d')) AS foo ORDER BY string_to_array( column1, ' ' );
 column1
---------
 A BCD
 AB
 AbC d
 abcd
 ABcd
 ABCD
(6 rows)
Re: [GENERAL] a JOIN on same table, but 'slided over'
> Now, I'd like to make a JOIN-ed query of that table with itself, so that I'd get rows pairwise: every row containing data from *two* rows of the original TEST table, so that those data come from rows with consecutive IDs - not necessarily (depending on the TEST table contents) continuously consecutive. Like:
>
> SELECT * FROM view_of_test;
>  id | id+X | thread | thread+X | info  | info+X
> ----+------+--------+----------+-------+--------
>   2 |    3 |    763 |      764 | A     | B
>   3 |    6 |    764 |        5 | B     | C
>   6 |    8 |      5 |    88946 | C     | Cats
>   8 |    9 |  88946 |    69315 | Cats  | Eifel
>   9 |   10 |  69315 |    96379 | Eifel | G
>
> Is there an SQL construct to get it?

I would use a plpgsql procedure: select all the rows ORDER BY id, keep the current and last row in variables, and that's it.
Re: [GENERAL] minimum function
Check out greatest() and least()... (I think ;)

On Sat, 23 Jun 2007 18:35:36 +0200, Raymond O'Donnell <[EMAIL PROTECTED]> wrote:

> On 23/06/2007 17:17, Gunther Mayer wrote:
>> Any way I can achieve that on one line? I.e. I want it simpler than
>> IF arg1 < arg2 THEN RETURN arg1; ELSE RETURN arg2; END IF;
>
> That looks pretty simple already, but why not enclose it in a pl/pgsql function - something like:
>
> create function minimum(a1 integer, a2 integer) returns integer as $$
> begin
>     if a1 < a2 then
>         return a1;
>     else
>         return a2;
>     end if;
> end;
> $$ language plpgsql;
>
> - and then you can call it in one line:
>
> select minimum(5, 4);
>
> Ray.
>
> Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland [EMAIL PROTECTED]
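For the record, the built-ins do make it a one-liner (available in 8.1 and later, if I recall correctly), with no function to maintain:

```sql
SELECT least( 5, 4 );        -- returns 4
SELECT greatest( 5, 4 );     -- returns 5
-- Both take any number of arguments:
SELECT least( 7, 3, 9, 1 );  -- returns 1
```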
Re: [GENERAL] [PGSQL 8.2.x] INSERT+INSERT
> I think it would be greatly helpful if the insert...returning could be seen as a "select" statement and, thus, be usable in the way I have described. I suspect that the insert...returning is actually implemented as an insert plus a select.

You can create a function which does the INSERT RETURNING and returns the result, and use this function in a subsequent INSERT. You can also write, in plpgsql:

FOR a, b, c IN SELECT ... LOOP
    INSERT INTO table1 (...) VALUES (a, b) RETURNING your_id INTO _var;
    INSERT INTO table2 (...) VALUES (c, _var);
END LOOP;
Re: [GENERAL] Accent insensitive search
> Hi. I have a few databases created with UNICODE encoding, and I would like to be able to search with accent insensitivity. There's something in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do this, but I found nothing in PostgreSQL, just the 'to_ascii' function, which AFAIK doesn't work with UNICODE.

The easiest way is to create an extra column which will hold a copy of your text, with all accents removed. You can also convert it to lowercase and remove apostrophes, punctuation, etc. Said column is kept up to date with a trigger. Python is suitable for this (use unicodedata.normalize). Keeping a copy of the processed data will speed up searches versus WHERE remove_accents( blah ) = 'text', even with a function index. Note that this function could be written in C and use a table of the first 64K unicode symbols for speedup. See the attached file (create_ft_functions.sql).
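The normalization step mentioned above (the one a pl/python trigger would call before filling the shadow column) can be sketched like this; the function name remove_accents is hypothetical:

```python
import unicodedata

def remove_accents(text: str) -> str:
    """Decompose to NFKD, drop the combining accent marks, lowercase.

    NFKD splits 'e' + acute accent into two codepoints; combining()
    is nonzero for the accent mark, so filtering on it strips accents
    while leaving the base letters intact.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()

print(remove_accents("Café Noël"))  # -> cafe noel
```

Store the result in the extra column and search against it; both the stored value and the search term go through the same function, so accents on either side no longer matter.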
Re: [GENERAL] [pgsql-advocacy] [PERFORM] [ADMIN] Postgres VS Oracle
> 2. Oracle, Microsoft, and IBM have a "lot" to fear in the sense of a database like PostgreSQL. We can compete in 90-95% of cases where people would traditionally purchase a proprietary system for many, many thousands (if not hundreds of thousands) of dollars.

Oracle also fears benchmarks made by people who don't know how to tune Oracle properly...
Re: [GENERAL] PHP sucks!! - was: persistent db connections in PHP
> I wouldn't call Python *strongly* typed, but I do know what you mean. I think.

It is strongly typed (string + int = error), just not statically typed (but you see what I mean ;)

>> "PHP: very loosely typed, does whatever it wants"
>
> yeah php got a life of its own! sure be a lazy programmer and blame sql injection etc crap on php or try http://www.xdebug.org/ and others.

No need. I either use pg_query_params(), which automagically handles all quoting, or an ORM which does the same. There is no reason to include strings in SQL statements except laziness. MySQL does not have a mysql_query_params() for PHP, so you have to write one; it's pretty simple. Python's (and Perl's) strength in this respect is that they make it easier to use the safe solution, ie:

query( "sql with ? or $1 or %s", arg, arg, arg )

PEAR::DB is horrendous.
Re: [GENERAL] PHP sucks!! - was: persistent db connections in PHP
> I've seen similar negative comments before on this list about PHP, and I'm curious to know what informs them.

Maybe the fact that, when I coded a database object/form library, it took me LONGER to ensure that empty strings / NULLs / zero-valued floats and integers / etc. were handled correctly than it took me to code the entire thing?

> I use PHP quite a bit, and though I wouldn't claim to be any sort of an expert, I like it a lot: IMHO it's powerful, fast and easy to use. Mind you, I came to PHP from an ASP/VBScript background, so anything would have been an improvement.

LOL. Yeah.

> Having said that, the main gripes I would have with PHP are (i) variables aren't strongly typed, which can bite you unless you're careful, and (ii) you don't have to declare variables before using them, which can also cause trouble - in VBScript you have "option explicit" which forces you to declare your variables; I'd like to see something similar in PHP. Apologies if this is off-topic for this list, but I'm curious as to why others reckon that PHP sucks.

It's a matter of mindset. PHP and Postgres have really opposite mindsets. Python is a lot more similar to Postgres, for instance:

- Postgres, Python: strongly typed; throws an error rather than doing funny stuff with your data; your code does what you think it should do.
- PHP: very loosely typed; does whatever it wants; your code will surprise you...

To stay on topic:

- There is no way to know if your persistent connection is a new connection (in which case you might want to set up and issue a few PREPARE statements) or an already "second-hand" connection.
- There is no way to reconnect a dead persistent connection (if you reboot your database server, you'll have to restart all your webservers/PHP processes manually)
- pg_query() returns everything as strings (python/psycopg2 converts postgres types to python types: DATE becomes datetime.datetime, arrays too, and you can customize it)
- handling of NULL, 0, '', array(), 0.0 is horrendous
- NO DAMN UNICODE SUPPORT. WTF? (python/psycopg does the right thing)
- register_shutdown_function() isn't always called
- the object model is horrible
- no closures or lexical scoping, no first-class classes or functions, strings aren't objects
- built by accretion rather than thoughtful design
- etc, etc

Still, I use it, and it works. It is always poised to bite, though. You have to wear chain mail pants with PHP.
Re: [GENERAL] What O/S or hardware feature would be useful for databases?
> Seems CPU makers currently have more transistors than they know what to do with, so they're adding cores and doing a lot of boring stuff like SSE2, SSE3, SSE4, etc.

SSE(n) isn't useless, since it speeds up stuff like video encoding by, say, a few times. For databases, I'd say scatter/gather IO, especially asynchronous out-of-order reads with a list of blocks to read passed to the OS.
Re: [GENERAL] INSERT ... RETURNING in v8.2
> Holy Crud! you mean to tell me I can replace:
>
> insert into table(string) values ('one'),('two'),('three');
> select idx from table where string in ('one','two','three');

Yes. A smart ORM library should, when you create a new database object from form values, use INSERT RETURNING to grab all the default values (SERIALs, DEFAULTs, trigger-generated stuff, etc). This is much more elegant than digging to find the sequence name in order to currval() it! I think this feature is priceless (but it would be even better if I could do INSERT INTO archive (DELETE FROM active WHERE blah RETURNING *)).
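Spelled out, the one-round-trip replacement looks like this (8.2 and up; table and column names are placeholders):

```sql
CREATE TABLE words (
    idx    SERIAL PRIMARY KEY,
    string TEXT NOT NULL
);

-- One statement: insert the rows AND get back the generated keys,
-- replacing the INSERT-then-SELECT dance quoted above.
INSERT INTO words ( string )
VALUES ('one'), ('two'), ('three')
RETURNING idx, string;
```

Besides saving a round trip, RETURNING is also race-free: a concurrent insert of the same strings can't sneak extra rows into your result the way the follow-up SELECT ... WHERE string IN (...) could.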
Re: [GENERAL] allocate chunk of sequence
> The chunks to be allocated are not the same size, so setting the increment value will not help.

I'm sometimes not that subtle, so I'd just use a BIGINT sequence. Think about the largest chunk you'll ever get (probably less than 2^30 rows, yes?), set this sequence's increment to this very high value (this will still let you have 2^32 chunks at least), and make each chunk id be (big_sequence_value + N).

> I'm not sure how the nextval function handles this internally, if it has to read and update the sequence object. Does it use some kind of lock? Otherwise the problem mentioned here should happen with nextval also.

Yes, it takes a lock, but the lock is held for a very short time (just increment the value and release the lock). In InnoDB, the AUTO_INCREMENT lock is held for the entire duration of the SQL statement, including any triggers that might fire. This means that if you have ON INSERT triggers which take some time to process, INSERT concurrency is horrendous. Not so with Postgres sequences.

> The LOCK command does not work for sequences, so in my example, I have to use a dummy table for LOCK to work. Another thought is to have the nextval function take an extra argument for the increment value (instead of the default increment value).

If you really, really want your ids to be contiguous, you can't use the bigint sequence above. So: if you do not want to spoil Postgres' good concurrency on inserts, you'll want to avoid doing I/O while holding a lock. Burning CPU while holding a lock is OK unless you want to use all your cores, but waiting for IO while locked is BAD. So, here's the deal:

- the hard way is to patch nextval() to take an extra argument
- the easy way is to write a plpgsql function doing this:
  - you have a chunk of N rows to insert
  - get lock
  - X := nextval()
  - call nextval() (N-1) times in a plpgsql FOR loop
  - release lock
  - do your INSERTs, using X ...
  (X+N-1) as ids (generate_series() is your friend)

Thus you only hold the lock while burning CPU calling nextval(). Not that bad. The IO-intensive INSERT isn't under the lock, so it can be concurrent.
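The "easy way" above might be sketched like this (sequence name, lock key, and function name are all hypothetical; pg_advisory_lock needs 8.2 or later, and on older versions you would LOCK a dummy table instead, accepting that the table lock is then held until COMMIT):

```sql
CREATE SEQUENCE chunk_seq;

-- Reserve n consecutive ids and return the first one; the caller then
-- uses first_id .. first_id + n - 1 for its INSERTs.
CREATE OR REPLACE FUNCTION alloc_chunk( n INTEGER ) RETURNS BIGINT
LANGUAGE plpgsql AS $$
DECLARE
    first_id BIGINT;
BEGIN
    PERFORM pg_advisory_lock( 42 );      -- 42: arbitrary app-chosen lock key
    first_id := nextval( 'chunk_seq' );
    -- burn through the rest of the chunk while holding the lock (CPU only, no IO)
    PERFORM nextval( 'chunk_seq' ) FROM generate_series( 1, n - 1 );
    PERFORM pg_advisory_unlock( 42 );
    RETURN first_id;
END;
$$;
```

The advisory lock can be released mid-transaction, which is exactly what you want here: the lock covers only the nextval() loop, and the slow INSERTs that follow run fully concurrently.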
Re: [GENERAL] persistent db connections in PHP
> Hello all! I'm working on a PHP site using Postgres as a back-end. I have an include at the top of each page that runs the pg_connect function. I'm implementing some temporary tables, which I understand are destroyed automatically at the end of the session.

It seems to me that:

- Temp tables are local to a session (ie. a connection).
- Persistent connections in PHP create a pool of connections. When you pg_pconnect() you get a connection from the pool. You have no control over which connection you get, so you don't really know if the temp tables have been created by a previous connection or not.

So the best solution is to create the temp tables inside a transaction, with ON COMMIT DROP, do your work, and let them disappear. This will save you from getting temp tables already filled with stuff from the previous webpage which got this connection. For the same reason, pg_pconnect() should ALWAYS be followed by pg_query( "ROLLBACK" ). If you put the rollback in register_shutdown_function(), be aware that it will NOT be called if your script is nuked for trying to malloc more than the allowed memory limit, or for other brutal reasons... Also note that PHP, being PHP, sucks, and thusly will not reconnect persistent connections when they fail. You have to kick it a bit.
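The ON COMMIT DROP pattern looks like this (table name is a placeholder):

```sql
BEGIN;

-- Exists only inside this transaction; no leftovers for the next
-- request that happens to get this pooled connection.
CREATE TEMPORARY TABLE scratch (
    id  INTEGER,
    val TEXT
) ON COMMIT DROP;

-- ... fill and use scratch here ...

COMMIT;  -- scratch is dropped automatically right here
```

Because the table's lifetime is the transaction rather than the session, it simply cannot leak between pages, no matter which pooled connection each page request lands on.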
Re: [GENERAL] Postmaster processes taking all the CPU
> I promised that I would get back to the group with the reason. Well, of course, it was a query :). I use a filesystem-based search engine (Lucene) that will take any desired entity saved into the database, find the primary keys, and then do a select * from entity where id in (:ids). If I get too many matches (3000-4000)... that will delay my postmaster, and the postmaster associated with the query would take 10-15 minutes to process it. So, now I limit that to 500; anything bigger than that will ask the user to refine the query. However, this whole investigation made me observe some things. On my server box (7.4.7) I have some queries that execute pretty slowly, 1.2-1.5 secs; that's a lot for a query that goes through 5000 records. On my local environment (8.1) the same queries on a similar table size execute much faster, like 200-400 ms. Do you know if this is a known issue, or is my dev box just better than my server box?

PG developers are not Microsoft; hence, every version of PG gets meaner and faster instead of slower and more bloated. The speedup you see might simply be due to developer competence... Anyway, if you use Lucene for full text search (I didn't try it; I tried Xapian, which seems similar): users don't care about searches returning 5000 results; they usually only look at the first pages. It's the sorting that is important, and this should be done by Lucene itself (sorting on phrase relevance, for instance, is a search engine job). So you are right to limit the results to 500; it could even be lower. The role of the search engine is to sort the good results to the top of the list so users are happy. Do you really display 500 results? If you paginate the results, the users will probably never go past page 3. Fetching a lot of results is therefore a waste of database power: just fetch the ids from Lucene, and grab the results from Postgres only for the page you are going to display.
If you use Postgres to do the sorting (example: Lucene matches the records and you want to order them by a field which is not stored in Lucene but in Postgres), obviously this advice does not apply.
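The per-page fetch boils down to something like this (a sketch; the ids and page size are made up, and the row order still has to be restored client-side, since IN does not preserve the order Lucene returned):

```sql
-- Lucene already ranked the hits; hand Postgres only the ids of the
-- 20 hits on the page being displayed, not all 500+ matches.
SELECT e.*
FROM entity e
WHERE e.id IN ( 101, 213, 57, 488 );  -- this page's ids only
```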
Re: [GENERAL] pointer to feature comparisons, please
>>> The DELETE should block, no?
>>
>> Why? Foreign keys put an ON DELETE trigger on the referenced table
>
> Foreign keys that silently, automatically DELETE records? Did I read that correctly?

Isn't that the point of ON DELETE CASCADE?

>> ...besides checking the referencing column on insert/update... If you just implement a constraint, you only get half the functionality.
>
> But when I define a FK *constraint*, that's all I *want*!

When you add a REFERENCES foo( bar ) foreign key and don't specify an ON DELETE clause, I believe you get ON DELETE NO ACTION, which means the referenced row can't be deleted while there are referencing rows... So when you have a FK constraint, it actually constrains two tables, and PG uses share locking to ensure race conditions can't cause an inconsistent database.
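The default NO ACTION behavior is easy to see for yourself (foo/bar are the placeholder names from the discussion above):

```sql
CREATE TABLE foo ( bar INTEGER PRIMARY KEY );
CREATE TABLE baz ( bar INTEGER REFERENCES foo( bar ) );  -- no ON DELETE clause

INSERT INTO foo VALUES ( 1 );
INSERT INTO baz VALUES ( 1 );

DELETE FROM foo WHERE bar = 1;
-- fails with a foreign key violation: baz still references foo(1)
```

So even a "plain" FK constrains the referenced table too: the DELETE is rejected (or, against an uncommitted concurrent insert, made to wait) rather than being allowed to orphan the referencing rows.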
Re: [GENERAL] inner join problem with temporary tables
> This message appears:
> ERROR: relation "t_arti" does not exist
> SQL state: 42P01
> Context: SQL function "test1"
> Why does it not work??? thanks for your help

Because plpgsql functions are compiled on first execution, and all their queries are then prepared. Tables are referenced directly in prepared statements, not by name. Any prepared statement that refers to dropped tables (even dropped temp tables) is thus unfit for consumption. This allows queries in plpgsql functions to be extremely fast, but it isn't smart enough (yet) to recompile functions when a table the function depends on is dropped. Just disconnect and reconnect; all prepared plans will be lost, and it will work. Or issue your queries directly instead of using a function.
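A third option, if the function must stay: build the query at run time with EXECUTE, so no plan is cached against the dropped table (a sketch; the function name and query are illustrative, and EXECUTE ... INTO needs a reasonably recent plpgsql):

```sql
CREATE OR REPLACE FUNCTION test1_dynamic( ) RETURNS BIGINT
LANGUAGE plpgsql AS $$
DECLARE
    result BIGINT;
BEGIN
    -- EXECUTE plans the statement on every call, looking the table up
    -- by name, so recreating the temp table between calls is fine.
    EXECUTE 'SELECT count(*) FROM t_arti' INTO result;
    RETURN result;
END;
$$;
```

The price is replanning on each call, which is usually acceptable for temp-table work.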
Re: [GENERAL] pointer to feature comparisons, please
On Thu, 14 Jun 2007 00:09:20 +0200, Ron Johnson <[EMAIL PROTECTED]> wrote:

> On 06/13/07 16:59, PFC wrote:
>>> Isn't it *supposed* to miss UNcommitted changes from other transactions?
>>
>> Well, if the "uncommitted change" is a DELETE of the row that allowed the constraint check to pass, then when this delete is committed, your data is no longer consistent.
>
> The DELETE should block, no?

Why? Foreign keys put an ON DELETE trigger on the referenced table, besides checking the referencing column on insert/update... If you just implement a constraint, you only get half the functionality.
Re: [GENERAL] pointer to feature comparisons, please
> Isn't it *supposed* to miss UNcommitted changes from other transactions?

Well, if the "uncommitted change" is a DELETE of the row that allowed the constraint check to pass, then when this delete is committed, your data is no longer consistent. Consider this:

CREATE TABLE A (
    attributes INT[],
    CHECK( is_valid_attributes( attributes ) )
);

CREATE TABLE valid_attributes (
    attribute_id INTEGER
);

You want to check that A.attributes is an array of values, the only allowed values being those stored in the valid_attributes table. If you delete a row in valid_attributes, many rows in A can become invalid, unless you use some form of trigger on valid_attributes, which would start to look a lot like a foreign key ON DELETE trigger. If you insert stuff in A while concurrently deleting a row in valid_attributes, you have problems. This is why foreign key checks take share locks on referenced tables...
Re: [GENERAL] Numeric performances
> If you try it with max() you'd likely get less-surprising answers.

So it was in fact the type conversions that got timed. Damn, I got outsmarted XDD Rewind:

CREATE TEMPORARY TABLE test AS
    SELECT a::FLOAT AS f, (a::NUMERIC)*100 AS n, a::INTEGER AS i, a::BIGINT AS b
    FROM generate_series( 1, 100000 ) AS a;

Max and sort will use comparisons:

SELECT max(i) FROM test;    Time: 42.132 ms
SELECT max(b) FROM test;    Time: 59.499 ms
SELECT max(f) FROM test;    Time: 58.808 ms
SELECT max(n) FROM test;    Time: 54.197 ms

SELECT * FROM test ORDER BY i LIMIT 1;    Time: 58.723 ms
SELECT * FROM test ORDER BY b LIMIT 1;    Time: 60.520 ms
SELECT * FROM test ORDER BY f LIMIT 1;    Time: 53.188 ms
SELECT * FROM test ORDER BY n LIMIT 1;    Time: 61.779 ms

Joins:

SELECT count(*) FROM test a JOIN test b ON (a.i=b.i);    Time: 275.411 ms
SELECT count(*) FROM test a JOIN test b ON (a.b=b.b);    Time: 286.132 ms
SELECT count(*) FROM test a JOIN test b ON (a.f=b.f);    Time: 295.956 ms
SELECT count(*) FROM test a JOIN test b ON (a.n=b.n);    Time: 321.292 ms
SELECT count(*) FROM test a JOIN test b ON (a.i=b.b);    Time: 281.162 ms
SELECT count(*) FROM test a JOIN test b ON (a.n=b.i::NUMERIC*100);    Time: 454.706 ms

Now, addition:

SELECT count(i+1) FROM test;    Time: 46.973 ms
SELECT count(b+1) FROM test;    Time: 60.027 ms
SELECT count(f+1) FROM test;    Time: 56.829 ms
SELECT count(n+1) FROM test;    Time: 103.316 ms

Multiplication:

SELECT count(i*1) FROM test;    Time: 46.950 ms
SELECT count(b*1) FROM test;    Time: 58.670 ms
SELECT count(f*1) FROM test;    Time: 57.058 ms
SELECT count(n*1) FROM test;    Time: 108.379 ms

Division (plain count(i) shown as a baseline):

SELECT count(i) FROM test;    Time: 38.351 ms
SELECT count(i/1234) FROM test;    Time: 48.961 ms
SELECT count(b/1234) FROM test;    Time: 62.496 ms
SELECT count(n/1234) FROM test;    Time: 186.674 ms

Conclusion: NUMERIC seems a bit slower (division being slowest, obviously), but nothing that can't be swamped by a few disk seeks...
Re: NULLS and User Input WAS Re: [GENERAL] multimaster
> Aaaargh! No, it doesn't. It means NULL. Nothing else.

Well, x = UNKNOWN doesn't make any sense... the answer is UNKNOWN. x IS UNKNOWN does make sense; the answer is true or false. Replace UNKNOWN with NULL... Actually it means whatever the DBA wants it to mean (which opens the door to many a misguided design...). I hereby light a candle to the pgsql designers who didn't inflict 0000-00-00 00:00:00 upon us besides NULL dates.
Re: [GENERAL] Numeric performances
This is a 32-bit CPU, by the way. Consider this:
- There are 100K rows
- The CPU executes about 3 billion instructions per second if everything is optimal
- "SELECT sum(n) FROM test" takes, say, 60 ms

This gives about 1800 CPU ops per row. A float addition versus an int addition is a drop in the sea. I believe the marked difference between floats/ints (faster) and NUMERIC/BIGINT (slower) comes from being passed by value versus by pointer. A single access which misses the CPU cache and has to fetch data from real RAM costs a lot more cycles than the simple loops of a NUMERIC addition, which will hit L1 cache. Nowadays cache access patterns matter more than how many actual CPU instructions are executed...

forum_bench=> SELECT sum(i) FROM test;    Time: 46.589 ms
forum_bench=> SELECT sum(f) FROM test;    Time: 63.865 ms
forum_bench=> SELECT sum(b) FROM test;    Time: 157.018 ms
forum_bench=> SELECT sum(n) FROM test;    Time: 124.816 ms
Re: [GENERAL] Numeric performances
> It is. But why do you care? You either have the correctness that NUMERIC gives, or you don't.

I suspect it's still useful to know what order of magnitude slower it is. After all, if it is 1000x slower (not saying it is), some people may decide it's not worth it, or roll their own. Any hints/gotchas for/when doing such performance tests?

forum_bench=> CREATE TEMPORARY TABLE test AS
    SELECT a::FLOAT AS f, a::NUMERIC AS n, a::INTEGER AS i, a::BIGINT AS b
    FROM generate_series( 1, 100000 ) AS a;
SELECT
Time: 1169.125 ms

forum_bench=> SELECT sum(i) FROM test;    Time: 46.589 ms
forum_bench=> SELECT sum(b) FROM test;    Time: 157.018 ms
forum_bench=> SELECT sum(f) FROM test;    Time: 63.865 ms
forum_bench=> SELECT sum(n) FROM test;    Time: 124.816 ms

SELECT * FROM test ORDER BY i LIMIT 1;    Time: 68.996 ms
SELECT * FROM test ORDER BY b LIMIT 1;    Time: 68.917 ms
SELECT * FROM test ORDER BY f LIMIT 1;    Time: 62.321 ms
SELECT * FROM test ORDER BY n LIMIT 1;    Time: 71.880 ms

BEGIN;
CREATE INDEX test_i ON test(i);    Time: 102.901 ms
CREATE INDEX test_b ON test(b);    Time: 123.406 ms
CREATE INDEX test_f ON test(f);    Time: 105.255 ms
CREATE INDEX test_n ON test(n);    Time: 134.468 ms
ROLLBACK;
Re: [GENERAL] why postgresql over other RDBMS
I believe you have made quite a good description of what happens. Index-building isn't very CPU-intensive for integers (geometric types and tsearch are another matter, of course), so building all the indexes of a large table in one pass is a possibility that works now, provided you issue all the CREATE INDEX commands in concurrent connections at roughly the same time. I don't think pg_restore does this, though. So it works right now, except it doesn't have (yet) the infrastructure to keep the scans synchronized.

Perhaps you only got one read of the table because the process is essentially self-synchronizing. Whenever one process "gets ahead", it requires a disk read for the next page, which causes it to block for a relatively long time, during which the other two processes either proceed reading rows from cache, or come to the end of the cache and block waiting for the same page to be read from disk. Obviously not a guarantee, as indexing a relatively more expensive type COULD cause one process to get multiple pages behind, and memory usage by other processes COULD cause intervening pages to be flushed from cache. But I have a suspicion that the experiment was not just a happy fluke: there will be a strong tendency for multiple simultaneous index operations to stay sufficiently closely synch'd that the table will only be read from disk once. (Especially when such operations are done while the database is otherwise quiescent, as would be the typical case during a restore.)
Re: NULLS and User Input WAS Re: [GENERAL] multimaster
> Yeah, it is awful ;^) However the existing system is equally awful because there is no way to enter NULL!

Consider this form:

    First name  : Edgar
    Middle name : J.
    Last name   : Hoover

Now, if someone has no middle name, like "John Smith", should we use NULL or "" for the middle name? NULL usually means "unknown" or "not applicable", so I believe we have to use the empty string here. It makes sense to be able to concatenate the three parts of the name without having to put COALESCE() everywhere. Now consider this form:

    City    :
    State   :
    Country :

If the user doesn't live in the US, "State" makes no sense, so it should be NULL, not the empty string. There is no unnamed state. Also, if the user does not enter his city name, this does not mean he lives in a city whose name is "". So NULL should be used, too. It is very context-dependent.
Re: [GENERAL] general features
> 1- max amount of available storage data.

How many hard disks can you buy?

> 2- Clustering support.

Please be more precise.

> 3- price.

Free.

> 4- speed.

Proportional to the expertise of the DBA. Postgres can be extremely fast if used correctly; it can totally suck if the database is badly designed. But this is completely normal. It is as fast as the other major players, and all of them need expertise to work well. If you're a newbie, you'll make newbie errors; fortunately this list has many friendly & knowledgeable people who can help you, and the docs are excellent.

> THIS EMAIL AND ANY ATTACHED FILES ARE CONFIDENTIAL.

Aw. Come on.
Re: [GENERAL] Transactional DDL
On Sun, 03 Jun 2007 01:39:11 +0200, Tom Lane <[EMAIL PROTECTED]> wrote:

"Jaime Casanova" <[EMAIL PROTECTED]> writes:

> Tom's example seems to show that mysql inserts a commit immediately after a DDL, but this one example shows the thing is worse than that.

Actually, I think their behavior is just "DDL issues a COMMIT", so that after that you are out of the transaction and the INSERT commits immediately. Some experimentation shows that mysql doesn't issue a warning for rollback-outside-a-transaction, so the lack of any complaint at the rollback step is just a standard mysql-ism.

Yes, their manual explains this and warns against it. The full list is here: http://www.mysql.org/doc/refman/5.1/en/implicit-commit.html
Re: [GENERAL] Transactional DDL
> This is what happens in every RDBMS. What's so special about postgres then?

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE ble ( id INTEGER ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.09 sec)
mysql> INSERT INTO ble VALUES (1),(2),(3);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM ble;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)

psql=> BEGIN;
BEGIN
psql=> CREATE TABLE ble ( id INTEGER );
CREATE TABLE
psql=> INSERT INTO ble VALUES (1),(2),(3);
INSERT 0 3
psql=> ROLLBACK;
ROLLBACK
psql=> SELECT * FROM ble;
ERREUR: la relation «ble» n'existe pas   (ERROR: relation «ble» does not exist)
Re: [GENERAL] Stored Procedures and Functions
MySQL supports procedures and functions. Functions can return results but cannot update the database. Procedures can update the database but cannot return results. However:
- a function can call a procedure that updates the database!
- a procedure can return results through OUT parameters!

It's a friggin mess. In pgsql, if you want, a STABLE or IMMUTABLE function is a true function, since it is repeatable: it will always return the same results for the same parameters, and has no side-effects. This is the definition of a function. It is better not to draw useless lines in the ground with huge "don't walk over this line" stickers. People will always find a way around. Better to offer the features that users need.

> Is it true that postgres doesn't have a notion of stored procedures, and functions are what it has instead? RDBMSs like SQL Server support both stored procedures and functions. So I was wondering what the difference is between a stored procedure and a function.

Pascal has functions and procedures; C has only functions. That doesn't say anything about the relative usability of each language. Those are just names.
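To illustrate how artificial the line is, here is a sketch (table and function names invented) of a single postgres function that both updates the database and returns a result, something MySQL's function/procedure split forbids:

```sql
-- Records a page hit AND returns the running total in one call.
-- SQL-language functions return the result of their last statement.
CREATE FUNCTION log_hit(text) RETURNS bigint AS $$
    INSERT INTO hits( page ) VALUES ( $1 );
    SELECT count(*) FROM hits;
$$ LANGUAGE sql;

SELECT log_hit('/index.html');
```

Since it has a side-effect, the function is VOLATILE (the default); marking it STABLE or IMMUTABLE would be wrong, and that marking is exactly where pgsql draws the useful line.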
Re: [GENERAL] multimaster
> Have you looked at raidb? http://c-jdbc.objectweb.org. Narrow niche, but if it happens to be the one you are in, then it's an option.

I took a quick look at the users page, and both of them were using PostgreSQL. I just love those Java guys: the world starts and ends with Java. How do you execute a psql script with that thing? How do you actually use any non-Java stuff with it? The same features could be implemented in a connection pool like pgpool2...
Re: [GENERAL] why postgresql over other RDBMS
On Sat, 02 Jun 2007 00:14:28 +0200, Ron Johnson <[EMAIL PROTECTED]> wrote:

On 06/01/07 16:38, PFC wrote:

> Will the synchronized seq scan patch be able to do this by issuing all the CREATE INDEX commands at the same time from several different database connections ?

> No, but it could someday.

Actually I tested: it does it right now, albeit unconsciously (pg doesn't do anything to synchronize the scans, but if you launch the concurrent connections at the same time and issue all your CREATE INDEX at the same time, only 1 table scan is needed). Maybe if the tables were bigger, it would lose sync between the 3 concurrent scans and end up going slower. That's why I spoke about the "synchronized scan" patch.

> How much of this, though, is from the OS's disk cache? Or are Seq Scans O_DIRECT and bypass the OS cache?

Well, the file was larger than the disk cache, and I checked vmstat's count of actual bytes read from disk... Three threads read the table once; one thread reads the table 3 times. So it works right now, except it doesn't have (yet) the infrastructure to keep the scans synchronized, and psql can't open several connections (yet).
Re: [GENERAL] why postgresql over other RDBMS
> Will the synchronized seq scan patch be able to do this by issuing all the CREATE INDEX commands at the same time from several different database connections ?

> No, but it could someday.

Actually I tested: it does it right now, albeit unconsciously (pg doesn't do anything to synchronize the scans, but if you launch the concurrent connections at the same time and issue all your CREATE INDEX at the same time, only 1 table scan is needed). Maybe if the tables were bigger, it would lose sync between the 3 concurrent scans and end up going slower. That's why I spoke about the "synchronized scan" patch.

> Or would a "CREATE MANY INDEXES" (where in one statement you specify all the indexes on a single table) command be easier to implement?

You can get the effect right now by using concurrent connections, it seems. Not very practical in a psql script, though...
Re: [GENERAL] collision in serial numbers after INSERT?
In the last versions of postgres, do:

INSERT INTO blah ... RETURNING blah_id;

No need to worry about sequences or anything. It inserts, then it returns the inserted id, as the name says. Very much unlike MySQL, where insert_id() returns the id of the last insert, even if it was done in an ON INSERT trigger, so it isn't what you want at all!

On Fri, 01 Jun 2007 21:39:49 +0200, Bill Moran <[EMAIL PROTECTED]> wrote:

In response to [EMAIL PROTECTED]:

> Hello all - I'm working on a site with PHP and Postgres, coming from a MySQL background. I was looking for an equivalent to the mysql_insert_id() function, and a site recommended this:
>
> function postg_insert_id($tablename, $fieldname)
> {
>     global $connection_id;
>     $result = pg_exec($connection_id, "SELECT last_value FROM ${tablename}_${fieldname}_seq");
>     $seq_array = pg_fetch_row($result, 0);
>     return $seq_array[0];
> }
>
> It relies on pg's sequencing ability. However, I wondered: if I were in an environment with many concurrent inserts, would it be possible that I didn't get the serial number of the insert that *I* just did? That if I do an insert, and several inserts happen after mine, wouldn't I get the id of the latest row, which is several inserts after mine?

Don't do that. Please let us know what site recommended that so I can send an email to the author correcting them. Instead, do SELECT currval('<sequence name>'), which is guaranteed to be isolated from other sessions. If you use the code above, sooner or later you're going to get bit.
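Both safe approaches, sketched with the invented table "blah" and its serial column "blah_id"; the sequence name follows postgres's default table_column_seq naming:

```sql
-- Option 1 (8.2 and later): the INSERT itself returns the new id.
INSERT INTO blah( name ) VALUES ('test') RETURNING blah_id;

-- Option 2: currval() is session-local, so ids assigned to other
-- sessions in between can never leak into the result.
INSERT INTO blah( name ) VALUES ('test');
SELECT currval('blah_blah_id_seq');
```

Reading last_value from the sequence, by contrast, sees the global counter, which is exactly why the quoted PHP snippet is racy.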
Re: [GENERAL] why postgresql over other RDBMS
On Thu, 31 May 2007 23:36:32 +0200, PFC <[EMAIL PROTECTED]> wrote:

On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera <[EMAIL PROTECTED]> wrote:

On May 25, 2007, at 5:28 PM, Tom Lane wrote:

> That's true at the level of DDL operations, but AFAIK we could parallelize table-loading and index-creation steps pretty effectively --- and that's where all the time goes.

> I would be happy with parallel builds of the indexes of a given table. That way you have just one scan of the whole table to build all its indexes.

Just did a test:
- large table (does not fit in RAM)
- rows with a text column (forum posts)
- about 700K rows

Time to create 3 indexes: 61 s
Time to create 3 indexes with 3 simultaneous connections: 22 s

That's what you would expect... vmstat shows the data is really loaded from disk: once with the 3 threads, 3 times when the indexes are created one at a time.
Re: [GENERAL] why postgresql over other RDBMS
On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera <[EMAIL PROTECTED]> wrote: On May 25, 2007, at 5:28 PM, Tom Lane wrote: That's true at the level of DDL operations, but AFAIK we could parallelize table-loading and index-creation steps pretty effectively --- and that's where all the time goes. I would be happy with parallel builds of the indexes of a given table. That way you have just one scan of the whole table to build all its indexes. Will the synchronized seq scan patch be able to do this by issuing all the CREATE INDEX commands at the same time from several different database connections ?
Re: [GENERAL] Geographic data sources, queries and questions
> In short, you have to update every instance of the key, not only in the database, but in every application and even in every other representation in the real world. That could include changing people's bookmarks, notes in PDAs, even paper reports sitting on people's desks -- a tall order for an SQL query.

This also applies to misguided databases that REUSE values from "auto_increment" columns. I once had two orders with the same PK value: one had been mistakenly deleted, then another one took its place, and all hell broke loose.
Re: [GENERAL] SELECT from mytbl;
On Wed, 30 May 2007 05:24:57 +0200, Erwin Brandstetter <[EMAIL PROTECTED]> wrote:

On May 30, 2:11 am, Rodrigo De León <[EMAIL PROTECTED]> wrote:

> (... useful code example snipped) Now see: http://www.postgresql.org/docs/8.2/static/plpgsql-control-structures

Thanks for your hints, Rodrigo! I am aware I can consult pg_catalog / information_schema to retrieve the information, but that's not quite the "short way" I was hoping for. Your example is still helpful, though. :)

Python example:

import psycopg2

db = psycopg2.connect( host='/var/run/postgresql', user='annonces', database='annonces' )
cursor = db.cursor()
bad = set(('email','website','comment'))
cursor.execute( "SELECT * FROM contacts LIMIT 0" )   # LIMIT 0: we only want the column metadata
fields = [d[0] for d in cursor.description]

print fields
['id', 'name', 'person', 'telephone', 'address', 'zipcode', 'city', 'fax', 'email', 'website', 'comment', 'group_id', 'name_search', 'address_search']

print "SELECT " + (','.join([d for d in fields if d not in bad])) + " FROM contacts"
SELECT id,name,person,telephone,address,zipcode,city,fax,group_id,name_search,address_search FROM contacts

print "SELECT " + (','.join(set(fields).difference(bad))) + " FROM contacts"
SELECT city,fax,name_search,name,zipcode,telephone,person,address_search,address,group_id,id FROM contacts

Regards
Erwin
Re: [GENERAL] optimisation for a table with frequently used query
> again thanks - bit of a noob question I know, but it's good to learn :-)

Well, not really, since the answer is quite subtle... You have two columns A and B. Say you have an index on A and an index on B. These queries will make direct use of one index:

A=... or any range on A (BETWEEN, <, >, <=, >=, etc.)
B=... or any range on B (BETWEEN, <, >, <=, >=, etc.)

Now if you ask for (A=... AND B=...) or for (A=... OR B=...), no single index covers the whole condition, so postgres uses a bitmap scan to combine the two indexes (read the docs). It is slightly slower than a direct index scan, but still much faster than not using indexes at all. If you had an index on (A,B) it would have been used directly. If one of the two indexes has very poor selectivity (like just a few distinct values), the bitmap scan will not be optimal. If your indexes have lots of distinct values, it will be about as fast as a real index scan. An index on (A,B) can also do WHERE A=... ORDER BY A,B without actually sorting (it will pick the rows in index order), which is nice for many things, like getting blog comments in order.
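A sketch of the composite index described above, with invented table and column names standing in for A and B:

```sql
-- One composite index serves the AND filter directly...
CREATE INDEX comments_topic_posted ON comments( topic_id, posted );

SELECT * FROM comments WHERE topic_id = 42 AND posted > '2007-01-01';

-- ...and also the ordered scan, with no separate sort step:
SELECT * FROM comments WHERE topic_id = 42 ORDER BY topic_id, posted;
```

Running EXPLAIN on each query shows whether the planner picked the index scan, the bitmap combination, or a sort.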
Re: [GENERAL] problems with SELECT query results
SELECT 'PV.LINEITEM:' || partnum || ',' || round(onhand) || ',' || round(qm5)
    || ',' || round(lsm4) || ',' || ',' || ',' || round(onorder) || ',' || ','
    || ',' || binone || ',' || ',' || round(backorderqty) || ',' || ','
    || round(onhold) || ',' || ',' || ',' || ',' || ',' || ',' || ','
    || round(qtyperjob) || ',' || round(ordermax) AS gmrim
FROM slparts
WHERE vendor LIKE 'CH%'

You could select the columns and build the string in your application?

> The query does work and I am getting results from the database. There are values for all 'partnum' in the database; however, the query results include blank fields here and there in between the returned records. Why am I receiving blank fields for 'gmrim'? This absolutely defies logic.

Because one of your fields is probably NULL, and NULL || anything stays NULL. You have probably been misled to believe they are "blanks" because they don't display as "NULL" but as ""; I set psql to display NULL as NULL. Whether these columns can, must, or should not contain NULLs depends on your application... it's for you to choose. Use COALESCE, add NOT NULL constraints, or grab the columns and build the string in your application; you choose.
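A minimal demonstration of the behaviour, with a literal NULL standing in for whichever column is missing a value:

```sql
-- NULL is contagious through concatenation: the whole expression
-- becomes NULL, which psql displays as an empty field by default.
SELECT 'PV.LINEITEM:' || NULL || ',' AS gmrim;

-- COALESCE substitutes an empty string, so the rest of the row survives:
SELECT 'PV.LINEITEM:' || COALESCE(NULL, '') || ',' AS gmrim;   -- 'PV.LINEITEM:,'
```

In the original query, wrapping each nullable column (for example COALESCE(binone, '') or COALESCE(round(onhand)::text, '0')) would make every row produce a complete string.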
Re: [GENERAL] Languages and Functions
> 2. Is there any performance or other advantage to using PL/pgSQL over PL/Perl or PL/Python?

Yes. If you want to loop over large amounts of data (FOR row IN SELECT ...), plpgsql will be faster, since it does not have to convert the data from postgres's internal format to Python/Perl format.
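For instance, a loop like this (table and column invented for illustration) stays entirely inside the server's native datatypes in plpgsql, whereas a PL/Perl or PL/Python version would convert every row to the interpreter's types:

```sql
CREATE FUNCTION total_due() RETURNS numeric AS $$
DECLARE
    r RECORD;
    total numeric := 0;
BEGIN
    -- Each row is handled in postgres's internal representation;
    -- no per-row perl/python datum conversion takes place.
    FOR r IN SELECT amount FROM invoices LOOP
        total := total + r.amount;
    END LOOP;
    RETURN total;
END;
$$ LANGUAGE plpgsql;
```

(Of course, when a plain aggregate like sum() can express the job, it beats any procedural loop in any language.)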
Re: [GENERAL] optimisation for a table with frequently used query
> SELECT * from my_table where id_1 = x and id_2 = y; Neither id_1 nor id_2, nor the combination of them, is unique. I expect this table to become large over time.

Create an index on (id_1, id_2), or on (id_2, id_1).
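That is, as a sketch (with literal values standing in for x and y):

```sql
CREATE INDEX my_table_id1_id2 ON my_table( id_1, id_2 );

-- The planner can now satisfy the whole WHERE clause from one index:
SELECT * FROM my_table WHERE id_1 = 1 AND id_2 = 2;
```

Put the more selective column first if you also run queries filtering on that column alone, since a composite index can serve prefix conditions by itself.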
Re: [GENERAL] table referencing several others
Here is what I would do:
- Have a table "objects", which contains locations, shows, persons, well, anything you want to relate to the forums in question.
- Simply add an object_id to your forums table.

Now, how do you distinguish between different types of objects in your objects table? You can add an object_type field and teach your object-relational mapper that it expresses the class of object that should be instantiated when fetching the rows. You can have fields that are only used by some types of objects, and NULL for others. You can use table inheritance. You can also have an objects table plus tables like locations_extra_infos and people_extra_infos, which store the fields that are unique to each type of object.

> - every subsequent message of the thread will carry these useless fields (only the first message in thread uses one of them),

If the fields are set to NULL, they will only use one bit each, so this is not a problem. Do not store the location_id in each message of the thread: if you want to change it, you'd have to update all the messages!
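A sketch of the generic-objects layout suggested above (all column names invented):

```sql
CREATE TABLE objects (
    object_id   SERIAL PRIMARY KEY,
    object_type TEXT NOT NULL      -- 'location', 'show', 'person', ...
);

-- Each forum points at exactly one object, whatever its type:
CREATE TABLE forums (
    forum_id  SERIAL PRIMARY KEY,
    object_id INTEGER REFERENCES objects( object_id )
);
```

Per-type detail tables (for example a locations_extra_infos keyed on object_id) can then hold the fields that only make sense for one object type.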
Re: [GENERAL] why postgresql over other RDBMS
> MySQL has a related problem, which is that they have embedded IPv4 addressing rather deeply into their client authentication logic (by making userids be [EMAIL PROTECTED], not just a username). This is probably why they still haven't got IPv6 support: http://bugs.mysql.com/bug.php?id=8836 I wonder what their plans are for fixing that ... regards, tom lane

Well, I have now almost finished my spare-time forum benchmark of MySQL versus Postgres... It was a really interesting experience, and by turning a few stones I discovered some really nasty stuff about MySQL... well, about InnoDB really. Here's one that you probably didn't know about:
- auto_increment isn't concurrent: all inserts into a table which has an auto_increment column are serialized
- yes, this means INSERT INTO ... SELECT locks out all other inserts
- the lock is held during ON INSERT triggers, which means the INSERT triggers on a given table can't execute concurrently
Re: [GENERAL] index vs. seq scan choice?
> Would it be possible to look at a much larger number of samples during analyze, then look at the variation in those to generate a reasonable number of pg_statistic "samples" to represent our estimate of the actual distribution? More datapoints for tables where the planner might benefit from it, fewer where it wouldn't.

Maybe it would be possible to note somewhere the frequency of the most common value (in the OP's case, about 3%); then a quick decision could be made to use the index without even looking at the value, whenever even the most common value falls below the index-use threshold...
Re: [GENERAL] why postgresql over other RDBMS
> Indeed. Wouldn't it be a cool feature to persist transaction states across connections, so that a new connection could get access to a sub-transaction state? That way, you could make your schema changes and test them with any number of test clients (which designate the state to connect with), and then commit when everything works.

Actually you can hack this by starting your webserver with only 1 thread, using persistent connections, and disabling all commits in the application. But I'd call that "a very hackish hack".
Re: [GENERAL] Integrity on large sites
Flickr uses InnoDB, by the way.

On Thu, 24 May 2007 18:07:21 +0200, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

Dave Page wrote:

> problem with your setup. Granted, MySQL is a pretty bad database, but it's not *that* bad -- your example implies that heavily MyISAM-based (you don't say whether this is MyISAM or InnoDB) sites such as Slashdot and Flickr should be falling over every hour. I'm not going to comment on whose fault it is, but the OP quoted 100 updates and 600 selects per *second*. I can't imagine Flickr or Slashdot (which is heavily cached for reading) are under anything like that sort of constant load.

Uhmmm, I would not be surprised at *all* at Slashdot or Flickr doing that type of velocity. We have customers right now that under peak are doing 10 times that, and yes, these are customers that have similar types of websites.

Sincerely, Joshua D. Drake
Re: [GENERAL] Integrity on large sites
> What version of that pathetic RDBMS is this?

MySQL 5.0.40, on Gentoo Linux, Core 2 Duo. The table in question takes about 100 inserts/deletes and 600 selects per second. MyISAM isn't able to finish the benchmark: actually, I have to run REPAIR TABLE every 20 minutes, since it corrupts.

> I find it hard to believe that this is MySQL's fault and not some problem with your setup.

Yeah, me too. Is it a MyISAM bug, a Gentoo bug, a hardware bug? Who knows. Go to bugs.mysql.com and search for "corrupt" or "corruption". Postgres, InnoDB and memtest86 are perfectly happy on this machine. However, I have sometimes upgraded MySQL on websites and found it crashing repeatedly, then had to downgrade it. I submitted this one, consider it my pet bug: http://bugs.mysql.com/bug.php?id=28534

--
Ron Johnson, Jr.
Jefferson LA USA
Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: [GENERAL] Integrity on large sites
> Some big sites do of course juggle performance vs in-database run-time checks, but the statements as typically presented by MySQL partisans,

Live from the front: this freshly created database has had to endure a multithreaded query assault for about 2 hours. It gave up.

CREATE TABLE `posts` (
    `post_id`  int(11) NOT NULL auto_increment,
    `topic_id` int(11) NOT NULL,
    etc...

mysql> SELECT max(post_id) FROM posts;
+--------------+
| max(post_id) |
+--------------+
|       591257 |
+--------------+

mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE BASTARD',666);
ERROR 1062 (23000): Duplicate entry '591257' for key 1

mysql> CHECK TABLE posts;
+-------------------+-------+----------+-----------------------------+
| Table             | Op    | Msg_type | Msg_text                    |
+-------------------+-------+----------+-----------------------------+
| forum_bench.posts | check | warning  | Table is marked as crashed  |
| forum_bench.posts | check | error    | Found 588137 keys of 588135 |
| forum_bench.posts | check | error    | Corrupt                     |
+-------------------+-------+----------+-----------------------------+

mysql> REPAIR TABLE posts;
+-------------------+--------+----------+----------+
| Table             | Op     | Msg_type | Msg_text |
+-------------------+--------+----------+----------+
| forum_bench.posts | repair | status   | OK       |
+-------------------+--------+----------+----------+

mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE BASTARD',666);
Query OK, 1 row affected, 1 warning (0.10 sec)

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------+
| Level   | Code | Message                                        |
+---------+------+------------------------------------------------+
| Warning | 1364 | Field 'post_time' doesn't have a default value |
+---------+------+------------------------------------------------+

mysql> SELECT max(post_id) FROM posts;
+--------------+
| max(post_id) |
+--------------+
|       591257 |
+--------------+

mysql> SELECT count(*) FROM posts UNION ALL SELECT sum( topic_post_count ) FROM topics;
+----------+
| count(*) |
+----------+
|   588137 |
|   588145 |
+----------+

mysql> SELECT count(*) FROM topics WHERE topic_id NOT IN (SELECT topic_id FROM posts);
+----------+
| count(*) |
+----------+
|    11583 |
+----------+

(Note: there cannot be a topic without a post in it, ha!)

Try Postgres:

forum_bench=> SELECT count(*) FROM posts UNION ALL SELECT sum( topic_post_count ) FROM topics;
 count
--------
 536108
 536108
(2 rows)

forum_bench=> SELECT count(*) FROM topics WHERE topic_id NOT IN (SELECT topic_id FROM posts);
 count
-------
     0
(1 row)
Re: [GENERAL] Integrity on large sites
"*Really* big sites don't ever have referential integrity. Or if the few spots they do (like with financial transactions) it's implemented on the application level (via, say, optimistic locking), never the database level." Sure, but in the forum benchmark I just did, using MyISAM with no referential integrity checks, the end of the benchmark left an impressive number of records with broken foreign key relations... when the user kills his HTTP connection or reloads at the wrong moment, and the script is interrupted, or killed by an exception or whatever, boom. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Using a trigger with an object-relational manager
On Wed, 23 May 2007 14:41:00 +0200, Rick Schumeyer <[EMAIL PROTECTED]> wrote: Actually, the situation is slightly more complicated. It's more like I have tables A1, A2, and A3 each of which must have a corresponding row in B. So each of A1, A2 and A3 has a BEFORE INSERT trigger that creates a row in B and sets a FK in A1 (or A2 or A3). So I can't just use the same PK in both the A tables and B. It's a hack, but A1, A2, A3 etc could all use the same sequence to generate their PK... The best would be to patch rails so it uses INSERT RETURNING. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Using a trigger with an object-relational manager
I'm using Ruby on Rails and have two tables, A and B. Every row in A needs a corresponding row in B. A also contains a FK pointing to B. I created a before insert trigger on A that inserts a new row in B, and sets the FK in A. This seems to be running fine. So, A has a b_id field linking to B ? If you need a 1-1 relationship, you could instead use the same primary key in B as in A : A : id SERIAL PRIMARY KEY B : id INTEGER PRIMARY KEY (not SERIAL) Then, an AFTER INSERT trigger on A takes the value the sequence put in A.id and inserts into B with this value as the PK. Postgres has INSERT ... RETURNING, which is a very clean and elegant solution, but Rails never heard about it... ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
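A minimal sketch of the shared-primary-key scheme described above (table and column names are illustrative, not from the original thread):

```sql
CREATE TABLE a (
    id      SERIAL PRIMARY KEY,
    payload TEXT
);

CREATE TABLE b (
    id    INTEGER PRIMARY KEY,   -- deliberately not SERIAL: reuses a.id
    extra TEXT
);

-- AFTER INSERT on a: mirror the sequence-assigned id into b,
-- so a row in a always has its twin in b with the same key
CREATE FUNCTION make_b_row() RETURNS trigger AS $$
BEGIN
    INSERT INTO b (id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER a_creates_b AFTER INSERT ON a
    FOR EACH ROW EXECUTE PROCEDURE make_b_row();
```

With this, joining a and b on id is the whole 1-1 relationship; no separate FK column needs to be maintained in a.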
Re: [GENERAL] Permance issues with migrated db
I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100 PERCENT' after SELECT in the query. The Enterprise Manager does not indicate how many rows come back. I save it as a VIEW in MS SQL and do a 'select count(*)...' and, yes, it comes back 42164877 records. No, it comes back 1 record with the count in it, the ORDER BY is useless for a count(*), etc. What is it that you are trying to do exactly ? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Schema sanity check
The other option is to have a column on the mailbox table to flag that it is a default_mailbox -- but then I'd have to ensure there's only one column for each "person" flagged that way.

- an is_default BOOL column in the mailbox table
- a conditional (partial) unique index : UNIQUE INDEX ON mailboxes( owner ) WHERE is_default = 't'

Second question. So, after a while the obvious problem happens and users have too many mailboxes and they want a way to group them into...

Looks like a tree. Why not use LTREE ?

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
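Spelled out, the partial unique index suggested above might look like this (a sketch; the column names owner and is_default are assumed from the discussion):

```sql
ALTER TABLE mailboxes ADD COLUMN is_default BOOL NOT NULL DEFAULT false;

-- At most one default mailbox per owner: uniqueness is enforced
-- only on the rows where is_default is true.
CREATE UNIQUE INDEX one_default_mailbox
    ON mailboxes (owner)
    WHERE is_default;
```

A second INSERT or UPDATE that tries to flag another mailbox as default for the same owner then fails with a unique-violation error, which is exactly the invariant the poster wanted.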
[GENERAL] feature suggestions
Which list is the most appropriate for proposing features and ideas for postgres ? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Are foreign key's automatically set as indicies?
On Wed, 16 May 2007 06:09:15 +0200, camb <[EMAIL PROTECTED]> wrote: Hey all, I'm sorta new to PG and am just wondering, when I create a foreign key is it automatically set as an indexed column or do I need to also set that manually? A primary key creates a unique index. Foreign keys do not create any indexes; it is up to you to decide if you need them or not. Say : CREATE TABLE items ( ... category_id INTEGER REFERENCES categories( id ) ) If you want cascaded deletes/updates to your items table to be fast, or you need it for other reasons, create an index on category_id. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
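In concrete terms, the advice above looks like this (a sketch; the categories table and index name are assumed for illustration):

```sql
CREATE TABLE categories (
    id   SERIAL PRIMARY KEY,   -- the PK gets a unique index automatically
    name TEXT
);

CREATE TABLE items (
    id          SERIAL PRIMARY KEY,
    category_id INTEGER REFERENCES categories (id) ON DELETE CASCADE
);

-- NOT created automatically: without it, every cascaded delete from
-- categories has to sequentially scan items for matching rows.
CREATE INDEX items_category_id_idx ON items (category_id);
```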
Re: [GENERAL] 8.0, UTF8, and CLIENT_ENCODING
I have a small database (PgSQL 8.0, database encoding UTF8) that folks are inserting into via a web form. The form itself is declared ISO-8859-1 and, prior to inserting any data, pg_client_encoding is set to LATIN1. Wouldn't it be simpler to have the browser submit the form in UTF-8 ? Most of the high-bit characters are correctly translated from LATIN1 to UTF8. So for e-accent-aigu (é) I see the two-byte UTF8 value in the database. Sometimes, in their wisdom, people cut'n'paste information out of MSWord Argh. and put that in the form. Instead of being mapped to 2-byte UTF8 high-bit equivalents, they are going into the database directly as one-byte values > 127. That is, as illegal UTF8 values. Sometimes you also get HTML entities in the mix. Who knows. All my web forms are UTF-8 back to back; it just works. Was I lucky ? Normally postgres rejects illegal UTF8 values, you wouldn't be able to insert them... When I try to dump'n'restore this database into PgSQL 8.2, my data can't make the transit. Firstly, is this "kinda sorta" encoding handling expected in 8.0, or did I do something wrong? Duh ? pg isn't supposed to accept bad unicode data... something suspicious is going on. Besides, if it was dumped, it should be reloadable... did pg_dump use a funky encoding ? Secondly, anyone know any useful tools to pipe a stream through to strip out illegal UTF8 bytes, so I can pipe my dump through that rather than hand editing it? Yes, use iconv (see man page), it can do this for you quite easily. It's probably already installed on your system. Be warned, though, that illegal multibyte characters eat quotes at night while you aren't looking... unterminated strings are a pain. You could also load your database with C locale, and have a script select the records you wish to convert, and update the rows. Python has very good Unicode support; it should be easy to make such a script. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
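As the reply notes, iconv can strip the bad bytes (roughly `iconv -f UTF-8 -t UTF-8 -c < dump.sql > clean.sql`); the same filter is a few lines of Python. A sketch, with the function name being my own:

```python
def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop any bytes that do not form valid UTF-8 sequences (like iconv -c)."""
    # errors="ignore" silently discards invalid sequences; valid multibyte
    # characters pass through unchanged, so legitimate accents survive.
    return data.decode("utf-8", errors="ignore").encode("utf-8")
```

Run the dump file's bytes through this before feeding it to the 8.2 restore. As the reply warns, dropped bytes can take a quote character with them, so eyeball the diff afterwards.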
Re: [GENERAL] Fault Tolerant Postgresql (two machines, two postmasters, one disk array)
Aren't there PCI heartbeat cards that are independent of the load on the host machine? But, if the machine is fork-bombed, or drowning in swap, or generally slowly committing suicide, it's not shall we say "available" anymore, so you might want to finish it off... ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Transaction commit in a trigger function
I can't wrap BEGIN/COMMIT around the INSERT in the trigger. Is there another way of achieving this?

- Savepoints (won't work with your trigger approach)
- dblink, which would allow you to open another connection concurrently

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
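A rough sketch of the dblink route (connection string, connection name, and table are hypothetical). The second connection commits independently of the transaction that fired the trigger, which is the whole point:

```sql
-- inside a plpgsql trigger function, for instance:
PERFORM dblink_connect('side', 'dbname=mydb user=app');
PERFORM dblink_exec('side', 'INSERT INTO audit_log (msg) VALUES (''row seen'')');
PERFORM dblink_disconnect('side');
```

Note the trade-off: if the outer transaction rolls back, the row written through dblink stays. That is sometimes exactly what you want (audit logs), and sometimes a bug.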
Re: [GENERAL] Paypal and "going root"
Has anybody pondered this and come up with anything? Well, I've done e-commerce stuff, although it was with banks, not Paypal. There are a few levels of safety :

- Level 0 : Total stupidity (osCommerce). The bank redirects to your site, and the order is marked as paid with "paid=1" in the URL. Solution : fire the programmer.

- Level 1 : callback. The bank hits a URL on your site with data you provided (order_id) so you may mark the order as paid. The technical documentation says to verify the transaction by checking the bank server's IP address. Solution : choose another bank.

- Level 3 : crypto. The bank hits a URL on your site with data you provided (order_id) so you may mark the order as paid. The bank encrypts everything and securely signs the data (HMAC, certificate...). This works well. Additionally, you can hit a URL on the bank to check validity. However, if this is not encrypted (SSL...), can you be sure who you are talking to ?

My advice is to store, in your table, the signed and encrypted data the bank sends you. Suppose a hacker breaks your verification code and marks his order as paid in the database. Even so, he will never be able to use the bank's certificate to generate a fake signed payment confirmation to insert in your log table. And, usually, the data the bank signs includes the user name, id, order id, amount, and user-specified fields, which you use to store a SHA1 hash of the list of things the user ordered, so he can't steal the payment confirmation from another user, or from a past order. So, if later you suspect something, you can re-check the signatures on the payment confirmation data, and the fraud will be apparent. ---(end of broadcast)--- TIP 6: explain analyze is your friend
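The HMAC check described above can be sketched in a few lines (everything here — the secret, the payload format, the function names — is a hypothetical illustration, not any particular bank's protocol):

```python
import hashlib
import hmac

# Hypothetical shared secret agreed upon with the bank out of band
BANK_SECRET = b"shared-secret-from-the-bank"

def sign(payload: str) -> str:
    """Compute the HMAC-SHA1 signature the bank would attach to a callback."""
    return hmac.new(BANK_SECRET, payload.encode(), hashlib.sha1).hexdigest()

def verify_callback(payload: str, signature: str) -> bool:
    """Accept the payment notification only if the signature checks out."""
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sign(payload), signature)
```

Storing the raw signed payload alongside the order, as advised above, is what lets you re-run `verify_callback` months later when something looks fishy.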
Re: [GENERAL] Performance issues of one vs. two split tables.
We use bitfields on our large user table. It is becoming unworkable to scan for matches, since overall most people have very few selections made. We are moving it to a model like your favorite_colors table which just links the option and the user. We find that joins on large tables which can be indexed to avoid full table scans are very fast in postgres, since the index does much of the culling of potential matching rows. With bitfields, you are more or less forced into doing a sequential scan to find everyone who likes the color red. Of course, if you're playing with only a few thousand users, either approach works well.

Things you could try :

* Use an integer array instead of a bitfield (for instance, in the users table, the favourites column would contain { 1,2 } if the user selected items 1 and 2). Then you can make a GiST index on it and use the indexed intersection operator. This is likely the optimal solution if the maximum number of items is small (say, 100 is good, 10000 is not).

* Keep your bitfields and create conditional indexes :
CREATE INDEX ... WHERE (bitfield_column & 1) <> 0;
CREATE INDEX ... WHERE (bitfield_column & 2) <> 0;
CREATE INDEX ... WHERE (bitfield_column & 4) <> 0;
CREATE INDEX ... WHERE (bitfield_column & 8) <> 0;
CREATE INDEX ... WHERE (bitfield_column & 16) <> 0;
etc... (the "<> 0" is needed because a WHERE clause wants a boolean, not an integer)
Obviously this will only work if you have, say, 10 favourites. 100 indexes on a table would really suck.
Then, when looking for users who chose bits 1 and 2, do :
SELECT ... WHERE (bitfield_column & 1) <> 0 AND (bitfield_column & 2) <> 0
and postgres will do a bitmap-and using the two indexes (note : when we get bitmap indexes, this will be even better).

* Use tsearch2 : favourites = 'red blue' and fulltext-search it.

* Use a favourites table : this makes queries hard to optimize. Consider the table (user_id, item_id), meaning the user selected this item as a favourite. If you want to know which users selected both items 1 and 2, you have to do a self-join, something like :

SELECT ...
FROM favourites a, favourites b
WHERE a.user_id = b.user_id AND a.item_id = 1 AND b.item_id = 2

This is likely to be "not very fast" if 1 million users check each option but only 100 check both. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
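For the integer-array option above, a sketch using the contrib/intarray module (table and column names assumed; the module must be installed in the database first):

```sql
-- requires the intarray contrib module for gist__int_ops and @>
CREATE TABLE users (
    id         SERIAL PRIMARY KEY,
    favourites INTEGER[] NOT NULL DEFAULT '{}'
);

CREATE INDEX users_favourites_gist
    ON users USING gist (favourites gist__int_ops);

-- users who picked both item 1 and item 2: @> means "contains",
-- and the GiST index turns this into an index search, not a seq scan
SELECT id FROM users WHERE favourites @> ARRAY[1, 2];
```

Compare this single indexed predicate with the self-join the favourites-table design forces on you above.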
Re: [GENERAL] Performance issues of one vs. two split tables.
Thus, if there are a whole bunch of columns on each table, the data in those extra columns (e.g. - all columns aside from "id", the one that was asked for in the result set) will indeed be drawn into memory. Yeah, that's what I meant ;) All the columns are loaded (except the TOASTed ones which are not mentioned in the query) into memory, but only the requested ones are processed and returned to the client... Is that specific to Postgresql? Nope. All databases do more or less the same. From an outside perspective it just seems odd that potentially a large amount of data would be pulled off disk into memory that is never used. Perhaps there's an overriding reason for this. Yeah, where would you put this data if you didn't put it where it is now ? If you alter tables "customer" and "order", taking some columns off, and stowing them in separate tables, then you'll find that more tuples of "customer" and "order" will fit into a buffer page, and that the join will be assembled with somewhat less memory usage. Whether or not that is a worthwhile change to make will vary considerably. Makes designing the schema a bit tough. ;) "Premature optimization is the root of all evil" Build a test database, fill it with data, and experiment. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Performance issues of one vs. two split tables.
SELECT o.id FROM order o JOIN customer c on o.customer = c.id Does that bring into memory all columns from both order and customer? Maybe that's not a good example due to indexes. No, it just pulls the columns you ask from the table, nothing less, nothing more. Splitting tables (vertical partitioning) is used to reduce the size of the working set that has to fit in RAM... this is a different reason than what you're thinking about. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Performance issues of one vs. two split tables.
Say, if you first SELECT fname, lname FROM user_table; and then you issue SELECT * FROM user_table; -- the second select will be returned from buffer cache -- since all rows are already in the cache. ...Unless your table contains some large TEXT columns that have been stored out of line (TOASTed) by postgres. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Performance issues of one vs. two split tables.
Say I have a "user" table that has first, last, email, password, and last_accessed columns. This user table will be accessed often. (It's not really "user", but that's not important in this discussion) Say that there's also about 10 columns of settings or preferences for each user. Are there any cases or reasons to have a separate "user_preferences" table vs. just placing all the columns together in one table? I did something like that on MySQL some time ago. In the Users table there was stuff that other users need to see (like his login name, etc), and stuff that only this user needs to see (like his preferences). So, when displaying posts in the forum, for instance, only a small part of the fields in the Users table was needed, the rest was just dead weight, that made the table unable to fit in RAM. So I split the table, and it was faster. However, you can also buy more RAM... ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Views- Advantages and Disadvantages
On Fri, 11 May 2007 04:24:55 +0200, Tom Lane <[EMAIL PROTECTED]> wrote: "Leif B. Kristensen" <[EMAIL PROTECTED]> writes: Would it be reasonable to suggest that later versions of PostgreSQL could examine if a function changes data, and quietly marks a function as 'stable' if it doesn't? My instinctive CS-major reply to that is "only if you've found a solution to the halting problem". However, it's possible that we could detect this case for a useful subset of real-world functions ... not sure offhand what could be covered. regards, tom lane Why not simply have PG issue a warning if the user doesn't specify one of ("stable", "immutable", etc) on function creation ? like : WARNING: Function marked as Volatile by default INFO: if the function does not modify the database, you might want to mark it STABLE or IMMUTABLE to improve performance ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] In theory question
This may be a question for -hackers, but I don't like disturbing them unnecessarily. I've been having a look at memcached. I would like to ask, is there any reason that, theoretically, a similar caching system could be built right into the db serving daemon? I.e., the hash tables and libevent could sit on top of postmaster as an optional component caching data on a per-query basis and only hitting the actual db in the event of a cache miss? I think this is close to what MySQL's query cache does. The question is if this should be the job of the DBMS and not another layer. At least the pgmemcache author and I think that it's better done outside the DBMS. See http://people.FreeBSD.org/~seanc/pgmemcache/pgmemcache.pdf for the idea. I have always found MySQL's query cache to be utterly useless. Think about it this way : It only works for tables that seldom change. It does not work for big tables (like the posts table of a forum) because the cache would have to be huge. So, the most frequent usage of MySQL's query cache is for dumb applications who use, for instance, PHP, store their configuration in MySQL, and reload it on each and every page with a SELECT * FROM configuration. In this case, you save the query time, but you don't save : the roundtrip between PHP and the database, extracting query results, building objects from them, time spent in ORMs, etc. A much better solution is to do your own caching, for instance using shared memory in the application server, and then you cache native language objects. You not only save the query time, but also all the time spent building those objects on every page load. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
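The "cache native language objects in the application server" idea above can be sketched in a few lines; this is an assumption-laden illustration (class and method names are mine), not pgmemcache's or memcached's API:

```python
import time

class TTLCache:
    """Minimal in-process cache for rarely-changing data (e.g. app config)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no database round-trip at all
        value = loader()             # cache miss: e.g. run SELECT * FROM configuration
        self._store[key] = (now + self.ttl, value)
        return value
```

Unlike a query cache inside the DBMS, this also skips the round-trip, result extraction, and object construction that the text identifies as the real per-page cost.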
Re: [GENERAL] typical schema for a forum?
I have written a little PHP+postgres forum for benchmarking purposes, to see how fast postgres could go. It has basic forum features, like forums (duh), topics, posting, pagination, watching topics, topic & post count, display newest topic and post in topic & forum pages, templates, topics pagination in O(1) not O(number of pages), etc. I also wrote a companion benchmark script which tortures the server with HTTP requests, viewing and posting. it answered my "how fast can it go" question : more than 500 dynamic pages/second on a Core 2. Most of the load is PHP : postgres could handle much, much more than this. If people are interested in this as a benchmark, I can write some docs (and make a MySQL version). If you want to use it as a base for your forum, you're welcome. Who wants the source ? On Wed, 09 May 2007 10:49:17 +0200, Louis-David Mitterrand <[EMAIL PROTECTED]> wrote: Hi, I'm trying to implement a forum with mason and postgresql. What is the typical database schema of a forum (threaded or flat) application? Thanks, ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Slow query and indexes...
Thanks for a good answer, I'll try to find a workaround. The number of data_loggers will change, but not too frequently. I was actually hoping to make a view showing the latest data for each logger, maybe I can manage that with a stored procedure thingy...

- Create a table which contains your list of loggers (since it's good normalization anyway, you probably have it already) and have your data table's logger_id REFERENCE it
- You now have a simple way to get the list of loggers (just select from the loggers table, which will contain 3 rows)
- Then, to get the most recent record for each logger_id, you do :

SELECT l.logger_id,
  (SELECT id FROM data d WHERE d.logger_id = l.logger_id
   ORDER BY d.logger_id DESC, d.date_time DESC LIMIT 1) AS last_record_id
FROM loggers l

2 minute example :

forum_bench=> CREATE TABLE loggers (id SERIAL PRIMARY KEY, name TEXT );
CREATE TABLE
forum_bench=> INSERT INTO loggers (name) VALUES ('logger 1'),('logger 2'),('logger 3');
INSERT 0 3
forum_bench=> CREATE TABLE data (id SERIAL PRIMARY KEY, logger_id INTEGER NOT NULL REFERENCES loggers( id ));
CREATE TABLE
forum_bench=> INSERT INTO data (logger_id) SELECT 1+floor(random()*3) FROM generate_series(1,100);

forum_bench=> SELECT logger_id, count(*) FROM data GROUP BY logger_id;
 logger_id | count
-----------+--------
         3 | 333058
         2 | 333278
         1 | 333664

NOTE : I use id rather than timestamp to get the last one.

forum_bench=> EXPLAIN ANALYZE SELECT logger_id, max(id) FROM data GROUP BY logger_id;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=19166.82..19169.32 rows=200 width=8) (actual time=1642.556..1642.558 rows=3 loops=1)
   ->  Seq Scan on data  (cost=0.00..14411.88 rows=950988 width=8) (actual time=0.028..503.308 rows=100 loops=1)
 Total runtime: 1642.610 ms

forum_bench=> CREATE INDEX data_by_logger ON data (logger_id, id);
CREATE INDEX
forum_bench=> EXPLAIN ANALYZE SELECT l.id, (SELECT d.id FROM data d WHERE d.logger_id=l.id ORDER BY d.logger_id DESC, d.id DESC LIMIT 1) FROM loggers l;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on loggers l  (cost=0.00..3128.51 rows=1160 width=4) (actual time=0.044..0.074 rows=3 loops=1)
   SubPlan
     ->  Limit  (cost=0.00..2.68 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=3)
           ->  Index Scan Backward using data_by_logger on data d  (cost=0.00..13391.86 rows=5000 width=8) (actual time=0.018..0.018 rows=1 loops=3)
                 Index Cond: (logger_id = $0)
 Total runtime: 0.113 ms
(6 lignes)

forum_bench=> SELECT l.id, (SELECT d.id FROM data d WHERE d.logger_id=l.id ORDER BY d.logger_id DESC, d.id DESC LIMIT 1) FROM loggers l;
 id | ?column?
----+----------
  1 |       99
  2 |      100
  3 |       90
(3 lignes)

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid
2b- LARGE UPS because HDs are the components with the highest power consumption (a 700VA UPS gives me about 10-12 minutes on a machine with an XP2200+, 1GB RAM and a 40GB HD; however this falls to... less than 25 seconds with seven HDs ! all ATA), I got my hands on a (free) 1400 VA APC rackmount UPS ; the batteries were dead so I stuck two car batteries in. It can power my computer (Athlon 64, 7 drives) for more than 2 hours... It looks ugly though. I wouldn't put this in a server rack, but for my home PC it's perfect. It has saved my work many times... Harddisks suck in about 15 watts each, but draw large current spikes on seeking, so the VA rating of the UPS is important. I guess in your case, the batteries have enough charge left, but the current capability of the UPS is exceeded. Some hardware controllers are able to avoid the loss of a disk if it turns out to have some faulty sectors (by relocating them internally); software RAID doesn't, as sectors *must* be at the same (linear) addresses. Harddisks do transparent remapping now... linux soft raid can rewrite bad sectors with good data and the disk will remap the faulty sector to a good one. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] [SQL] input from a external text file......!
inside psql, type : \i filename On Sat, 11 Mar 2006 11:29:20 +0100, AKHILESH GUPTA <[EMAIL PROTECTED]> wrote: Hi All.! I just want to know one thing that is it possible with PGSQL that, if I want to insert and execute a query from a external text file instead of giving it at the pgsql prompt? just like in Oracle the file having query is executed with a '@ filename' statement at the sql prompt..! plz help me and mail me @ [EMAIL PROTECTED], it's urgent. thanks in advance...! (i have searched alot, but didn't found anything) -- Thanks & Regards, Akhilesh DAV Institute of Management Faridabad(Haryana) GSM:-(+919891606064) (+911744293789) "FAILURES CAN BE FORGIVEN BUT AIMING LOW IS A CRIME" ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] pg 8.0.1-r3 killls pgadmin3 and phppgadmin
You need a newer pgadmin --- pg_database.datpath went away in 8.0. I'm installing the new version. Thanks. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] pg 8.0.1-r3 killls pgadmin3 and phppgadmin
pgadmin3 does this (from pg's log with level set at debug5) :

INSTRUCTION : SELECT db.oid, datname, datpath, datallowconn, datconfig, datacl, pg_encoding_to_char(encoding) AS serverencoding, pg_get_userbyid(datdba) AS datowner, has_database_privilege(db.oid, 'CREATE') as cancreate FROM pg_database db ORDER BY datname
ERREUR: 42703: la colonne <> n'existe pas
EMPLACEMENT : transformColumnRef, parse_expr.c:1099

Do you know if this is normal, should this column exist, is it a problem with pgadmin ? Thanks !

On Mon, 11 Apr 2005 18:42:29 +0200, Tom Lane <[EMAIL PROTECTED]> wrote: PFC <[EMAIL PROTECTED]> writes: I have no idea what to type in gbd to get the trace, though

What I usually do is
- start a psql session
- in another window, find out the PID of the backend attached to the psql session, and do
  $ gdb /path/to/postgres backend_PID
  ...
  gdb> b errfinish
  gdb> cont
- go back to the psql session and issue the problem command
- when gdb stops execution, do
  gdb> bt
  ... useful printout is here ...
  gdb> quit
  sure you want to exit? y

regards, tom lane

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] pg 8.0.1-r3 killls pgadmin3 and phppgadmin
Sometimes life has an irony of itself. Today I modified some of my gentoo USE flags for something totally unrelated to postgres. Tonight I built postgres in debug mode : the offending query worked. I thought "hm." I rebuilt it without debug, and it still works. I don't know what made it NOT work before; I sure didn't hallucinate. It must be some obscure incompatibility deep inside the Gentoo package and build manager... phppgadmin works, and pgadmin doesn't, telling me the "datpath" column doesn't exist; no idea what this means. I'm going to look into it. Thanks for your help and sorry about bothering you !

On Mon, 11 Apr 2005 18:42:29 +0200, Tom Lane <[EMAIL PROTECTED]> wrote: PFC <[EMAIL PROTECTED]> writes: I have no idea what to type in gbd to get the trace, though

What I usually do is
- start a psql session
- in another window, find out the PID of the backend attached to the psql session, and do
  $ gdb /path/to/postgres backend_PID
  ...
  gdb> b errfinish
  gdb> cont
- go back to the psql session and issue the problem command
- when gdb stops execution, do
  gdb> bt
  ... useful printout is here ...
  gdb> quit
  sure you want to exit? y

regards, tom lane

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] pg 8.0.1-r3 killls pgadmin3 and phppgadmin
If you want, you can try building with --enable-debug and getting a gdb traceback from the call to errfinish(). That would at least give us some clue where in the rule text it's getting confused. Is this : ./configure --enable-debug ? I have no idea what to type in gdb to get the trace, though ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] pg 8.0.1-r3 killls pgadmin3 and phppgadmin
You might try comparing the results of
  select ev_action from pg_rewrite where ev_class = 'pg_user'::regclass;
from the two builds.
  regards, tom lane

Well... I'll spare your eyes and tell you right away that the results are identical... they're at the bottom of the email.

I tried recompiling postgres 8.0.1-r2 with :
  CFLAGS="-O2 -pipe -march=pentium3"
which is quite conservative (I have a Pentium-M, this is my dev laptop).

$ gcc --version
gcc (GCC) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)

And I still get (in 8.0.1-r2) :

test=> SELECT pdb.datname AS datname, pu.usename AS datowner, pg_encoding_to_char(encoding) AS datencoding,
test->   (SELECT description FROM pg_description pd WHERE pdb.oid=pd.objoid) AS datcomment,
test->   (SELECT spcname FROM pg_catalog.pg_tablespace pt WHERE pt.oid=pdb.dattablespace) AS tablespace
test->   FROM pg_database pdb, pg_user pu
test->   WHERE pdb.datdba = pu.usesysid AND NOT pdb.datistemplate
test->   ORDER BY pdb.datname;
ERREUR:  unexpected right parenthesis

Let's try something else :

postgresql $ createuser -P peufeu
Entrez le mot de passe pour le nouvel utilisateur :
Entrez-le de nouveau :
Le nouvel utilisateur a-t'il le droit de créer des bases de données ? (y/n) o

(note that I have to type "o" and not "y" as it's in French... although it still displays y/n, "y" means "no" ! it bit me !)

Le nouvel utilisateur a-t'il le droit de créer des utilisateurs ? (y/n) n
Mot de passe :
CREATE USER

peufeu $ createdb
Mot de passe :
CREATE DATABASE

[EMAIL PROTECTED] peufeu $ psql
Mot de passe :
Bienvenue dans psql 8.0.1, l'interface interactive de PostgreSQL.
Tapez: \copyright pour les termes de distribution
       \h pour l'aide-mémoire sur les commandes SQL
       \? pour l'aide-mémoire sur les commandes internes
       \g ou terminez avec un point-virgule pour exécuter une requête
       \q pour quitter

peufeu=> SELECT pdb.datname AS datname, pu.usename AS datowner, pg_encoding_to_char(encoding) AS datencoding,
peufeu->   (SELECT description FROM pg_description pd WHERE pdb.oid=pd.objoid) AS datcomment,
peufeu->   (SELECT spcname FROM pg_catalog.pg_tablespace pt WHERE pt.oid=pdb.dattablespace) AS tablespace
peufeu->   FROM pg_database pdb, pg_user pu
peufeu->   WHERE pdb.datdba = pu.usesysid AND NOT pdb.datistemplate
peufeu->   ORDER BY pdb.datname;
ERREUR:  unexpected right parenthesis

ka-blam. I checked there was no invisible UTF-8 character inside the query... there isn't, everything's normal... I cut bits off the query, trying to get to the bug, and to my surprise, I got :

peufeu=> SELECT pdb.datname FROM pg_database pdb, pg_user pu;
ERREUR:  unexpected right parenthesis

Which is interesting, as there is no parenthesis in this query ;)

I'll copypaste the rest of the session :

peufeu=> SELECT pdb.datname FROM pg_database pdb, pg_user pux;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT pdb.datname FROM pg_database pdb, pg_user pu;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT pdb.datname FROM pg_database pdb, pg_user;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT foo.datname FROM pg_database foo, pg_user;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT foo.datname FROM pg_database foo;
  datname
-----------
 test
 peufeu
 template1
 template0
(4 lignes)
peufeu=> SELECT foo.datname FROM pg_user, pg_database foo;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT * FROM pg_user LIMIT 1;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT * FROM pg_user LIMIT 1;
pg_user
peufeu=> SELECT * FROM pg_user LIMIT 1;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT * FROM pg_user;
ERREUR:  unexpected right parenthesis
peufeu=> SELECT 1 FROM pg_user;
ERREUR:  unexpected right parenthesis

Seems I have a cursed table.

I login as postgres :

$ psql -U postgres peufeu
Mot de passe :
Bienvenue dans psql 8.0.1, l'interface interactive de PostgreSQL.
Tapez: \copyright pour les termes de distribution
       \h pour l'aide-mémoire sur les commandes SQL
       \? pour l'aide-mémoire sur les commandes internes
       \g ou terminez avec un point-virgule pour exécuter une requête
       \q pour quitter

peufeu=# SELECT 1 FROM pg_user;
ERREUR:  unexpected right parenthesis
peufeu=# ANALYZE VERBOSE pg_user;
ERREUR:  unexpected right parenthesis
peufeu=# ANALYZE VERBOSE "pg_user";
ERREUR:  unexpected right parenthesis

[EMAIL PROTECTED] peufeu $ echo "ANALYZE VERBOSE" | psql -U postgres peufeu 2>log
Mot de passe :
ANALYZE
[EMAIL PROTECTED] peufeu $ grep pg_catalog log
INFO:  Analyse de "pg_catalog.pg_shadow"
INFO:  Analyse de "pg_catalog.pg_opclass"
INFO:  Analyse de "pg_catalog.pg_am"
INFO:  Analyse de "pg_catalog.pg_amop"
etc...
[EMAIL PROTECTED] peufeu $ grep user log
[EMAIL PROTECTED] peufeu $

I'm totally perplexed. Have you got any idea ? I'm itching to do an initdb...

8.0.0 : ({QUERY :commandType 1 :quer
Re: [GENERAL] pg 8.0.1-r3 kills pgadmin3 and phppgadmin
Hmm; the only occurrence of that string in the sources is in nodes/read.c, which AFAIK would never be invoked for a query entered straight from the client. What could trigger it would be trying to read a rule that is misformatted for some reason. Maybe you have a damaged pg_user view definition --- does "select * from pg_user" fail the same way?

I installed pg 8.0.0 over 8.0.1-r3 (without touching the data directory) and then it worked again, so I don't think it has to do with the data being broken. I didn't do createdb or anything. I used gentoo emerge. If you need it for debugging, I can reinstall 8.0.1-r3 and see if it still fails. Can you get phppgadmin to work on this latest version?

Thank you, P.F. Caillaud ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[GENERAL] pg 8.0.1-r3 kills pgadmin3 and phppgadmin
Hello,

Just installed pg 8.0.1-r3 and now phppgadmin and pgadmin3 die with a cryptic error message. I've traced it to the following query, used to get information about the database:

SELECT pdb.datname AS datname, pu.usename AS datowner, pg_encoding_to_char(encoding) AS datencoding,
  (SELECT description FROM pg_description pd WHERE pdb.oid=pd.objoid) AS datcomment,
  (SELECT spcname FROM pg_catalog.pg_tablespace pt WHERE pt.oid=pdb.dattablespace) AS tablespace
FROM pg_database pdb, pg_user pu
WHERE pdb.datdba = pu.usesysid AND NOT pdb.datistemplate
ORDER BY pdb.datname;

This query looks right to me and pg 8.0.0 runs it fine, while 8.0.1-r3 says:

Error: Unexpected right parenthesis.

This is not very helpful... What's going on? I've downgraded to 8.0.0 in the meantime. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] basic pg lock question
To use this to prevent simultaneous inserts of the same data (for example if two employees try to insert the same contact into the DB), I suppose you could use a constraint (before insert) that checks that there is no data matching the md5 checksum, right?

CREATE TABLE blah (
    mymd5 TEXT NOT NULL,
    UNIQUE( mymd5 )
);

will do this for you automatically and create an index to enforce it (don't create another index!) ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
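A quick sketch of how the UNIQUE constraint behaves in practice (the table is the one above; the inserted values and the exact error wording, which varies by server version, are illustrative):

```sql
INSERT INTO blah (mymd5) VALUES (md5('john doe'));  -- first insert succeeds
INSERT INTO blah (mymd5) VALUES (md5('john doe'));  -- second one fails, e.g.:
-- ERROR:  duplicate key violates unique constraint "blah_mymd5_key"
```

So the race between two simultaneous inserters is resolved by the index itself; the application only has to catch the unique-violation error, no BEFORE INSERT trigger needed.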
Re: [GENERAL] Does indexing help >= as well as = for integer columns?
This I don't get. Why is an index scan not used? Isn't an index supposed to help when using > < >= <= too?

It should!

Explain Analyze Select count(smiles) from structure where _c >= 30

 Aggregate  (cost=196033.74..196033.74 rows=1 width=32) (actual time=42133.432..42133.434 rows=1 loops=1)
   ->  Seq Scan on structure  (cost=0.00..191619.56 rows=1765669 width=32) (actual time=8050.437..42117.062 rows=1569 loops=1)
         Filter: (_c >= 30)
 Total runtime: 42133.746 ms

See these:

 ->  Index Scan using "Nc" on structure  (cost=0.00..105528.89 rows=26486 width=32) (actual time=0.098..16.095 rows=734 loops=1)
 ->  Seq Scan on structure  (cost=0.00..191619.56 rows=1765669 width=32) (actual time=8050.437..42117.062 rows=1569 loops=1)

In the index scan case, the planner thinks it'll get "rows=26486" but in reality only gets 734 rows. In the seq scan case, the planner thinks it'll get "rows=1765669" but in reality only gets 1569 rows. Both estimates are way off the mark. 26486 still makes it choose an index scan because it's a small fraction of the table, but 1765669 is not. Analyze, use more precise statistics (alter table set statistics), whatever... but you gotta get the planner correctly estimating these rowcounts. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
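The statistics fix suggested above can be sketched like this (table and column names come from the quoted plan; the target value 100 is just an example, the 8.0-era default being 10):

```sql
-- Raise the statistics target for the badly-estimated column,
-- then re-gather statistics so the planner sees the new histogram:
ALTER TABLE structure ALTER COLUMN _c SET STATISTICS 100;
ANALYZE structure;

-- Re-check: the estimated rows should now be much closer to 1569.
EXPLAIN ANALYZE SELECT count(smiles) FROM structure WHERE _c >= 30;
```

If the estimate improves, the planner should switch to the index scan on its own, without any need to disable seq scans.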
Re: [GENERAL] Howto determine the number of elements of an array
contrib/intarray

On Mon, 31 Jan 2005 20:00:06 +0100, Együd Csaba <[EMAIL PROTECTED]> wrote:

Hi, this function is part of GiST which is not installed on my server. Should I install the whole GiST or is there a single package to implement only this feature? thank you, -- Csaba

-Original Message- From: PFC [mailto:[EMAIL PROTECTED] Sent: Monday, January 31, 2005 5:16 PM To: Együd Csaba; 'Sven Willenberger'; pgsql-general@postgresql.org Subject: Re: [GENERAL] Howto determine the number of elements of an array

icount( array )

Hi Sven, it would be better for me if it returned the number of elements as an integer even if the array is empty (in this case it returns NULL). No matter, this can easily be worked around, but it would have been more comfortable. Thank you very much. -- Csaba Együd

-Original Message- From: Sven Willenberger [mailto:[EMAIL PROTECTED] Sent: Monday, January 31, 2005 4:27 PM To: pgsql-general@postgresql.org Cc: Együd Csaba Subject: Re: [GENERAL] Howto determine the number of elements of an array

Együd Csaba wrote: Hi, how can I determine the number of elements of an array? I saw function array_dims() which returns a string value representing the dimensions of the array. Is there a function which returns only the number of elements as an integer? -- Csaba

array_upper(arrayname,dimension) will return the number of elements in the specified dimension of array arrayname. Sven ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
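Two ways to get the count as an integer, sketched below. icount() assumes contrib/intarray is installed (and only works on integer arrays); the coalesce(array_upper(...)) form works on any array type and turns the NULL returned for an empty array into 0, which is the behaviour Csaba asked for:

```sql
-- With contrib/intarray (integer arrays only):
SELECT icount('{1,2,3}'::int[]);                       -- 3

-- Portable version, any array type, 0 instead of NULL when empty:
SELECT coalesce(array_upper('{}'::int[], 1), 0);       -- 0
SELECT coalesce(array_upper('{1,2,3}'::int[], 1), 0);  -- 3
```

Note that array_upper() gives the element count only when the array's lower bound is 1, which is the case for normally-constructed arrays.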
Re: [GENERAL] Splitting queries across servers
postgresql-8.0.0beta4 $ time make -j 5 ... lots of output ... real0m41.274s user1m36.315s sys 0m15.451s Yikes. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Splitting queries across servers
I have never heard of Propolice SSP. What is it? Any relation to the honey 'Propolys'? Just kidding. Max

The name says little although I like it. http://www.gentoo.org/proj/en/hardened/

I was out of date -- Propolice (SSP) is now part of Gentoo's Hardened project, which also includes PaX. The hardened project has many parts, you should read the help on grsecurity, but PaX is very interesting:

--- from http://www.gentoo.org/proj/en/hardened/docs/pax-howto.xml :

What is PaX? PaX is a patch to the Linux kernel that provides hardening in two ways. The first, ASLR (Address Space Layout Randomization), provides a means to randomize the addressing scheme of all data loaded into memory. When an application is built as a PIE (Position Independent Executable), PaX is also able to randomize the address of the application base. The second protection provided by PaX is non-executable memory. This prevents a common form of attack where executable code is inserted into memory by an attacker. More information on PaX can be found throughout this guide, but the homepage can be found at http://pax.grsecurity.net.

At run time, when a buffer is created, SSP adds a secret random value, the canary, to the end of the buffer. When the function returns, SSP makes sure that the canary is still intact. If an attacker were to perform a buffer overflow, he would overwrite this value and trigger the stack-smashing handler.

---

For instance, imagine you have a version of Samba with the latest unpatched hole. An attacker can get in. Now if you have PaX, all he can do is crash the process, and his intrusion attempt is detected and logged. It's not the final cure for everything, but it covers unpatched holes. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Splitting queries across servers
Thanks for reminding me that 64-bit translates to: recompile everything you need! I think this is exactly the configuration choice we are going to make. Someone just reminded me that Windows and Linux come down to the same performance, but that the real overhead is in maintenance. It's true that Linux implies less downtime, and you can always upgrade. On top of that, there's more granularity of control on Linux.

I won't say anything on Linux vs Windows performance (although I sure do have an opinion), but simply keep in mind that postgres on Windows is young and postgres on Linux is very mature; therefore one is likely to have a lot more performance refinements than the other. Also, Linux is not that much harder to administer (especially gentoo). After all, all you want is a DB server; you'll install very little software, not like if you needed 50 different apps.

In your case recompiling everything is also a bonus because you'll be sure that everything is 64-bit optimized down to the last driver. I wonder how long your killer machine will take to compile the whole OS. And Postgres too... could you post that?

If you are concerned about security, you can compile everything with the stack protector (Propolice SSP) enabled. It's extremely advisable. All the other options like grsec et al are nice but require careful thinking; propolice just needs to be activated. If you don't know what it is I'll post docs. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Allowing update of column only from trigger
First you should use an ltree type for the uri field:
- you write it foo.bar instead of /foo/bar
- there are operators on ltree types to express "is parent of", "is a child of"
- these operators are indexed

Check the readme: http://www.sai.msu.su/~megera/postgres/gist/ltree/

If you have this type of path, I guess you'll often perform tree traversal operations, and you'll find the ltree operators extremely useful. You can update it with a trigger just like before. Now about your update problem: when you rename foo.bar into foo.crum.bar you could, in a single UPDATE, replace all foo.bar by foo.crum.bar in your whole table with the ltree operators and special functions. And for your checks, you can add a CHECK on the uri field to be sure it's equal to the uri of the parent + the name of the current row. It'll make one more SELECT request, though.

I have a table like this:

create table objects (
    id     serial primary key,
    name   varchar not null,
    parent integer references objects(id) default 1 not null,
    uri    varchar not null
) without oids;

The uri column is a denormalization for performance, storing a "path" to the object in the hierarchy, consisting of a sequence of names.

# select id, name, parent, uri from objects;
 id | name | parent | uri
----+------+--------+----------
  1 |      |      1 | /
  2 | foo  |      1 | /foo
  3 | bar  |      2 | /foo/bar
(3 rows)

The uri is calculated by a trigger before update on objects. The original version of the trigger function would re-calculate the uri for an object and its immediate children if the name, parent, or uri changed. It would apply the uri change to the children, which would cascade down the hierarchy. This generally worked, but (1) I was having data visibility voodoo, and (2) it was calculating every child's uri twice, which would be inefficient for large hierarchies. So I changed the trigger function so that it would only fire if name or parent had changed.
I created a recursive helper function that changes the uri for all descendants of an object, if the object's name or parent has changed. There is no cascade of changes (the trigger fires for all the descendants, of course, but doesn't do anything). Works great, is more efficient, and I can manage the max_stack_depth to fit the size of the hierarchy.

The PROBLEM with this is that anyone can now

# update objects set uri='/ha/ha/your/screwed' where id=2;

I want the trigger function and its helper alone to be able to update the uri. What is the best way to do this? Should I put the uri column in a separate table, and play with permissions? (Yuck.) Do I need to bite the bullet, go back to cascading triggers, and work out the data visibility voodoo and the efficiency issue? Or is there a better way that I haven't thought of? Thanks, Shawn Harrison ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
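The ltree approach suggested above could look like this; a minimal sketch, with hypothetical table and label names, assuming contrib/ltree is installed:

```sql
CREATE TABLE objects (
    id   serial PRIMARY KEY,
    name varchar NOT NULL,
    path ltree NOT NULL          -- e.g. 'foo.bar' instead of '/foo/bar'
);
CREATE INDEX objects_path_gist ON objects USING gist (path);

-- "is a descendant of foo" (indexed by the GiST index):
SELECT * FROM objects WHERE path <@ 'foo';

-- rename the subtree foo.bar to foo.crum.bar:
-- first the node itself...
UPDATE objects SET path = 'foo.crum.bar' WHERE path = 'foo.bar';
-- ...then graft every strict descendant onto the new prefix:
UPDATE objects
SET   path = 'foo.crum.bar' || subpath(path, nlevel('foo.bar'))
WHERE path <@ 'foo.bar' AND path <> 'foo.bar';
```

The whole subtree moves in two statements, with no recursion and no trigger cascade; subpath() strips the old prefix and || glues on the new one.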
Re: [GENERAL] How to get back the current count of an auto-incrementing number.
This question pops up about once a day... can't the list server grep for "auto increment" and send a link to the docs itself ;)

hi How to get the current value of an auto-incrementing number generated using CREATE SEQUENCE seqname START 1. Is there any way, and if yes, what is it? Can we run a select query, maybe with a built-in function, which will return the current value of that number? Regards. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
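For the record, the standard answer the docs give, sketched with the sequence name from the quoted question:

```sql
CREATE SEQUENCE seqname START 1;

SELECT nextval('seqname');  -- advances the sequence and returns the new value
SELECT currval('seqname');  -- value last returned by nextval() in this session
```

currval() is session-local, so it is safe under concurrency: each backend sees the value *it* just allocated, not whatever another connection grabbed in the meantime. It raises an error if nextval() has not yet been called in the session.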
Re: [GENERAL] My postmaster just crashed !
I'm sorry, I forgot to say I had the bug with 8rc3, then installed 8.0.0 and it is still crashing. On Thu, 27 Jan 2005 10:52:37 -0700, Michael Fuhr <[EMAIL PROTECTED]> wrote: [I've Cc'ed pgsql-bugs and set the Reply-To header to that list.] On Thu, Jan 27, 2005 at 05:26:26PM +0100, PFC wrote: It seems that contrib/intagg crashes my server : - select int_agg_final_array(1); server closed the connection unexpectedly I see the same thing with PostgreSQL 8.0.0 (REL8_0_STABLE) on Solaris 9 and FreeBSD 4.11. Here's part of the backtrace from the core dump on FreeBSD: (gdb) bt #0 0x285077f4 in ShrinkPGArray (p=0x1) at int_aggregate.c:130 #1 0x28507929 in int_agg_final_array (fcinfo=0xbfbfeb90) at int_aggregate.c:177 #2 0x8106db8 in ExecMakeFunctionResult (fcache=0x83410a0, econtext=0x83411e8, isNull=0xbfbfecef "\b0ÃÂÂÂK\024\bÂz3\bÃ\003", isDone=0x0) at execQual.c:1042 #3 0x8107583 in ExecEvalFunc (fcache=0x83410a0, econtext=0x83411e8, isNull=0xbfbfecef "\b0ÃÂÂÂK\024\bÂz3\bÃ\003", isDone=0x0) at execQual.c:1459 #4 0x8108beb in ExecEvalExprSwitchContext (expression=0x83410a0, econtext=0x83411e8, isNull=0xbfbfecef "\b0ÃÂÂÂK\024\bÂz3\bÃ\003", isDone=0x0) at execQual.c:2781 #5 0x8145353 in evaluate_expr (expr=0x8337ab8, result_type=1007) at clauses.c:2399 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] visualizing B-tree index coverage
I realize that using OR will not result in an index scan. I will never be interested in an OR condition for the kinds of searches I use. In my SELECT statements, I always name every column of the multi-column index in the same order that they were named when creating the index. I always use the >= condition, and very rarely, the = condition.

All the leftmost index columns must be named, but the order is unimportant. You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y); it is cleaner.

However, I am concerned that I must place the most selective column first in my index. I cannot tell, a priori, which column will be most selective. That depends on the nature of the search, which can vary widely each time. Are you saying that if my first column is not selective, even though the remaining columns are, the planner may choose not to use the index after seeing that the first column is not very selective?

I thought this was true but made some tests, and the index scanner is smart. Try this:

CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER, r INTEGER, t INTEGER, y INTEGER) WITHOUT OIDS;

INSERT 1M rows into the table using a plpgsql function, with a,z,e,r,t,y being floor(random()*10) for instance. Then you can try various selects. a,z,e,r,t,y follow a linear distribution between 0 and 9 included, so a>=A AND z>=Z ... y>=Y gives a result set of about (10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index scan when needed. You can try the easiest case (a>=9), which just explores one part of the tree, and the worst case, which explores a part of all leaves (y>=9). Both should yield about the same number of results, but the first should be faster. To know how much, just try ;)

That seems like an oversight, IMHO. Shouldn't the overall effect of using all the columns be considered before choosing not to use an index scan?

I think it is. There are no cross-column correlation stats though.
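The test setup described above can be sketched as follows; the fill function is hypothetical (the text only says "INSERT 1M rows using a plpgsql function"), and the $$ quoting is 8.0 syntax:

```sql
CREATE TABLE test (id serial primary key,
    a INTEGER, z INTEGER, e INTEGER,
    r INTEGER, t INTEGER, y INTEGER) WITHOUT OIDS;

CREATE OR REPLACE FUNCTION fill_test(n integer) RETURNS void AS $$
BEGIN
    FOR i IN 1..n LOOP
        -- six uniformly distributed values in 0..9
        INSERT INTO test (a,z,e,r,t,y)
        VALUES (floor(random()*10), floor(random()*10), floor(random()*10),
                floor(random()*10), floor(random()*10), floor(random()*10));
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT fill_test(1000000);
CREATE INDEX test_azerty ON test (a,z,e,r,t,y);
ANALYZE test;

-- easiest case (leading column constrains the scan):
EXPLAIN ANALYZE SELECT * FROM test WHERE a>=9;
-- worst case (only the last column is selective):
EXPLAIN ANALYZE SELECT * FROM test WHERE a>=0 AND z>=0 AND e>=0
                                     AND r>=0 AND t>=0 AND y>=9;
```

Comparing the two EXPLAIN ANALYZE timings shows the effect described: similar row counts, very different index traversal costs.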
Since I'm using every column of my multi-column index for every search, and I always use >=, EXPLAIN ANALYZE always shows that every column is considered in the index scan. However, that is only when the index scan is used. Sometimes EXPLAIN ANALYZE shows it is not used. That appears to happen when my search condition is very general. This is to be expected, so I am not worried. Most of my searches will be intermediate, namely not VERY selective, but also not VERY general. So the idea of the multi-column index is to "characterize" each row sufficiently, even when it is a perfectly ordinary row with no ONE feature being distinctive, but rather several features together giving it its distinctive character. That is my interpretation of the multi-column index.

If you have some features which are highly selective, you can create a single-column index on them. It won't be used often, but when it is, it will really work. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] My postmaster just crashed !
It seems that contrib/intagg crashes my server:

- select int_agg_final_array(1);
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

- Here's the logfile:

LOG: database system is ready
LOG: server process (PID 26882) was terminated by signal 11
LOG: terminating any other active server processes
FATAL: the database system is in recovery mode
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2005-01-27 17:22:48 CET
LOG: checkpoint record is at 1/F3854A94
LOG: redo record is at 1/F3854A94; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 5800; next OID: 4533584
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 1/F3854AD0
LOG: redo is not required
LOG: database system is ready

What do you think? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] self-join on subselect
Thanks, I was hoping there was some way to avoid it. You have to write the subselect twice if you want to match up different rows. In some cases it might be worth making a copy in a temp table. For simple subselects where there is an index on id, leaving it as is should work fine. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
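The temp-table variant mentioned above can be sketched like this (table and column names are hypothetical):

```sql
-- Materialize the expensive subselect once...
CREATE TEMP TABLE sub AS
    SELECT id, some_col FROM big_table WHERE expensive_condition;
CREATE INDEX sub_id ON sub (id);
ANALYZE sub;

-- ...then self-join the copy instead of evaluating the subselect twice:
SELECT a.id, b.id
FROM sub a
JOIN sub b ON a.some_col = b.some_col AND a.id < b.id;
```

The a.id < b.id condition pairs up *different* rows without producing each pair twice; whether the temp table pays off depends on how expensive the subselect is versus the cost of materializing it.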
Re: [GENERAL] Extended unit
I wonder if it makes sense to implement the units as separate data types? 'Cause that's what they really are. So "ampere" would be a data type which aliases one of the numeric data types (depending on what precision range you need), but does not allow itself to be added to anything other than "ampere". Any other interaction with other units (read: data types) would be achieved by defining the needed operators on the respective data types (read: units).

You'd have to create a postgres datatype for every variation on m, m/s, m/s², etc... which would be kinda unworkable... I think it's better to have one datatype (number with unit) and have the operators raise an exception when trying to add incompatible units. As for the encoding, why not just use a (float, text) pair with the text as a parseable representation of the unit, which could as well be the SI unit (like m/s)? That would be a lot more flexible than bitfields. Problem is, I think it'll always be variable length. Maybe there is enough space in an int64 to fit it all? Maybe with Huffman coding? Is it really important to save a few bytes? I don't think so. For table columns, the type would be saved in the column definition, so you'd just have a float anyway. I think it's an exciting project! ---(end of broadcast)--- TIP 8: explain analyze is your friend
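The single "number with unit" datatype idea can be sketched as a composite type plus an addition function that raises an exception on unit mismatch; all names here are hypothetical, and a real implementation would wrap this in a proper + operator:

```sql
CREATE TYPE unit_value AS (val float8, unit text);

CREATE OR REPLACE FUNCTION unit_add(a unit_value, b unit_value)
RETURNS unit_value AS $$
DECLARE
    result unit_value;
BEGIN
    IF a.unit <> b.unit THEN
        RAISE EXCEPTION 'cannot add % and %', a.unit, b.unit;
    END IF;
    result.val  := a.val + b.val;
    result.unit := a.unit;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

-- SELECT unit_add(ROW(1.5,'m/s'), ROW(2.0,'m/s'));  -- ok
-- SELECT unit_add(ROW(1.5,'m/s'), ROW(2.0,'A'));    -- raises the exception
```

Multiplication and division would then manipulate the unit string itself (m * s^-1 and so on), which is where the "you'll end up with Maple" remark below comes in.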
Re: [GENERAL] visualizing B-tree index coverage
I think you missed an important "feature" of multicolumn indexes: you'd better not use 'OR' in your expressions. You seem to want only to use '>=' so this should be OK.

Suppose you have 3 columns a,z,e containing values linearly distributed between ...

select min(a),max(a),min(z),max(z),min(e),max(e) from test;
 min | max | min | max | min | max
-----+-----+-----+-----+-----+-----
   0 |  13 |   0 |  99 |   0 |  99

For instance the following query is indexed:

explain analyze select * from test where a>=0 and z>=90 and e>=0;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using testa on test  (cost=0.00..1637.56 rows=11345 width=16) (actual time=0.085..51.441 rows=13000 loops=1)
   Index Cond: ((a >= 0) AND (z >= 90) AND (e >= 0))
 Total runtime: 56.307 ms

The following is only partially indexed:

explain analyze select * from test where (a=1 or a=2) and (z=1 or z=8) and e>=0;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using testa, testa on test  (cost=0.00..3269.06 rows=346 width=16) (actual time=0.328..52.961 rows=400 loops=1)
   Index Cond: ((a = 1) OR (a = 2))
   Filter: (((z = 1) OR (z = 8)) AND (e >= 0))
 Total runtime: 53.297 ms

You see the 'Index Cond' field, which is what determines the fetched rows; these are then fetched and filtered with the 'Filter' expression. Having the most selective index cond is important because it will diminish the number of rows to be fetched. However, in your case the filter expression is also beneficial, because any row eliminated by the filter will not need to go through your expensive matching function. In this case:

SELECT count(*) FROM test; => 131072
SELECT count(*) FROM test WHERE ((a = 1) OR (a = 2)); => 2
SELECT count(*) FROM test WHERE (a=1 or a=2) and (z=1 or z=8) and e>=0; => 400

In this case the index fetches 20k rows out of 131072 but only 400 are used...
If you don't use OR, index use is more likely:

explain analyze select * from test where (a,z,e) >= (0,50,80);
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using testa on test  (cost=0.00..1669.78 rows=12627 width=16) (actual time=0.087..58.316 rows=13000 loops=1)
   Index Cond: ((a >= 0) AND (z >= 50) AND (e >= 80))
 Total runtime: 63.049 ms

Here you have a full index scan. To determine the efficiency of your indexes, you can thus use this method: look at the 'Index Cond' and 'Filter' expressions, and count the rows matched by each.

particular number of columns for indexing. I don't want to use too many, nor too few columns. I also want to optimize the nature (which atom types, bond types, etc.) of the count columns. While I could do this and use the speedup as the measure of success, I think that if my B-tree were "covering" the data well, I would get the best results. Covering means finding that optimal situation where there is not one index for all rows and also not a unique index for every row - something in between would be ideal, or is that basically a wrong idea? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Recursive queries
Check out ltree http://www.sai.msu.su/~megera/postgres/gist/ltree/ On Tue, 25 Jan 2005 22:03:58 +0100, tmp <[EMAIL PROTECTED]> wrote: I don't think anybody has written the syntactic sugar, but someone did write a function that provides equivalent output. I think it is important that the funcionality lies in the database engine itself: In that way it can more efficiently make use of the optimizer. Also, I think this "recursive" feature is *the* most important upcoming improvements: Currently there are simply no efficient way of fetching linked structures, which however is quite common in many areas. Regards ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] Extended unit
If you allow multiplication and division, you'd need to store not only one type, but an expression like m.s^-2, etc. You'll end up with something like Maple. Isn't there some free open source algebraic computation toolkit with equations and units somewhere? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] on update / on delete performance of foreign keys
It's a bit more complicated than that, as there are also locking issues — what if other processes insert rows while some others are being deleted? Really, the whole thing isn't trivial.

Since postgres already incorporates code to check foreign keys more efficiently (when doing ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY, postgres seems to use a merge or a hash join instead of a nested loop), I wondered how hard it would be to use this for the triggers too. I imagined creating a statement-level trigger in parallel to the row-level triggers, and defining some threshold (let's say, more than 10% of the rows deleted). If the threshold is reached, the row-level trigger would just do nothing, and the statement-level trigger would delete the referencing records doing a join. Would this be feasible? And would it be something a newbie could tackle, or is it more involved than I think? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
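Before any of that, the usual first fix for slow cascading deletes is an index on the *referencing* column, which PostgreSQL does not create automatically (table names here are hypothetical):

```sql
CREATE TABLE parent (id serial PRIMARY KEY);
CREATE TABLE child (
    id        serial PRIMARY KEY,
    parent_id integer NOT NULL REFERENCES parent(id) ON DELETE CASCADE
);

-- Without this, every DELETE FROM parent does a sequential scan of child
-- for each deleted row (the FK trigger has nothing else to use):
CREATE INDEX child_parent_id ON child (parent_id);
```

Only the referenced side (parent.id) gets an index for free, via its PRIMARY KEY or UNIQUE constraint; the referencing side is the one people forget.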
Re: [GENERAL] Calculating a moving average
Make a plpgsql function which will iterate over the rows on which the moving average is to be done (FOR row IN SELECT), of course using the correct order; then use an array as a FIFO: add a row to the moving average and push it, pop the old one and subtract it. Roundoff errors will bite your nether regions.

I suppose the purpose of this is not to fetch the whole thing, so that your moving average will not have one result row per source row (or you'd do it in the application); thus you can pre-shrink your dataset by putting some avg() and GROUP BY in your source select.

Hi, I need to calculate a moving average and I would like to do it with SQL, or a Pg function built for this purpose. I'm on Pg 7.4. Is this possible in Pg without a bunch of self joins, or is there a function available? Thanks, Mike ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
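The FIFO approach described above can be sketched as a set-returning plpgsql function; table and column names are hypothetical, and the $$ quoting is 8.0 syntax — on 7.4 the function body must be a single-quoted string instead:

```sql
CREATE OR REPLACE FUNCTION moving_avg(win integer)
RETURNS SETOF float8 AS $$
DECLARE
    buf float8[] := '{}';   -- the FIFO of the last `win` values
    s   float8   := 0;      -- running sum (this is where roundoff creeps in)
    r   RECORD;
BEGIN
    FOR r IN SELECT value FROM measurements ORDER BY ts LOOP
        buf := array_append(buf, r.value::float8);   -- push the new value
        s   := s + r.value;
        IF array_upper(buf, 1) > win THEN
            s   := s - buf[1];                       -- pop and subtract the oldest
            buf := buf[2:array_upper(buf, 1)];
        END IF;
        IF array_upper(buf, 1) = win THEN
            RETURN NEXT s / win;                     -- emit one average per full window
        END IF;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- SELECT * FROM moving_avg(3);
```

For long series of floats, periodically recomputing s from the buffer (instead of only adding and subtracting) keeps the roundoff drift bounded.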