Re: [GENERAL] Sorting CSV string and removing Duplicates
I have a csv string in a text field that is unsorted and contains duplicates. Is there a simple way to remove these and sort the string. E.g 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 Do you need to eventually load the data in Postgres? I'd personally use python to deal with this, we're talking 4 or 5 lines here, if even. I suspect you can do the same with perl or ruby or whatever is your weapon of choice. How columns does your csv file has? Is it a one-line file? -- http://yves.zioup.com gpg: 4096R/32B0F416 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting CSV string and removing Duplicates
On Mon, Jul 27, 2015 at 12:57 PM, dinesh kumar dineshkuma...@gmail.com wrote: On Mon, Jul 27, 2015 at 12:53 PM, Alex Magnum magnum11...@gmail.com wrote: Hello, I have a csv string in a text field that is unsorted and contains duplicates. Is there a simple way to remove these and sort the string. E.g 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 i tried string to array and unique but that did not work... Any suggestions on how to do this without writing a function? Any help is appreciated. Are you looking for this. postgres=# SELECT unnest(string_to_array(t, ',')) from test group by 1; unnest 2 18 8 20 22 16 27 17 23 1 (10 rows) OR Might be something like this postgres=# WITH sortedstring as postgres-# ( postgres(# SELECT unnest(string_to_array(t, ','))::int from test group by 1 ORDER BY 1 postgres(# ) SELECT array_agg(unnest) FROM sortedstring; array_agg -- {1,2,8,16,17,18,20,22,23,27} (1 row) Regards, Dinesh manojadinesh.blogspot.com Regards, Dinesh manojadinesh.blogspot.com Thanks A
Re: [GENERAL] Sorting CSV string and removing Duplicates
Hello, I have a csv string in a text field that is unsorted and contains duplicates. Is there a simple way to remove these and sort the string. E.g 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 i tried string to array and unique but that did not work... Any suggestions on how to do this without writing a function? Any help is appreciated. Thanks A chris=# SELECT distinct x::int from unnest(string_to_array('2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27', ',')) x order by x::int; x 1 2 8 16 17 18 20 22 23 27 (10 rows) Bye, Chris. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting CSV string and removing Duplicates
On Mon, Jul 27, 2015 at 12:53 PM, Alex Magnum magnum11...@gmail.com wrote: Hello, I have a csv string in a text field that is unsorted and contains duplicates. Is there a simple way to remove these and sort the string. E.g 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 i tried string to array and unique but that did not work... Any suggestions on how to do this without writing a function? Any help is appreciated. Are you looking for this. postgres=# SELECT unnest(string_to_array(t, ',')) from test group by 1; unnest 2 18 8 20 22 16 27 17 23 1 (10 rows) Regards, Dinesh manojadinesh.blogspot.com Thanks A
[GENERAL] Sorting CSV string and removing Duplicates
Hello, I have a csv string in a text field that is unsorted and contains duplicates. Is there a simple way to remove these and sort the string. E.g 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 i tried string to array and unique but that did not work... Any suggestions on how to do this without writing a function? Any help is appreciated. Thanks A
Re: [GENERAL] Sorting CSV string and removing Duplicates
Hi Danish, yes thats the one I was looking for. Thanks a lot!!! On Tue, Jul 28, 2015 at 1:32 AM, dinesh kumar dineshkuma...@gmail.com wrote: On Mon, Jul 27, 2015 at 12:57 PM, dinesh kumar dineshkuma...@gmail.com wrote: On Mon, Jul 27, 2015 at 12:53 PM, Alex Magnum magnum11...@gmail.com wrote: Hello, I have a csv string in a text field that is unsorted and contains duplicates. Is there a simple way to remove these and sort the string. E.g 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 i tried string to array and unique but that did not work... Any suggestions on how to do this without writing a function? Any help is appreciated. Are you looking for this. postgres=# SELECT unnest(string_to_array(t, ',')) from test group by 1; unnest 2 18 8 20 22 16 27 17 23 1 (10 rows) OR Might be something like this postgres=# WITH sortedstring as postgres-# ( postgres(# SELECT unnest(string_to_array(t, ','))::int from test group by 1 ORDER BY 1 postgres(# ) SELECT array_agg(unnest) FROM sortedstring; array_agg -- {1,2,8,16,17,18,20,22,23,27} (1 row) Regards, Dinesh manojadinesh.blogspot.com Regards, Dinesh manojadinesh.blogspot.com Thanks A
[GENERAL] Sorting rows by a column and storing a row number
Hello, I have a card game for each I'd like to introduce weekly tournaments. I'm going to save the score (virtual money) won by each player into: create table pref_money ( id varchar(32) references pref_users, yw char(7) default to_char(current_timestamp, '-WW'), money real ); create index pref_money_yw_index on pref_money(yw); This way I don't have to perform any special calculations at the end of a week to find the weekly winner - just select all records for the current year-week, sort them by money column and take the 1st one. But I wonder, if there is a nice way in SQL to tell an interested user his current rank in the table? I.e. given a user id, sort all records by the money column and then let him know his rank. I'm sure I can implement this in Perl, but then this will have to be a cronjob, because I'll have to perform somewhat complex calculations for each user id and store them into another table. But a cronjob isn't nice, because it won't show the rank in real time. Is there maybe an elegant and quick way for this in SQL? Thank you for any ideas Alex P.S.: Using postgresql-8.4.5-1PGDG.rhel5 with CentOS 5.5 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting rows by a column and storing a row number
Use the Postgres window functions like rank(); this is what they're for. http://www.postgresql.org/docs/8.4/interactive/queries-table-expressions.html#QUERIES-WINDOW -- Darren Duncan Alexander Farber wrote: Hello, I have a card game for each I'd like to introduce weekly tournaments. I'm going to save the score (virtual money) won by each player into: create table pref_money ( id varchar(32) references pref_users, yw char(7) default to_char(current_timestamp, '-WW'), money real ); create index pref_money_yw_index on pref_money(yw); This way I don't have to perform any special calculations at the end of a week to find the weekly winner - just select all records for the current year-week, sort them by money column and take the 1st one. But I wonder, if there is a nice way in SQL to tell an interested user his current rank in the table? I.e. given a user id, sort all records by the money column and then let him know his rank. I'm sure I can implement this in Perl, but then this will have to be a cronjob, because I'll have to perform somewhat complex calculations for each user id and store them into another table. But a cronjob isn't nice, because it won't show the rank in real time. Is there maybe an elegant and quick way for this in SQL? Thank you for any ideas Alex P.S.: Using postgresql-8.4.5-1PGDG.rhel5 with CentOS 5.5 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting with materialized paths
On Sun, May 9, 2010 at 8:33 AM, Ovid curtis_ovid_...@yahoo.com wrote: My apologies. This isn't PG-specific, but since this is running on PostgreSQL 8.4, maybe there are specific features which might help. I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies. id | parent_id | matpath | created +---+-+ 2 | 1 | 1 | 2010-05-08 15:18:37.987544 3 | 1 | 1 | 2010-05-08 17:38:14.125377 4 | 1 | 1 | 2010-05-08 17:38:57.26743 5 | 1 | 1 | 2010-05-08 17:43:28.211708 7 | 1 | 1 | 2010-05-08 18:18:11.849735 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 So the final results should actually be sorted like this: id | parent_id | matpath | created +---+-+ 2 | 1 | 1 | 2010-05-08 15:18:37.987544 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 3 | 1 | 1 | 2010-05-08 17:38:14.125377 4 | 1 | 1 | 2010-05-08 17:38:57.26743 5 | 1 | 1 | 2010-05-08 17:43:28.211708 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 7 | 1 | 1 | 2010-05-08 18:18:11.849735 Rationale: this is for a threaded forum and id 6 is a reply to id 2, so it needs to show up after that one. Here's the rough structure of what the output would look like (imagine an HTML forum): * id 1 (root post) * id 2 * id 6 * id 8 * id 3 * id 4 * id 5 * id 9 * id 7 How would I work that out? Can I do that in straight SQL or should additional information be added to this table? This is (once more) a flat query if you use a set / subset tree implementation. Joe Celko's book Trees and Hierarchies in SQL for Smarties might be the fastest way to get up to speed on this, but you can also figure it out if you spend a bit of time with Google Basically, every node in the tree is a table row with two columns, say left and right. All children are contained within the left and right of the parent. Pre-order tree traversal gives the algorithm for assigning left and right. Once done, your problem is solved by ordering on left. -- Peter Hunsberger -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting with materialized paths
On Sun, May 9, 2010 at 4:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ovid curtis_ovid_...@yahoo.com writes: My apologies. This isn't PG-specific, but since this is running on PostgreSQL 8.4, maybe there are specific features which might help. I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies. I think contrib/ltree might help you here. However, it seems to sort node names textually rather than numerically, so you might need to change it a bit for your own purposes. That's rather unfortunate. Ltree is awfully convenient and it would be nice to be able to use it. If you just used plain Postgres arrays of integers you would get the sorting you want. But you lose all the useful ltree operators for trees. -- greg -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting with materialized paths
On 10 May 2010, at 20:06, Greg Stark wrote: On Sun, May 9, 2010 at 4:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ovid curtis_ovid_...@yahoo.com writes: My apologies. This isn't PG-specific, but since this is running on PostgreSQL 8.4, maybe there are specific features which might help. I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies. I think contrib/ltree might help you here. However, it seems to sort node names textually rather than numerically, so you might need to change it a bit for your own purposes. That's rather unfortunate. Ltree is awfully convenient and it would be nice to be able to use it. If you just used plain Postgres arrays of integers you would get the sorting you want. But you lose all the useful ltree operators for trees. I recall from the docs that you can create arrays of ltrees. It has some special operators for that. I couldn't figure out what the use case for those ltree-arrays was (the docs are rather sparse), but it might just be what you're looking for. Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4be8791c10411720337464! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting with materialized paths
Ovid wrote on 09.05.2010 15:33: My apologies. This isn't PG-specific, but since this is running on PostgreSQL 8.4, maybe there are specific features which might help. I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies. id | parent_id | matpath | created +---+-+ 2 | 1 | 1 | 2010-05-08 15:18:37.987544 3 | 1 | 1 | 2010-05-08 17:38:14.125377 4 | 1 | 1 | 2010-05-08 17:38:57.26743 5 | 1 | 1 | 2010-05-08 17:43:28.211708 7 | 1 | 1 | 2010-05-08 18:18:11.849735 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 So the final results should actually be sorted like this: id | parent_id | matpath | created +---+-+ 2 | 1 | 1 | 2010-05-08 15:18:37.987544 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 3 | 1 | 1 | 2010-05-08 17:38:14.125377 4 | 1 | 1 | 2010-05-08 17:38:57.26743 5 | 1 | 1 | 2010-05-08 17:43:28.211708 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 7 | 1 | 1 | 2010-05-08 18:18:11.849735 Try this: with recursive thread_display (id, parent_id, matpath, created, sort_key) as ( select id, parent_id, matpath, created, array[id] as sort_key from threads where id = 1 union all select c.id, c.parent_id, c.matpath, c.created, p.sort_key||array[c.id] from threads c join thread_display p on c.parent_id = p.id ) select id, parent_id, matpath, created from thread_display order by sort_key; Thomas -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sorting with materialized paths
My apologies. This isn't PG-specific, but since this is running on PostgreSQL 8.4, maybe there are specific features which might help. I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies. id | parent_id | matpath | created +---+-+ 2 | 1 | 1 | 2010-05-08 15:18:37.987544 3 | 1 | 1 | 2010-05-08 17:38:14.125377 4 | 1 | 1 | 2010-05-08 17:38:57.26743 5 | 1 | 1 | 2010-05-08 17:43:28.211708 7 | 1 | 1 | 2010-05-08 18:18:11.849735 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 So the final results should actually be sorted like this: id | parent_id | matpath | created +---+-+ 2 | 1 | 1 | 2010-05-08 15:18:37.987544 6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 3 | 1 | 1 | 2010-05-08 17:38:14.125377 4 | 1 | 1 | 2010-05-08 17:38:57.26743 5 | 1 | 1 | 2010-05-08 17:43:28.211708 9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 7 | 1 | 1 | 2010-05-08 18:18:11.849735 Rationale: this is for a threaded forum and id 6 is a reply to id 2, so it needs to show up after that one. Here's the rough structure of what the output would look like (imagine an HTML forum): * id 1 (root post) * id 2 * id 6 * id 8 * id 3 * id 4 * id 5 * id 9 * id 7 How would I work that out? Can I do that in straight SQL or should additional information be added to this table? Cheers, Ovid -- Buy the book - http://www.oreilly.com/catalog/perlhks/ Tech blog- http://blogs.perl.org/users/ovid/ Twitter - http://twitter.com/OvidPerl Official Perl 6 Wiki - http://www.perlfoundation.org/perl6 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting with materialized paths
Ovid curtis_ovid_...@yahoo.com writes: My apologies. This isn't PG-specific, but since this is running on PostgreSQL 8.4, maybe there are specific features which might help. I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies. I think contrib/ltree might help you here. However, it seems to sort node names textually rather than numerically, so you might need to change it a bit for your own purposes. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Tue, Feb 23, 2010 at 6:48 AM, Scott Marlowe scott.marl...@gmail.com wrote: I'm relieved that Postgresql itself does not, in fact, suck, but slightly disappointed in the behavior of psql. I suppose it needs to buffer everything in memory to properly format its tabular output, among other possible reasons I could imagine. It's best when working with big sets to do so with a cursor and fetch a few thousand rows at a time. It's how we handle really big sets at work and it works like a charm in keeping the client from bogging down with a huge memory footprint. You can do \set FETCH_COUNT to have psql use a cursor automatically. -- greg -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
I am under the impression that MySQL does not have anything resembling Postgres' support for regular expressions. Though some might think that regular expressions are a sort of poor man's SQL, in any application which manages large amounts of text they are crucial. Postgres definitely does not suck. Is this the authoratative webpage for Snowball (which I never realized was a play on Snobol)? http://snowball.tartarus.org/ Thanks, John On Feb 23, 2010, at 6:51 AM, Yang Zhang wrote: I'm relieved that Postgresql itself does not, in fact, suck, -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Greg Stark wrote: You can do \set FETCH_COUNT to have psql use a cursor automatically. It seems like a big win in this case. What would be the downside of having a fetch_count set default in psql? regards Yeb Havinga -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Tue, Feb 23, 2010 at 9:48 AM, Yeb Havinga yebhavi...@gmail.com wrote: Greg Stark wrote: You can do \set FETCH_COUNT to have psql use a cursor automatically. It seems like a big win in this case. What would be the downside of having a fetch_count set default in psql? They were mentioned previously in this thread: 1) If an error occurs partway through the execution of the query you might receive part of the result set. 2) psql won't be able to align the columns properly -- greg -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Tue, Feb 23, 2010 at 00:02, Yang Zhang yanghates...@gmail.com wrote: Thing is, this is how I got here: - ran complex query that does SELECT INTO. - that never terminated, so killed it and tried a simpler SELECT (the subject of this thread) from psql to see how long that would take. You might have better luck if you paste the EXPLAIN ANALYZE of the SELECT INTO here (Maybe a new thread? Maybe on -performance? use your judgement...). But I bet if its doing something with transactionid like your straight select was, an index would help. If you are just using SELECT INTO to copy all of the data into a new table... COPY might be faster or CREATE TABLE AS. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Hi, On Tue, Feb 23, 2010 at 12:51 AM, Yang Zhang yanghates...@gmail.com wrote: When running the query in MySQL InnoDB: $ vmstat 10 procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 13 13733604 83020 5648 2193884 3 3 936 168 2 1 4 2 89 5 0 1 12 13749952 80164 5600 2178032 0 4354 908 4379 3586 2638 0 1 38 60 0 0 19 13762228 80576 5556 2145220 208 3527 1280 3690 3668 2635 1 1 39 59 0 0 19 13778632 79420 5560 2135228 52 4186 1046 4191 3682 2418 0 1 37 62 0 [snip] I'm guessing the swap numbers are because MySQL uses mmap? InnoDB doesn't use mmap. Baron -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sorting performance vs. MySQL
I have the exact same table of data in both MySQL and Postgresql. In Postgresql: tpcc=# \d metarelcloud_transactionlog Table public.metarelcloud_transactionlog Column| Type | Modifiers -+---+-- id | integer | not null default nextval('metarelcloud_transactionlog_id_seq'::regclass) transactionid | integer | not null queryid | smallint | not null tableid | character varying(30) | not null tupleid | integer | not null querytype | character varying | not null graphpartition | smallint | replicatedpartition | smallint | justifiedpartition | smallint | hashpartition | smallint | nodeid | integer | manualpartition | smallint | Indexes: metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id) Check constraints: metarelcloud_transactionlog_graphpartition_check CHECK (graphpartition = 0) metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 0) metarelcloud_transactionlog_justifiedpartition_check CHECK (justifiedpartition = 0) metarelcloud_transactionlog_manualpartition_check CHECK (manualpartition = 0) metarelcloud_transactionlog_querytype_check CHECK (querytype::text = ANY (ARRAY['select'::character varying, 'insert'::character varying, 'delete'::character varying, 'update'::character varying]::text[])) metarelcloud_transactionlog_replicatedpartition_check CHECK (replicatedpartition = 0) In MySQL: CREATE TABLE `metarelcloud_transactionlog` ( `id` int(11) NOT NULL AUTO_INCREMENT, `transactionid` int(11) NOT NULL, `queryid` tinyint(4) NOT NULL, `tableid` varchar(30) NOT NULL, `tupleid` int(11) NOT NULL, `querytype` enum('select','insert','delete','update') NOT NULL, `graphpartition` tinyint(3) unsigned DEFAULT NULL, `replicatedpartition` tinyint(3) unsigned DEFAULT NULL, `justifiedpartition` tinyint(3) unsigned DEFAULT NULL, `hashpartition` tinyint(3) unsigned DEFAULT NULL, `nodeid` int(11) DEFAULT NULL, `manualpartition` tinyint(3) unsigned DEFAULT NULL, PRIMARY KEY (`id`), KEY `transactionid` (`transactionid`), KEY `tableid` (`tableid`,`tupleid`), KEY `nodeid` (`nodeid`) ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1 I'm running: select * from metarelcloud_transactionlog order by transactionid; It takes MySQL 6 minutes, but Postgresql is still running after 70 minutes. Is there something like a glaring misconfiguration that I'm overlooking? Thanks in advance. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
hello the speed depends on setting of working_memory. Try to increase a working_memory set working_memory to '10MB'; Regards Pavel Stehule 2010/2/22 Yang Zhang yanghates...@gmail.com: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: tpcc=# \d metarelcloud_transactionlog Table public.metarelcloud_transactionlog Column | Type | Modifiers -+---+-- id | integer | not null default nextval('metarelcloud_transactionlog_id_seq'::regclass) transactionid | integer | not null queryid | smallint | not null tableid | character varying(30) | not null tupleid | integer | not null querytype | character varying | not null graphpartition | smallint | replicatedpartition | smallint | justifiedpartition | smallint | hashpartition | smallint | nodeid | integer | manualpartition | smallint | Indexes: metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id) Check constraints: metarelcloud_transactionlog_graphpartition_check CHECK (graphpartition = 0) metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 0) metarelcloud_transactionlog_justifiedpartition_check CHECK (justifiedpartition = 0) metarelcloud_transactionlog_manualpartition_check CHECK (manualpartition = 0) metarelcloud_transactionlog_querytype_check CHECK (querytype::text = ANY (ARRAY['select'::character varying, 'insert'::character varying, 'delete'::character varying, 'update'::character varying]::text[])) metarelcloud_transactionlog_replicatedpartition_check CHECK (replicatedpartition = 0) In MySQL: CREATE TABLE `metarelcloud_transactionlog` ( `id` int(11) NOT NULL AUTO_INCREMENT, `transactionid` int(11) NOT NULL, `queryid` tinyint(4) NOT NULL, `tableid` varchar(30) NOT NULL, `tupleid` int(11) NOT NULL, `querytype` enum('select','insert','delete','update') NOT NULL, `graphpartition` tinyint(3) unsigned DEFAULT NULL, `replicatedpartition` tinyint(3) unsigned DEFAULT NULL, `justifiedpartition` tinyint(3) unsigned DEFAULT NULL, `hashpartition` tinyint(3) unsigned DEFAULT NULL, `nodeid` int(11) DEFAULT NULL, `manualpartition` tinyint(3) unsigned DEFAULT NULL, PRIMARY KEY (`id`), KEY `transactionid` (`transactionid`), KEY `tableid` (`tableid`,`tupleid`), KEY `nodeid` (`nodeid`) ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1 I'm running: select * from metarelcloud_transactionlog order by transactionid; It takes MySQL 6 minutes, but Postgresql is still running after 70 minutes. Is there something like a glaring misconfiguration that I'm overlooking? Thanks in advance. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
There is no index on the column transactionid in your PostgreSQL- table, as there is in your MySQL-table. This explains the difference. CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog (transactionid); Op 22 feb 2010, om 19:10 heeft Yang Zhang het volgende geschreven: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: tpcc=# \d metarelcloud_transactionlog Table public.metarelcloud_transactionlog Column| Type | Modifiers -+--- + -- id | integer | not null default nextval('metarelcloud_transactionlog_id_seq'::regclass) transactionid | integer | not null queryid | smallint | not null tableid | character varying(30) | not null tupleid | integer | not null querytype | character varying | not null graphpartition | smallint | replicatedpartition | smallint | justifiedpartition | smallint | hashpartition | smallint | nodeid | integer | manualpartition | smallint | Indexes: metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id) Check constraints: metarelcloud_transactionlog_graphpartition_check CHECK (graphpartition = 0) metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 0) metarelcloud_transactionlog_justifiedpartition_check CHECK (justifiedpartition = 0) metarelcloud_transactionlog_manualpartition_check CHECK (manualpartition = 0) metarelcloud_transactionlog_querytype_check CHECK (querytype::text = ANY (ARRAY['select'::character varying, 'insert'::character varying, 'delete'::character varying, 'update'::character varying]::text[])) metarelcloud_transactionlog_replicatedpartition_check CHECK (replicatedpartition = 0) In MySQL: CREATE TABLE `metarelcloud_transactionlog` ( `id` int(11) NOT NULL AUTO_INCREMENT, `transactionid` int(11) NOT NULL, `queryid` tinyint(4) NOT NULL, `tableid` varchar(30) NOT NULL, `tupleid` int(11) NOT NULL, `querytype` enum('select','insert','delete','update') NOT NULL, `graphpartition` tinyint(3) unsigned DEFAULT NULL, `replicatedpartition` tinyint(3) unsigned DEFAULT NULL, `justifiedpartition` tinyint(3) unsigned DEFAULT NULL, `hashpartition` tinyint(3) unsigned DEFAULT NULL, `nodeid` int(11) DEFAULT NULL, `manualpartition` tinyint(3) unsigned DEFAULT NULL, PRIMARY KEY (`id`), KEY `transactionid` (`transactionid`), KEY `tableid` (`tableid`,`tupleid`), KEY `nodeid` (`nodeid`) ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1 I'm running: select * from metarelcloud_transactionlog order by transactionid; It takes MySQL 6 minutes, but Postgresql is still running after 70 minutes. Is there something like a glaring misconfiguration that I'm overlooking? Thanks in advance. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general Frank Heikens frankheik...@mac.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote: There is no index on the column transactionid in your PostgreSQL-table, as there is in your MySQL-table. This explains the difference. CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog (transactionid); Does an index help a sort operation in PostgreSQL? -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com wrote: hello the speed depends on setting of working_memory. Try to increase a working_memory set working_memory to '10MB'; It's already at tpcc=# show work_mem; work_mem -- 2kB (1 row) I also wouldn't have imagined an external merge-sort as being very memory-intensive--wouldn't it only enough buffer space to read 2x and write 1x in big-enough chunks for mostly-sequential access? -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma richard.broer...@gmail.com wrote: On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote: There is no index on the column transactionid in your PostgreSQL-table, as there is in your MySQL-table. This explains the difference. CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog (transactionid); Does an index help a sort operation in PostgreSQL? I also share the same doubt. An external merge-sort needs to make complete passes over the entire dataset, with no index-directed accesses. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Op 22 feb 2010, om 19:30 heeft Richard Broersma het volgende geschreven: On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote: There is no index on the column transactionid in your PostgreSQL- table, as there is in your MySQL-table. This explains the difference. CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog (transactionid); Does an index help a sort operation in PostgreSQL? Yes it does, see the manual: http://www.postgresql.org/docs/8.4/interactive/indexes-ordering.html Example without index: Sort (cost=804.39..829.39 rows=1 width=4) (actual time=16.006..17.171 rows=1 loops=1) Sort Key: bar Sort Method: quicksort Memory: 491kB - Seq Scan on bla (cost=0.00..140.00 rows=1 width=4) (actual time=0.015..2.236 rows=1 loops=1) Total runtime: 18.098 ms Same query with index (btree): Index Scan Backward using i_bar on bla (cost=0.00..406.25 rows=1 width=4) (actual time=0.093..4.408 rows=1 loops=1) Total runtime: 5.381 ms -- Regards, Richard Broersma Jr. Visit the Los Angeles PostgreSQL Users Group (LAPUG) http://pugs.postgresql.org/lapug Regards, Frank Heikens -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On 22 Feb 2010, at 19:35, Yang Zhang wrote: I also wouldn't have imagined an external merge-sort as being very Where's that external merge-sort coming from? Can you show an explain analyze? If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4b82d18510442035320951! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Yang Zhang escribió: I'm running: select * from metarelcloud_transactionlog order by transactionid; It takes MySQL 6 minutes, but Postgresql is still running after 70 minutes. Is there something like a glaring misconfiguration that I'm overlooking? Thanks in advance. How large is the table, and have you vacuumed it? Did you analyze it? What Pg version is this? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 22 Feb 2010, at 19:35, Yang Zhang wrote: I also wouldn't have imagined an external merge-sort as being very Where's that external merge-sort coming from? Can you show an explain analyze? I just assumed that the Sort in the EXPLAIN output meant an external merge-sort, given that the table has over 50 million tuples and is over 3GB, *and* there is no index on the sort key: tpcc=# explain select * from metarelcloud_transactionlog order by transactionid; QUERY PLAN - Sort (cost=8408637.34..8534662.95 rows=50410244 width=17) Sort Key: a.transactionid - Seq Scan on metarelcloud_transactionlog a (cost=0.00..925543.44 rows=50410244 width=17) (3 rows) Anyway, I added the INDEX as suggested by Frank, but it's been 20 minutes and it's still running. With the index, EXPLAIN says: tpcc=# explain select * from metarelcloud_transactionlog order by transactionid; QUERY PLAN - Index Scan using i_transactionid on metarelcloud_transactionlog (cost=0.00..4453076.81 rows=50410164 width=44) (1 row) If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow. Relative to the non-terminating 80-minute-so-far sort, Unix sort runs much faster (on the order of several minutes). -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Yang Zhang escribió: I'm running: select * from metarelcloud_transactionlog order by transactionid; It takes MySQL 6 minutes, but Postgresql is still running after 70 minutes. Is there something like a glaring misconfiguration that I'm overlooking? Thanks in advance. How large is the table, and have you vacuumed it? Did you analyze it? What Pg version is this? The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using PG 8.3.8 on Fedora 10 x86_64. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven: On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 22 Feb 2010, at 19:35, Yang Zhang wrote: I also wouldn't have imagined an external merge-sort as being very Where's that external merge-sort coming from? Can you show an explain analyze? I just assumed that the Sort in the EXPLAIN output meant an external merge-sort, given that the table has over 50 million tuples and is over 3GB, *and* there is no index on the sort key: tpcc=# explain select * from metarelcloud_transactionlog order by transactionid; QUERY PLAN - Sort (cost=8408637.34..8534662.95 rows=50410244 width=17) Sort Key: a.transactionid - Seq Scan on metarelcloud_transactionlog a (cost=0.00..925543.44 rows=50410244 width=17) (3 rows) Anyway, I added the INDEX as suggested by Frank, but it's been 20 minutes and it's still running. With the index, EXPLAIN says: tpcc=# explain select * from metarelcloud_transactionlog order by transactionid; QUERY PLAN - Index Scan using i_transactionid on metarelcloud_transactionlog (cost=0.00..4453076.81 rows=50410164 width=44) (1 row) Use EXPLAIN ANALYZE to see how the query is executed, gives you more details. If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow. Relative to the non-terminating 80-minute-so-far sort, Unix sort runs much faster (on the order of several minutes). Make sure your index does fit into memory, what's the size of the index? -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general Frank Heikens -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens frankheik...@mac.com wrote: Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven: On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys dal...@solfertje.student.utwente.nl wrote: On 22 Feb 2010, at 19:35, Yang Zhang wrote: I also wouldn't have imagined an external merge-sort as being very Where's that external merge-sort coming from? Can you show an explain analyze? I just assumed that the Sort in the EXPLAIN output meant an external merge-sort, given that the table has over 50 million tuples and is over 3GB, *and* there is no index on the sort key: tpcc=# explain select * from metarelcloud_transactionlog order by transactionid; QUERY PLAN - Sort (cost=8408637.34..8534662.95 rows=50410244 width=17) Sort Key: a.transactionid - Seq Scan on metarelcloud_transactionlog a (cost=0.00..925543.44 rows=50410244 width=17) (3 rows) Anyway, I added the INDEX as suggested by Frank, but it's been 20 minutes and it's still running. With the index, EXPLAIN says: tpcc=# explain select * from metarelcloud_transactionlog order by transactionid; QUERY PLAN - Index Scan using i_transactionid on metarelcloud_transactionlog (cost=0.00..4453076.81 rows=50410164 width=44) (1 row) Use EXPLAIN ANALYZE to see how the query is executed, gives you more details. As mentioned, this would take a very long time to complete running -- I have not yet seen one successful query completion yet. I'd need to let it run overnight. If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow. Relative to the non-terminating 80-minute-so-far sort, Unix sort runs much faster (on the order of several minutes). Make sure your index does fit into memory, what's the size of the index? How might I find out the size and whether it's being fit in memory? -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Yang Zhang escribió: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: I just noticed two things: [snip lots of stuff] 1. ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1 You're doing a comparison to MyISAM. 2. select * from metarelcloud_transactionlog order by transactionid; You're reading the whole table. This is unlikely to fly very far. I suggest you try some query that's actually going to be used in the real world. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Yang Zhang escribió: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: I just noticed two things: [snip lots of stuff] 1. ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1 You're doing a comparison to MyISAM. We've actually been using innodb as well; it exhibits similar execution times to MyISAM. 2. select * from metarelcloud_transactionlog order by transactionid; You're reading the whole table. This is unlikely to fly very far. I suggest you try some query that's actually going to be used in the real world. This isn't some microbenchmark. This is part of our actual analytical application. We're running large-scale graph partitioning algorithms. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote: This isn't some microbenchmark. This is part of our actual analytical application. We're running large-scale graph partitioning algorithms. It's important to see how it runs if you can fit more / most of the data set into memory by cranking up work_mem to something really big (like a gigabyte or two) and if the query planner can switch to some sort of hash algorithm. Also, can you cluster the table on transactionid ? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven: If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow. Relative to the non-terminating 80-minute-so-far sort, Unix sort runs much faster (on the order of several minutes). Make sure your index does fit into memory, what's the size of the index? How might I find out the size and whether it's being fit in memory? SELECT pg_size_pretty(pg_relation_size('i_transactionid')); -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general Frank Heikens -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote: This isn't some microbenchmark. This is part of our actual analytical application. We're running large-scale graph partitioning algorithms. It's important to see how it runs if you can fit more / most of the data set into memory by cranking up work_mem to something really big (like a gigabyte or two) and if the query planner can switch to some sort of hash algorithm. We're actually using a very small dataset right now. Being bounded by memory capacity is not a scalable approach for our application. Also, can you cluster the table on transactionid ? We can, but that's not really addressing the core issue, which matters to us since the sort itself is only for performing a self merge join on transactionid, and the *very next step* is a group by a.tableid, a.tupleid, b.tableid, b.tupleid (i.e. requiring another sort for the group-agg). -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: Just wondering, are these on the same exact machine? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: Just wondering, are these on the same exact machine? Yes, on the same disk. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 12:50 PM, Yang Zhang yanghates...@gmail.com wrote: On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote: This isn't some microbenchmark. This is part of our actual analytical application. We're running large-scale graph partitioning algorithms. It's important to see how it runs if you can fit more / most of the data set into memory by cranking up work_mem to something really big (like a gigabyte or two) and if the query planner can switch to some sort of hash algorithm. We're actually using a very small dataset right now. Being bounded by memory capacity is not a scalable approach for our application. But the more you can fit into work_mem the faster it will go anyway. So it's still worth a try. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 2:41 PM, Frank Heikens frankheik...@mac.com wrote: Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven: If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow. Relative to the non-terminating 80-minute-so-far sort, Unix sort runs much faster (on the order of several minutes). Make sure your index does fit into memory, what's the size of the index? How might I find out the size and whether it's being fit in memory? SELECT pg_size_pretty(pg_relation_size('i_transactionid')); pg_size_pretty 1080 MB (1 row) -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Scott Marlowe wrote: On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: Just wondering, are these on the same exact machine? Just reading up on this interesting thread. WFIW, 2 years ago I and a collegue of mine did a hardware comparison of early Intel and AMD desktop quadcore processors to run postgres database, with most other parts comparable. The intel processor was 20 to 30 % faster in cpu operations to the (first generation) Phenom at almost everything, except at index creation. The test that the AMD finished in a few minutes, we had to stop on the Intel because it simply didn't finish. We double checked configuration settings and could not find explainable differences. I hesitate to post this information here, because its hard to believe that an actual big difference between the processors exists, and it more likely was something in our test setup. Still: the difference was *only* in index creation, which is kindoff like a qsort yes? egards Yeb Havinga -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Yang Zhang yanghates...@gmail.com writes: On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com wrote: the speed depends on setting of working_memory. Try to increase a working_memory It's already at 2kB According to your original posting, you're trying to sort something like a gigabyte of data. 20MB is peanuts. I wouldn't recommend increasing the value across-the-board, but setting it to several hundred meg for this particular query might help. How much RAM in your machine anyway? Also, the fact that mysql is faster suggests that having an index does help. Possibly the data is nearly ordered by transactionid, in which case an indexscan would not have random-access problems and would be much faster than an explicit sort. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote: On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: Just wondering, are these on the same exact machine? Yes, on the same disk. I'm wondering how much of this could be caching effects. Is the MySQL database warmed up before you started, and the pgsql database is cold and no caching has taken place? What do things like vmstat 10 say while the query is running on each db? First time, second time, things like that. Also, just curios, what's shared_buffers set to on the pgsql instance? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yang Zhang yanghates...@gmail.com writes: On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com wrote: the speed depends on setting of working_memory. Try to increase a working_memory It's already at 2kB According to your original posting, you're trying to sort something like a gigabyte of data. 20MB is peanuts. I wouldn't recommend increasing the value across-the-board, but setting it to several hundred meg for this particular query might help. How much RAM in your machine anyway? We have 16GB of RAM, but again, Unix sort (and even our own hand-rolled merge-sort) can operate zippily while avoiding consuming additional memory. All the same, we increased work_mem to 1GB, and still the query is not completing. Also, the fact that mysql is faster suggests that having an index does help. Possibly the data is nearly ordered by transactionid, in which case an indexscan would not have random-access problems and would be much faster than an explicit sort. Note that earlier in the thread I tried running this query with an index scan, but it's still much slower. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Yeb Havinga yebhavi...@gmail.com writes: Just reading up on this interesting thread. WFIW, 2 years ago I and a collegue of mine did a hardware comparison of early Intel and AMD desktop quadcore processors to run postgres database, with most other parts comparable. The intel processor was 20 to 30 % faster in cpu operations to the (first generation) Phenom at almost everything, except at index creation. The test that the AMD finished in a few minutes, we had to stop on the Intel because it simply didn't finish. We double checked configuration settings and could not find explainable differences. I hesitate to post this information here, because its hard to believe that an actual big difference between the processors exists, and it more likely was something in our test setup. Still: the difference was *only* in index creation, which is kindoff like a qsort yes? Interesting. Yes, btree index creation is essentially a sort ... regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
When in doubt - test. Why not remove index in MySQL (or create index in PostgreSQL) and see what happens. Why trying compare apples and oranges? Igor Neyman -Original Message- From: Yang Zhang [mailto:yanghates...@gmail.com] Sent: Monday, February 22, 2010 1:37 PM To: Richard Broersma Cc: Frank Heikens; pgsql-general@postgresql.org Subject: Re: Sorting performance vs. MySQL On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma richard.broer...@gmail.com wrote: On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote: There is no index on the column transactionid in your PostgreSQL-table, as there is in your MySQL-table. This explains the difference. CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog (transactionid); Does an index help a sort operation in PostgreSQL? I also share the same doubt. An external merge-sort needs to make complete passes over the entire dataset, with no index-directed accesses. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote: When in doubt - test. Why not remove index in MySQL (or create index in PostgreSQL) and see what happens. Why trying compare apples and oranges? Continue reading this thread -- I also tried using an index in Postgresql. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang yanghates...@gmail.com wrote: On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote: When in doubt - test. Why not remove index in MySQL (or create index in PostgreSQL) and see what happens. Why trying compare apples and oranges? Continue reading this thread -- I also tried using an index in Postgresql. But have you tried cranking up work_mem to say 1G? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang yanghates...@gmail.com wrote: On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote: When in doubt - test. Why not remove index in MySQL (or create index in PostgreSQL) and see what happens. Why trying compare apples and oranges? Continue reading this thread -- I also tried using an index in Postgresql. And oh yeah, what was shared_buffers set to? I'm not sure we ever got an answer to that. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In ? Postgresql: FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3 runs) 79 seconds, 26 using an index and 27 seconds with it clustered. Now yes it goes a lot faster because im skipping all the overhead of sending the data to the client... But still that means it has almost nothing with the sort or indexs. Leaves pretty much your cpu, disk, filesystem and network... BTW the first time I did it it had to write out the hint bits so that took (a bit) longer... Is this freshly loaded data? --- # create table metarelcould_transactionlog ( id serial primary key, transactionid integer not null, queryid smallint not null, tableid varchar(30) not null, tupleid integer not null, querytype varchar not null, graphpartition smallint, replicatedpartition smallint, justifiedpartition smallint, hashpartition smallint, modeid integer, manualpartition smallint ); # insert into metarelcould_transactionlog (transactionid, queryid, tableid, tupleid, querytype, graphpartition, replicatedpartition, justifiedpartition, hashpartition, modeid, manualpartition) select foo, 1, 'sometable', 1, 's', 1, 1, 1, 1, 1, 1 from generate_series(1, 5000) as foo; # select count(1) from (SELECT * from metarelcould_transactionlog order by transactionid) as foo; count -- 5000 (1 row) Time: 79017.186 ms # create index idx on metarelcould_transactionlog (transactionid); # select count(1) from (SELECT * from metarelcould_transactionlog order by transactionid) as foo; count -- 5000 (1 row) Time: 26230.534 ms # cluster metarelcould_transactionlog USING idx; CLUSTER Time: 342381.535 ms # select count(1) from (SELECT * from metarelcould_transactionlog order by transactionid) as foo; count -- 5000 (1 row) Time: 27704.794 ms -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote: On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In Postgresql: Just wondering, are these on the same exact machine? Yes, on the same disk. I'm wondering how much of this could be caching effects. Is the MySQL database warmed up before you started, and the pgsql database is cold and no caching has taken place? What do things like vmstat 10 say while the query is running on each db? First time, second time, things like that. Awesome -- this actually led me to discover the problem. When running the query in MySQL InnoDB: $ vmstat 10 procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si sobibo in cs us sy id wa st 0 13 13733604 83020 5648 219388433 936 16821 4 2 89 5 0 1 12 13749952 80164 5600 21780320 4354 908 4379 3586 2638 0 1 38 60 0 0 19 13762228 80576 5556 2145220 208 3527 1280 3690 3668 2635 1 1 39 59 0 0 19 13778632 79420 5560 2135228 52 4186 1046 4191 3682 2418 0 1 37 62 0 0 19 13792964 77336 5592 2082520 41 3731 1698 3804 4102 2686 1 1 53 45 0 0 14 13810356 84036 5556 2049836 36 4241 797 4246 3913 2603 0 1 68 31 0 1 14 13825640 81336 5520 20019200 4212 958 4220 3848 2736 1 1 73 25 0 0 17 13844952 78036 5476 19769568 4685 923 4689 3832 2547 0 1 69 29 0 2 13 13863828 79812 5448 19549523 4627 692 4634 3744 2505 0 1 70 28 0 0 15 13883828 77764 5440 1920528 249 4544 972 4548 4345 2506 0 1 70 28 0 1 20 13898900 79132 5456 1890192 28 4341 723 4438 4982 3030 0 3 64 33 0 0 11 13915252 85184 5624 1865260 79 3668 752 3764 4472 2765 0 3 57 40 0 0 12 13933964 78448 5700 1832640 120 4327 1066 4434 4484 2777 1 3 52 45 0 0 19 13951748 77640 5816 1795720 94 4005 1159 4091 4580 2762 1 3 48 49 0 0 16 13972748 79884 5780 17536760 4737 787 4746 4385 2766 1 3 51 45 0 0 25 13988108 78936 5884 1726068 547 3954 1468 4116 4976 3502 0 4 44 52 0 1 20 14011500 77676 5868 1689136 161 4980 843 5506 5218 3131 0 3 34 62 0 0 22 14032472 81348 5816 1647884 270 4198 943 4369 4521 2826 1 3 40 56 0 0 23 14055220 81712 5804 1626872 193 4774 1408 4856 4617 2754 1 3 38 58 0 0 21 14075848 81844 5696 15768360 4738 974 4742 4528 2704 1 3 40 56 0 0 25 14097260 79788 5628 1536336 213 4512 922 4639 4726 2929 1 3 27 69 0 0 24 14123900 80820 5616 1488460 319 5033 1059 5128 4895 2780 2 3 17 78 0 1 26 14142632 77276 5660 1445592 445 4605 1434 4727 5401 3364 1 3 16 79 0 0 31 14165668 83736 5976 1387048 362 4288 1444 4428 4739 2963 2 3 17 78 0 1 28 14180104 77564 6324 1369232 387 4526 4677 5748 3559 1 3 16 80 0 I'm guessing the swap numbers are because MySQL uses mmap? Anyway, when running the query in Postgresql: $ vmstat 10 procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buff cache si sobibo in cs us sy id wa st 1 2 13866336 1574540 25024 787898033 936 16821 4 2 89 5 0 1 3 13861520 1163596 25328 8128360 10460 25429 433 4368 4175 4 2 80 14 0 0 3 13856484 803024 25600 8343220 11170 22062 688 4492 4590 4 2 73 20 0 0 6 13855304 678868 26052 8435540 1600 9239 598 5195 7141 1 5 70 24 0 0 6 13853644 513568 26332 8563680 4010 12480 7100 4775 4248 3 3 68 26 0 2 2 13851804 166228 26624 8775304 6340 21466 1497 4680 4550 6 2 64 28 0 0 5 13861556 81896 26740 8825360 860 3547 6100 3847 5142 3386 6 2 57 35 0 0 6 13867688 91368 26808 8832712 653 3326 1835 3604 4738 2762 5 2 61 32 0 0 5 13870676 88524 26872 8849392 638 3272 2578 3517 4864 2909 4 2 55 39 0 0 5 13872748 79512 27004 8864456 629 1788 2086 2949 4337 2921 1 3 55 41 0 0 7 13876760 83124 27136 8867272 1018 2253 1713 2409 4321 2889 0 3 63 33 0 0 6 13878964 82876 27240 8874540 792 2119 1854 2314 4288 2813 2 2 72 24 0 3 4 13883204 81224 27280 8887068 661 3067 2995 3385 4558 2899 4 2 72 22 0 0 6 13886636 82036 27352 8905628 594 3726 2628 4013 4744 2765 4 2 69 25 0 0 8 13899852 85604 27400 8925800 638 4423 2689 4658 4903 2808 4 2 55 40 0 1 4 13905184 80544 27484 8940040 676 3501 3006 3799 4805 2932 4 2 66 28 0 0 9 13908480 80100 27516 8948476 668 2996 1720 3192 4594 2799 4 2 60 35 0 vmstat showed no swapping-out for a while, and then suddenly it started spilling a lot. Checking psql's memory stats showed that it was huge -- apparently, it's trying to
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 9:30 PM, Alex Hunsaker bada...@gmail.com wrote: On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote: I have the exact same table of data in both MySQL and Postgresql. In ? Postgresql: FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3 runs) 79 seconds, 26 using an index and 27 seconds with it clustered. Now yes it goes a lot faster because im skipping all the overhead of sending the data to the client... Last sentence also contributed to my realizing the problem (the client I was using was psql), but there's one oddity # select count(1) from (SELECT * from metarelcould_transactionlog order by transactionid) as foo; Does it strike anyone else that the query optimizer/rewriter should be able to toss out the sort from such a query altogether? -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Yang Zhang yanghates...@gmail.com writes: # select count(1) from (SELECT * from metarelcould_transactionlog order by transactionid) as foo; Does it strike anyone else that the query optimizer/rewriter should be able to toss out the sort from such a query altogether? It could, if it knew that the aggregate function didn't care about input row order. We don't have that knowledge about aggregates ATM. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
Yang Zhang yanghates...@gmail.com writes: I'm relieved that Postgresql itself does not, in fact, suck, but slightly disappointed in the behavior of psql. I suppose it needs to buffer everything in memory to properly format its tabular output, among other possible reasons I could imagine. That's half of it, and the other half is not wanting to present a portion of query output if the query fails partway through. You could certainly write a client that disregarded these issues (as I suppose mysql must be doing). regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote: nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe scott.marl...@gmail.com wrote: What do things like vmstat 10 say while the query is running on each db? First time, second time, things like that. Awesome -- this actually led me to discover the problem. vmstat showed no swapping-out for a while, and then suddenly it started spilling a lot. Checking psql's memory stats showed that it was huge -- apparently, it's trying to store its full result set in memory. As soon as I added a LIMIT 1, everything worked beautifully and finished in 4m (I verified that the planner was still issuing a Sort). I'm relieved that Postgresql itself does not, in fact, suck, but slightly disappointed in the behavior of psql. I suppose it needs to buffer everything in memory to properly format its tabular output, among other possible reasons I could imagine. It's best when working with big sets to do so with a cursor and fetch a few thousand rows at a time. It's how we handle really big sets at work and it works like a charm in keeping the client from bogging down with a huge memory footprint. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Mon, Feb 22, 2010 at 22:51, Yang Zhang yanghates...@gmail.com wrote: vmstat showed no swapping-out for a while, and then suddenly it started spilling a lot. Checking psql's memory stats showed that it was huge -- apparently, it's trying to store its full result set in memory. As soon as I added a LIMIT 1, everything worked beautifully and finished in 4m (I verified that the planner was still issuing a Sort). Well im half surprised no one has recommend using a cursor. Have you looked in to that? I bet that would fix most of your problems here. I'm relieved that Postgresql itself does not, in fact, suck, but slightly disappointed in the behavior of psql. I suppose it needs to buffer everything in memory to properly format its tabular output, among other possible reasons I could imagine. Well AFAIK it will dump everything you asked for. So if you said select * from 1G table; It should take at least 1G and potentially quite a bit more formatting and overhead. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting performance vs. MySQL
On Tue, Feb 23, 2010 at 1:48 AM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote: nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe scott.marl...@gmail.com wrote: What do things like vmstat 10 say while the query is running on each db? First time, second time, things like that. Awesome -- this actually led me to discover the problem. vmstat showed no swapping-out for a while, and then suddenly it started spilling a lot. Checking psql's memory stats showed that it was huge -- apparently, it's trying to store its full result set in memory. As soon as I added a LIMIT 1, everything worked beautifully and finished in 4m (I verified that the planner was still issuing a Sort). I'm relieved that Postgresql itself does not, in fact, suck, but slightly disappointed in the behavior of psql. I suppose it needs to buffer everything in memory to properly format its tabular output, among other possible reasons I could imagine. It's best when working with big sets to do so with a cursor and fetch a few thousand rows at a time. It's how we handle really big sets at work and it works like a charm in keeping the client from bogging down with a huge memory footprint. Thing is, this is how I got here: - ran complex query that does SELECT INTO. - that never terminated, so killed it and tried a simpler SELECT (the subject of this thread) from psql to see how long that would take. I.e., my original application doesn't receive the entire dataset. -- Yang Zhang http://www.mit.edu/~y_z/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sorting JTA survey results
http://www.postgresqlcertification.org/jta/2008/results Having point 4 as an example: For how long have you been a PostgreSQL database administrator? Less than 1 year36 I wish. 15 3 to 5 years42 1 to 3 years78 Over 10 years 7 5 to 7 years33 7 to 10 years 12 -- I think many people would appreciate if this list could be sorted by the a) topic - in this case administrative experience or b) percentage Currently it's really hard to interpret the results. Please put an ORDER BY there :) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sorting nulls and empty strings together
User interface need to show nulls as empty strings. PostgreSQL sorts nulls after all data. create temp table test ( testcol char(10) ); insert into test values ( null); insert into test values ( 'test'); insert into test values ( ''); select * from test order by testcol; This confuses users who expect that all empty columns are together in sorted data. Select statements are generated dynamically by driver and it is not easy to change them to generate order by coalesce( testcol,''). If there is no other way I can change driver to generate coalesce( testcol,'') as order by expressions. However I'm afraid that those order by expression cannot use regular index like create index test_inx on test(testcol) in it thus too slow for large data. How to force PostgreSQL to sort data so that nulls and empty strings appear together ? Andrus. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting nulls and empty strings together
On Mon, Apr 28, 2008 at 08:05:45PM +0300, Andrus wrote: User interface need to show nulls as empty strings. PostgreSQL sorts nulls after all data. create temp table test ( testcol char(10) ); insert into test values ( null); insert into test values ( 'test'); insert into test values ( ''); select * from test order by testcol; This confuses users who expect that all empty columns are together in sorted data. I'd say users are being confused by the assumption the nulls and empty strings are the same when they clearly aren't. Perhaps you should think which of the two you actually want to mean empty and then get rid of the other possibility. If there is no other way I can change driver to generate coalesce( testcol,'') as order by expressions. However I'm afraid that those order by expression cannot use regular index like create index test_inx on test(testcol) You could do: create index test_inx on test(coalesce(testcol,'')) Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
Re: [GENERAL] Sorting nulls and empty strings together
Andrus [EMAIL PROTECTED] writes: User interface need to show nulls as empty strings. PostgreSQL sorts nulls after all data. ... Select statements are generated dynamically by driver and it is not easy to change them to generate order by coalesce( testcol,''). You could use NULLS FIRST (assuming your collation has '' sorted at the beginning which I think is normally true). But you would have to switch it to NULLS LAST if you sort descending... If there is no other way I can change driver to generate coalesce( testcol,'') as order by expressions. However I'm afraid that those order by expression cannot use regular index like create index test_inx on test(testcol) create index test_inx on test(coalesce(testcol,'')) But I bet you'll have trouble using an index at all for the order by. You'll either be searching on other columns which would have to be leading columns of every index or you'll be reading the whole table anyways and postgres will prefer to sort since it's faster. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Sorting nulls and empty strings together
Andrus wrote: User interface need to show nulls as empty strings. PostgreSQL sorts nulls after all data. create temp table test ( testcol char(10) ); insert into test values ( null); insert into test values ( 'test'); insert into test values ( ''); select * from test order by testcol; This confuses users who expect that all empty columns are together in sorted data. Select statements are generated dynamically by driver and it is not easy to change them to generate order by coalesce( testcol,''). If there is no other way I can change driver to generate coalesce( testcol,'') as order by expressions. However I'm afraid that those order by expression cannot use regular index like create index test_inx on test(testcol) in it thus too slow for large data. How to force PostgreSQL to sort data so that nulls and empty strings appear together ? Well, you could use a case statement to change empty strings to NULL in your select: select case when testcol='' then NULL else testcol end as testcol from test order by testcol; There may be a better way, like a rule or something, but this seemed like a quick easy thing to do. -Dennis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Sorting with DISTINCT ON
Hi there, I have a problem sorting a SQL result if I use DISTINCT ON. I have a table tblcomment with these columns: id (serial) path (varchar) created (timestamp) title (varchar) These records are in the table tblcomment: id pathcreated title 11 /var/black 2007-01-07 22:17:03.001837 Any title 17 /var/blue 2007-01-07 20:35:55.289713 Any title 13 /var/blue 2007-01-15 15:52:58.438375 Any title 12 /var/green 2007-01-08 19:03:50.470073 Any title 18 /var/red2007-01-07 08:41:47.152676 Any title Now, I want to get all results from this table and if there are duplicates, I want the row whose created column has the latest date. In this example, I want to have this result: id pathcreated title 11 /var/black 2007-01-07 22:17:03.001837 Any title 13 /var/blue 2007-01-15 15:52:58.438375 Any title 12 /var/green 2007-01-08 19:03:50.470073 Any title 18 /var/red2007-01-07 08:41:47.152676 Any title My first try was this SQL query: SELECT DISTINCT ON (path) path, comment_id, created, title FROM bewertungen.tblcomments This does not allow me to append ORDER BY created since I can only sort on path because of DISTINCT ON (path). My second try was a sub query like this: SELECT comment_id, path, created, title FROM ( SELECT DISTINCT ON (path) path, comment_id, created, title FROM bewertungen.tblcomments ) foo_alias ORDER BY created DESC But this results into: id pathcreated title 11 /var/black 2007-01-07 22:17:03.001837 Any title 17 /var/blue 2007-01-07 20:35:55.289713 Any title 12 /var/green 2007-01-08 19:03:50.470073 Any title 18 /var/red2007-01-07 08:41:47.152676 Any title No matter, if I user ORDER BY created DESC or ORDER BY created ASC. It seems that postgres always takes the first row of the duplicates. In this example: 17 /var/blue 2007-01-07 20:35:55.289713 Any title. Any idea, how I can solve my problem? Regards, Nico ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Sorting with DISTINCT ON
Nico Grubert [EMAIL PROTECTED] writes: My first try was this SQL query: SELECT DISTINCT ON (path) path, comment_id, created, title FROM bewertungen.tblcomments This does not allow me to append ORDER BY created since I can only sort on path because of DISTINCT ON (path). It does allow you to sort on both columns. SELECT DISTINCT ON (path) path, comment_id, created, title FROM bewertungen.tblcomments ORDER BY path, created Maybe you need a more complex query to get what you want (and use max(created)...). -- Jorge Godoy [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Sorting with DISTINCT ON
It does allow you to sort on both columns. SELECT DISTINCT ON (path) path, comment_id, created, title FROM bewertungen.tblcomments ORDER BY path, created Thank you very much. Works perfect! :-) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Sorting
I have a column that is a varchar(6) I need to sort it by the rows that are integers 1st then the character ones or vice versa, I just need the values that can be converted to integer to sort by their numeric value. i.e 1, 2, 3, 4, 5, 10, 11, A, B, C instead of 1, 10, 11, 2, 3, 4, 5, A, B, C Any suggestions?
Re: [GENERAL] Sorting
am Mon, dem 08.01.2007, um 10:21:38 -0600 mailte Bart McFarling folgendes: I have a column that is a varchar(6) I need to sort it by the rows that are integers 1st then the character ones or vice versa, I just need the values that can be converted to integer to sort by their numeric value. i.e 1, 2, 3, 4, 5, 10, 11, A, B, C instead of 1, 10, 11, 2, 3, 4, 5, A, B, C Any suggestions? perhaps something like this: test=*# select * from foo; w 10 1 A 3 C (5 rows) Time: 1.349 ms test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from foo order by 2,1; w | case +--- 1 | 1 3 | 3 10 |10 A | 1 C | 1 (5 rows) Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: - Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Sorting
On mán, 2007-01-08 at 17:59 +0100, A. Kretschmer wrote: am Mon, dem 08.01.2007, um 10:21:38 -0600 mailte Bart McFarling folgendes: I have a column that is a varchar(6) I need to sort it by the rows that are integers 1st then the character ones or vice versa, I just need the values that can be converted to integer to sort by their numeric value. i.e 1, 2, 3, 4, 5, 10, 11, A, B, C instead of 1, 10, 11, 2, 3, 4, 5, A, B, C Any suggestions? perhaps something like this: test=*# select * from foo; w 10 1 A 3 C (5 rows) Time: 1.349 ms test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from foo order by 2,1; possible improvements: a) w ~ '^[0-9]+$' b) use NULL instead of 1 gnari ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Sorting
Ragnar [EMAIL PROTECTED] schrieb: test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from foo order by 2,1; possible improvements: a) w ~ '^[0-9]+$' b) use NULL instead of 1 Thanks, right. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) If I was god, I would recompile penguin with --enable-fly.(unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Sorting
Thanks, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andreas Kretschmer Sent: Monday, January 08, 2007 11:19 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Sorting Ragnar [EMAIL PROTECTED] schrieb: test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from foo order by 2,1; possible improvements: a) w ~ '^[0-9]+$' b) use NULL instead of 1 Thanks, right. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) If I was god, I would recompile penguin with --enable-fly.(unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] Sorting array field
Hi, Can anyone point me toward an SQL function (whether built-in or an add-on) that will allow me to sort the contents of an array datatype in an SQL query? Something like this: select sort(my_array_field) from my_table; Thanks! Pete
Re: [GENERAL] Sorting array field
On Thu, Dec 22, 2005 at 08:38:46AM -0700, Pete Deffendol wrote: Can anyone point me toward an SQL function (whether built-in or an add-on) that will allow me to sort the contents of an array datatype in an SQL query? For integer arrays see contrib/intarray. SELECT sort('{5,2,3,1,9,7}'::int[]); sort --- {1,2,3,5,7,9} (1 row) I don't recall if any of the contrib modules can sort arrays of other types; if not then look for something at a site like pgfoundry or GBorg. If you have PL/Ruby then it couldn't get much easier: CREATE FUNCTION sort(arg text[]) RETURNS text[] AS $$ arg.sort $$ LANGUAGE plruby IMMUTABLE STRICT; SELECT sort('{zz,xx yy,cc,aa,bb}'::text[]); sort --- {aa,bb,cc,xx yy,zz} (1 row) Another way would be to write a set-returning function that returns each item in the array as a separate row, and another function that uses an array constructor to put the rows back together in order (this example should work in 7.4 and later): CREATE FUNCTION array2rows(anyarray) RETURNS SETOF anyelement AS ' BEGIN FOR i IN array_lower($1, 1) .. array_upper($1, 1) LOOP RETURN NEXT $1[i]; END LOOP; RETURN; END; ' LANGUAGE plpgsql IMMUTABLE STRICT; CREATE FUNCTION sort(anyarray) RETURNS anyarray AS ' SELECT array(SELECT * FROM array2rows($1) ORDER BY 1) ' LANGUAGE sql IMMUTABLE STRICT; SELECT data, sort(data) FROM foo; data | sort ---+--- {dd,cc,bb,aa} | {aa,bb,cc,dd} {zz,xx yy,cc,aa,bb} | {aa,bb,cc,xx yy,zz} (2 rows) I'm not sure if there are easier ways; these are what first came to mind. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Sorting array field
On Thu, Dec 22, 2005 at 08:38:46AM -0700, Pete Deffendol wrote: Hi, Can anyone point me toward an SQL function (whether built-in or an add-on) that will allow me to sort the contents of an array datatype in an SQL query? Something like this: select sort(my_array_field) from my_table; Here's one way using only SQL. I do not make any guarantees about its performance, though ;) CREATE TABLE my_table (my_array text[]); INSERT INTO my_table VALUES('{r,e,d,q}'); INSERT INTO my_table VALUES('{c,b,a}'); INSERT INTO my_table VALUES('{one,two,three,four}'); SELECT ARRAY( SELECT t.my_array[s.i] FROM generate_series( array_lower(my_array,1), /* usually 1 */ array_upper(my_array,1) ) AS s(i) ORDER BY t.my_array[s.i] ) AS sorted_array FROM my_table t ORDER BY sorted_array DESC; HTH :) Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Re: [GENERAL] Sorting problems with SELECT * FROM table WHERE name LIKE 'Ö %'
I have a problem when sorting records with: SELECT * FROM table WHERE name LIKE 'Ö%' I am running Postgres 8.02 with a database whose character encoding is UNICODE. The SQL Query SELECT * FROM member WHERE name LIKE 'O%' OR name like 'Ö%' ORDER BY name returns this: Öhlmann Öhmann Obenaus Ochoa O'Donovan Oehme Oklant Oltub Oltüch Oltutz Oltüwer According to german sorting rules the result is fine except the both first entries Öhlmann and Öhmann. Why do appear these records at the beginning of the list? The proper result should read like this: Obenaus Ochoa O'Donovan Oehme Öhlmann Öhmann Oklant Oltub Oltüch Oltutz Oltüwer The same problem accours when using E where my result is this: Élie de Beaumont Eberer Ecü Edding Emmer The proper result should be: Eberer Ecü Edding Élie de Beaumont Emmer Any idea how I can solve this problem? Thank you very much in advance, Nico To complete the missing information, here are the variables set for the databases: add_missing_fromon archive_command unset australian_timezonesoff authentication_timeout 60 bgwriter_delay 200 bgwriter_maxpages 100 bgwriter_percent1 block_size 8192 check_function_bodies on checkpoint_segments 3 checkpoint_timeout 300 checkpoint_warning 30 client_encoding UNICODE client_min_messages notice commit_delay0 commit_siblings 5 cpu_index_tuple_cost0.001 cpu_operator_cost 0.0025 cpu_tuple_cost 0.01 custom_variable_classes unset DateStyle ISO, MDY db_user_namespace off deadlock_timeout1000 debug_pretty_print off debug_print_parse off debug_print_planoff debug_print_rewritten off debug_shared_buffers0 default_statistics_target 10 default_tablespace unset default_transaction_isolation read committed default_transaction_read_only off default_with_oids on effective_cache_size1000 enable_hashagg on enable_hashjoin on enable_indexscanon enable_mergejoinon enable_nestloop on enable_seqscan on enable_sort on enable_tidscan on explain_pretty_printon extra_float_digits 0 from_collapse_limit 8 fsync on geqoon geqo_effort 5 geqo_generations0 geqo_pool_size 0 geqo_selection_bias 2 geqo_threshold 12 integer_datetimes on join_collapse_limit 8 lc_collate [EMAIL PROTECTED] lc_ctype[EMAIL PROTECTED] lc_messages [EMAIL PROTECTED] lc_monetary [EMAIL PROTECTED] lc_numeric [EMAIL PROTECTED] lc_time [EMAIL PROTECTED] listen_addresseslocalhost log_connections off log_destination stderr log_disconnections off log_durationoff log_error_verbosity default log_executor_stats off log_hostnameoff log_line_prefix unset log_min_duration_statement -1 log_min_error_statement panic log_min_messagesnotice log_parser_statsoff log_planner_stats off log_rotation_age1440 log_rotation_size 10240 log_statement none log_statement_stats off log_truncate_on_rotationoff maintenance_work_mem16384 max_connections 100 max_files_per_process 1000 max_fsm_pages 2 max_fsm_relations 1000 max_function_args 32 max_identifier_length 63 max_index_keys 32 max_locks_per_transaction 64 max_stack_depth 2048 password_encryption on port5432 pre_auth_delay 0 random_page_cost4 redirect_stderr off regex_flavoradvanced rendezvous_name unset search_path $user,public server_encoding UNICODE server_version 8.0.2 shared_buffers 1000 silent_mode off sql_inheritance on ssl off statement_timeout 0 stats_block_level off stats_command_stringoff stats_reset_on_server_start on stats_row_level off stats_start_collector on superuser_reserved_connections 2 syslog_facility LOCAL0 syslog_identpostgres TimeZoneEurope/Berlin trace_notifyoff transaction_isolation read committed transaction_read_only off transform_null_equals off unix_socket_group unset unix_socket_permissions 511 vacuum_cost_delay 0 vacuum_cost_limit 200 vacuum_cost_page_dirty 20 vacuum_cost_page_hit1 vacuum_cost_page_miss 10 wal_buffers 8 wal_sync_method fdatasync work_mem1024 zero_damaged_pages off Is there any explaination why the result is not sorted properly? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Re: [GENERAL] Sorting problems with SELECT * FROM
Nico Grubert wrote: Ah, I found it: lc_collate: [EMAIL PROTECTED] lc_ctype: [EMAIL PROTECTED] This is an iso-8859-15 locale, isn't it? If your database encoding is UNICODE, I believe you'd have more success using an UTF8 locale, such as de_DE.UTF-8 in your case. -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Sorting problems with SELECT * FROM t able WHERE name LIKE 'Ö%'
Hi there, I have a problem when sorting records with: SELECT * FROM table WHERE name LIKE 'Ö%' I am running Postgres 8.02 with a database whose character encoding is UNICODE. The SQL Query SELECT * FROM member WHERE name LIKE 'O%' OR name like 'Ö%' ORDER BY name returns this: Öhlmann Öhmann Obenaus Ochoa O'Donovan Oehme Oklant Oltub Oltüch Oltutz Oltüwer According to german sorting rules the result is fine except the both first entries Öhlmann and Öhmann. Why do appear these records at the beginning of the list? The proper result should read like this: Obenaus Ochoa O'Donovan Oehme Öhlmann Öhmann Oklant Oltub Oltüch Oltutz Oltüwer The same problem accours when using E where my result is this: Élie de Beaumont Eberer Ecü Edding Emmer The proper result should be: Eberer Ecü Edding Élie de Beaumont Emmer Any idea how I can solve this problem? Thank you very much in advance, Nico ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Sorting problems with SELECT * FROM table WHERE name LIKE 'Ö%'
Nico Grubert [EMAIL PROTECTED] writes: I have a problem when sorting records with: SELECT * FROM table WHERE name LIKE 'Ö%' I am running Postgres 8.02 with a database whose character encoding is UNICODE. ... but what locale is it using? (See LC_COLLATE and LC_CTYPE.) regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Re: [GENERAL] Sorting problems with SELECT * FROM table WHERE name LIKE 'Ö %'
Ah, I found it: lc_collate: [EMAIL PROTECTED] lc_ctype: [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Sorting by related tables
On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote: 3) Oh, and I have also this for checking IF there are items in region that are above the item in question -- to see IF an item can or cannot be moved up in the sort order relative to others. SELECT id FROM __TABLE__ WHERE sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?) AND id != ?; If that returns any rows then I know I can call the UPDATE to move the item up. I guess you want a boolean value here? SELECT EXISTS around your above query as a subselect should do the trick. You also want to use LIMIT 1 in the statement, to avoid fetching unnecessary records. Is there much of a difference between using LIMIT 1 and using an EXISTS subselect? Frankly, I'm not clear what you are specifically suggestion with EXISTS. I'm using Perl's Class::DBI object mapping module so returning a single row is an easy way to check this as a boolean result in Perl. Again, a very basic question: What method should be used to be sure that nothing changes between the SELECT and the UPDATE? You can achieve that using transactions. Concurrency control is explained here: http://www.postgresql.org/docs/8.0/static/mvcc.html. My comment was that I want to do the above SELECT and then *only* do an UPDATE if the SELECT returns at least one row. So, I should do: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; Before the SELECT. And when I UPDATE I need to be prepared to do a ROLLBACK if I get an error and repeat the process. (And, I assume, take some precaution to give up after some number of tries.) Does that seem reasonable? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Sorting by related tables
Bill Moseley schrob: On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote: 3) Oh, and I have also this for checking IF there are items in region that are above the item in question -- to see IF an item can or cannot be moved up in the sort order relative to others. SELECT id FROM __TABLE__ WHERE sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?) AND id != ?; If that returns any rows then I know I can call the UPDATE to move the item up. I guess you want a boolean value here? SELECT EXISTS around your above query as a subselect should do the trick. You also want to use LIMIT 1 in the statement, to avoid fetching unnecessary records. Is there much of a difference between using LIMIT 1 and using an EXISTS subselect? LIMIT 1 does reduce the cost, EXISTS AFAIK only makes the result boolean and doesn't stop the execution of the subselect by itself when the first record is found. Frankly, I'm not clear what you are specifically suggestion with EXISTS. I'm using Perl's Class::DBI object mapping module so returning a single row is an easy way to check this as a boolean result in Perl. Uups, this wasn't question number three yet, and I wrongly inferred from your uppercase-ifs that you wanted a boolean result here :-/ Again, a very basic question: What method should be used to be sure that nothing changes between the SELECT and the UPDATE? You can achieve that using transactions. Concurrency control is explained here: http://www.postgresql.org/docs/8.0/static/mvcc.html. My comment was that I want to do the above SELECT and then *only* do an UPDATE if the SELECT returns at least one row. So, I should do: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; Before the SELECT. And when I UPDATE I need to be prepared to do a ROLLBACK if I get an error and repeat the process. (And, I assume, take some precaution to give up after some number of tries.) Does that seem reasonable? This would be one possibility. If you don't want your application to deal with transactions being aborted because of non-serializable transactions, you could alternatively use explicit locking (SELECT ... FOR UPDATE) combined with the Read Committed isolation level (the default). Explicit locking is documented here: http://www.postgresql.org/docs/8.0/static/explicit-locking.html#LOCKING-ROWS regards Andreas -- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Sorting by related tables
On Mon, Aug 15, 2005 at 11:30:32PM +0200, Andreas Seltenreich wrote: This would be one possibility. If you don't want your application to deal with transactions being aborted because of non-serializable transactions, you could alternatively use explicit locking (SELECT ... FOR UPDATE) combined with the Read Committed isolation level (the default). SELECT FOR UPDATE locks just the rows that are selected, right? If I understand correctly, that would not work for my case because I'm updating different rows than I'm selecting. My tables are small, so I'm thinking of just manually updating all the rows in sequence to adjust the order when needed -- to make things a bit more simple. But it is a problem that I am curious about how best to solve in a scalable way. Thanks very much for your feedback. -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Sorting by related tables
I have a few beginner questions about using related tables for sorting. create table region { id SERIAL PRIMARY KEY, nametext, -- order this table should be sorted in -- a 1 is the top sort level sort_order integer ); create table city { id SERIAL PRIMARY KEY, nametext, region integer REFERENCES region ); I want a way to adjust the sort order of the region table (move item up or move item down in a web interface) without requiring knowledge of the existing sort_order for the rows in the region table. (i.e. requiring them to already be in an order). Here's my move up (which is a lower sort_order value) statement. (__TABLE__ is region and ? are $\d bind parameters) UPDATE __TABLE__ SET sort_order = CASE -- subtract one from the item's sort, unless it's already 1 WHEN id = ? AND sort_order 1 THEN sort_order-1 -- for other items that are greater or equal to sort-1 WHEN id != ? AND sort_order = (select sort_order from __TABLE__ where id = ?)-1 THEN sort_order+1 -- all others, leave alone ELSE sort_order END; This works reasonably well for small tables, but doesn't scale and the logic likely has holes. And behavior when adding new rows to the region table is not defined. 1) How do most people do this? Use linked lists? create table region { id SERIAL PRIMARY KEY nametext, list_head boolean, -- flag if this is the head of the linked list nextinteger REFERENCES region ); 2) As a SQL beginner, I'm not seeing how to display rows from city sorted in order based on the order in the region table. 3) Oh, and I have also this for checking IF there are items in region that are above the item in question -- to see IF an item can or cannot be moved up in the sort order relative to others. SELECT id FROM __TABLE__ WHERE sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?) AND id != ?; If that returns any rows then I know I can call the UPDATE to move the item up. Again, a very basic question: What method should be used to be sure that nothing changes between the SELECT and the UPDATE? -- Bill Moseley [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Sorting by related tables
Bill Moseley schrob: create table region { id SERIAL PRIMARY KEY, nametext, -- order this table should be sorted in -- a 1 is the top sort level sort_order integer ); create table city { id SERIAL PRIMARY KEY, nametext, region integer REFERENCES region ); I want a way to adjust the sort order of the region table (move item up or move item down in a web interface) without requiring knowledge of the existing sort_order for the rows in the region table. (i.e. requiring them to already be in an order). Here's my move up (which is a lower sort_order value) statement. (__TABLE__ is region and ? are $\d bind parameters) UPDATE __TABLE__ SET sort_order = CASE -- subtract one from the item's sort, unless it's already 1 WHEN id = ? AND sort_order 1 THEN sort_order-1 -- for other items that are greater or equal to sort-1 WHEN id != ? AND sort_order = (select sort_order from __TABLE__ where id = ?)-1 THEN sort_order+1 -- all others, leave alone ELSE sort_order END; This works reasonably well for small tables, but doesn't scale and the logic likely has holes. And behavior when adding new rows to the region table is not defined. I guess your approach of maintaining a special attribute for the custom sort order could be quite fast when using floating point numbers instead of integers. You then could easily move a record around without having to update _every_ other record by using proper fractions. E.g. to move a record A between B and C, just update its sort_order to (B.sort_order + C.sort_order) / 2. However, with IEEE754-floats this is only guaranteed to work 1023 times in the worst case when using double precision and seeding the sort_order with integers. So one would have to be careful and normalize the column from time to time. 1) How do most people do this? Use linked lists? I haven't had this problem yet, so I don't know if there's a standard answer. create table region { id SERIAL PRIMARY KEY nametext, list_head boolean, -- flag if this is the head of the linked list nextinteger REFERENCES region ); The problem with linked lists is that they don't fit into the relational model well, and since SQL isn't turing-complete by design, you'd have to write a proper function with one of the procedural languages PostgreSQL offers to iterate the list. Searching the list archives should yield some examples. 2) As a SQL beginner, I'm not seeing how to display rows from city sorted in order based on the order in the region table. IMHO a simple join should do here, e.g. select city.name from city, region where region.id = city.region order by region.sort_order Joining is explained here: http://www.postgresql.org/docs/8.0/static/queries-table-expressions.html 3) Oh, and I have also this for checking IF there are items in region that are above the item in question -- to see IF an item can or cannot be moved up in the sort order relative to others. SELECT id FROM __TABLE__ WHERE sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?) AND id != ?; If that returns any rows then I know I can call the UPDATE to move the item up. I guess you want a boolean value here? SELECT EXISTS around your above query as a subselect should do the trick. You also want to use LIMIT 1 in the statement, to avoid fetching unnecessary records. Again, a very basic question: What method should be used to be sure that nothing changes between the SELECT and the UPDATE? You can achieve that using transactions. Concurrency control is explained here: http://www.postgresql.org/docs/8.0/static/mvcc.html. regards Andreas -- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Sorting by constant values
I have a column that I want to sort by certain values. The values are Unit, Exterior and Common. I want all the records with Unit first, Common second and Exterior last in the sort order. These are the only 3 possible values, is there a way to sort manually like that with the alphanumeric values? -- Robert ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Sorting by constant values
You can order by conditions, lets say column='Unit'. The evaluation of a conditions will give you 't' or 'f', and alfabetically 'f' 't'... you should use DESC to get the matches first. So, it would be more or less like this: ORDER BY column='Unit' DESC, column='Exterior' DESC, column='Common' DESC I don't think this is performant though. If you have many rows to evaluate, you could create a funtion like this: CREATE FUNCTION evaluate(TEXT) RETURNS TEXT LANGUAGE 'sql' AS ' SELECT $1='Unit' || $1='Exterior' || $1='Common'; ' This function would return something like 'tff', 'ftf', 'fft', and you should be able to create an index on that function. Then you can use the index to order your rows. Hope it helps ;) 2005/5/3, Robert Fitzpatrick [EMAIL PROTECTED]: I have a column that I want to sort by certain values. The values areUnit, Exterior and Common. I want all the records with Unit first,Common second and Exterior last in the sort order. These are the only 3possible values, is there a way to sort manually like that with the alphanumeric values?--Robert---(end of broadcast)---TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Sorting by constant values
On Tue, 2005-05-03 at 12:29, Robert Fitzpatrick wrote: I have a column that I want to sort by certain values. The values are Unit, Exterior and Common. I want all the records with Unit first, Common second and Exterior last in the sort order. These are the only 3 possible values, is there a way to sort manually like that with the alphanumeric values? case statement should work. http://www.postgresql.org/docs/8.0/static/functions-conditional.html ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] Sorting by constant values
On Tue, 2005-05-03 at 13:29 -0400, Robert Fitzpatrick wrote: I have a column that I want to sort by certain values. The values are Unit, Exterior and Common. I want all the records with Unit first, Common second and Exterior last in the sort order. These are the only 3 possible values, is there a way to sort manually like that with the alphanumeric values? ... ORDER BY length(col); :-) gnari ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[GENERAL] sorting Chinese varchar field
Hi, I installed postgres 8.0 for windows on my win xp (Simplified Chinese version). The encoding is unicode. When I set pgsql client encoding to gb18030, I could insert Chinese text from the command line to postgres. However, I could not get the sort order of Chinese varchar field to work properly. What I tried are as follows: 1) installed postgres for windows and used the C locale. 2) installed postgres for windows and used the Chinese, PRC locale. Again, in both cases, the backend encoding is unicode. The other interesting thing I observed was, when setting to the C locale, the following sql worked fine: select * from user where name = 'x'; x is a Chinese text However, if I set the locale to Chinese, PRC during installation, the above select did not get the any matching rows, where it should have got. In this case, the following worked fine: select * from user where name ilike 'x%'; Could anyone let me know the best practice for using postgres to store Chinese text? (This should be the same problem I guess, for using postgres to store other languages than English.) Thanks a lot, Jian ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] sorting Chinese varchar field
Hi, I installed postgres 8.0 for windows on my win xp (Simplified Chinese version). The encoding is unicode. When I set pgsql client encoding to gb18030, I could insert Chinese text from the command line to postgres. However, I could not get the sort order of Chinese varchar field to work properly. What I tried are as follows: 1) installed postgres for windows and used the C locale. 2) installed postgres for windows and used the Chinese, PRC locale. Again, in both cases, the backend encoding is unicode. Sorting is not currently supported in the UNICODE/UTF-8 encoding on Win32. Sorry. You need to pick a specific non-unicode encoding/locale combination. (Or run the server on a platform that supports it, if that is an option) //Magnus ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] sorting Chinese varchar field
Hi, I installed postgres 8.0 for windows on my win xp (Simplified Chinese version). The encoding is unicode. When I set pgsql client encoding to gb18030, I could insert Chinese text from the command line to postgres. However, I could not get the sort order of Chinese varchar field to work properly. What I tried are as follows: 1) installed postgres for windows and used the C locale. 2) installed postgres for windows and used the Chinese, PRC locale. Again, in both cases, the backend encoding is unicode. The other interesting thing I observed was, when setting to the C locale, the following sql worked fine: select * from user where name = 'x'; x is a Chinese text However, if I set the locale to Chinese, PRC during installation, the above select did not get the any matching rows, where it should have got. In this case, the following worked fine: select * from user where name ilike 'x%'; Could anyone let me know the best practice for using postgres to store Chinese text? (This should be the same problem I guess, for using postgres to store other languages than English.) I would suggest to use UNICODE/C locale combo. On most systems the locale database for multibyte encodings are broken as far as I know. For the sorting problem, probably you could get the right sort order by using convert. i.e. SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030); If above does not work, you cannot get the right sort order even if you use GB18030 anyway. -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] sorting Chinese varchar field
Great, that works out fine! So, the SQL I tested with is: select * from mytable order by convert(name, 'utf8', 'gb18030'); It produces the correct output. Thanks Tatsuo! Jian On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii [EMAIL PROTECTED] wrote: Hi, I installed postgres 8.0 for windows on my win xp (Simplified Chinese version). The encoding is unicode. When I set pgsql client encoding to gb18030, I could insert Chinese text from the command line to postgres. However, I could not get the sort order of Chinese varchar field to work properly. What I tried are as follows: 1) installed postgres for windows and used the C locale. 2) installed postgres for windows and used the Chinese, PRC locale. Again, in both cases, the backend encoding is unicode. The other interesting thing I observed was, when setting to the C locale, the following sql worked fine: select * from user where name = 'x'; x is a Chinese text However, if I set the locale to Chinese, PRC during installation, the above select did not get the any matching rows, where it should have got. In this case, the following worked fine: select * from user where name ilike 'x%'; Could anyone let me know the best practice for using postgres to store Chinese text? (This should be the same problem I guess, for using postgres to store other languages than English.) I would suggest to use UNICODE/C locale combo. On most systems the locale database for multibyte encodings are broken as far as I know. For the sorting problem, probably you could get the right sort order by using convert. i.e. SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030); If above does not work, you cannot get the right sort order even if you use GB18030 anyway. -- Tatsuo Ishii ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] sorting Chinese varchar field
Great, that works out fine! So, the SQL I tested with is: select * from mytable order by convert(name, 'utf8', 'gb18030'); Sorry, what I wanted to say was: SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character using utf_8_to_gb_18030); Of course your example is fine too (actually they are internally identical). -- Tatsuo Ishii It produces the correct output. Thanks Tatsuo! Jian On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii [EMAIL PROTECTED] wrote: Hi, I installed postgres 8.0 for windows on my win xp (Simplified Chinese version). The encoding is unicode. When I set pgsql client encoding to gb18030, I could insert Chinese text from the command line to postgres. However, I could not get the sort order of Chinese varchar field to work properly. What I tried are as follows: 1) installed postgres for windows and used the C locale. 2) installed postgres for windows and used the Chinese, PRC locale. Again, in both cases, the backend encoding is unicode. The other interesting thing I observed was, when setting to the C locale, the following sql worked fine: select * from user where name = 'x'; x is a Chinese text However, if I set the locale to Chinese, PRC during installation, the above select did not get the any matching rows, where it should have got. In this case, the following worked fine: select * from user where name ilike 'x%'; Could anyone let me know the best practice for using postgres to store Chinese text? (This should be the same problem I guess, for using postgres to store other languages than English.) I would suggest to use UNICODE/C locale combo. On most systems the locale database for multibyte encodings are broken as far as I know. For the sorting problem, probably you could get the right sort order by using convert. i.e. SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, utf_8_to_gb_18030); If above does not work, you cannot get the right sort order even if you use GB18030 anyway. -- Tatsuo Ishii ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Sorting when * is the initial character
On Tue, 8 Feb 2005 01:10 pm, CoL wrote: hi, Berend Tober wrote, On 2/7/2005 22:20: I encountered what looks like unusually sorting behavior, and I'm wondering if anyone can tell me if this is supposted to happen (and then if so, why) or if this is a bug: SELECT * FROM sample_table ORDER BY 1; account_id,account_name 100,First account 110,Second account *115,Fifth account 120,Third account *125,Fourth account I would expect to see account_id,account_name *115,Fifth account *125,Fourth account 100,First account 110,Second account 120,Third account With 8.0.0 C local, SQL_ASCII Database, I get the expected output. Regards Russell Smith ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] Sorting when '*' is the initial character - solved
On Tue, 8 Feb 2005 01:10 pm, CoL wrote: hi, Berend Tober wrote, On 2/7/2005 22:20: I encountered what looks like unusually sorting behavior, and I'm wondering if anyone can tell me if this is supposted to happen (and then if so, why) or if this is a bug: -- With 8.0.0 C local, SQL_ASCII Database, I get the expected output. Russell Smith -- order by case when account_id like '*%' then 0 else 1 end C. Thanks. It was pointed out to me that this behavior is normal and is dependent on the locale setting. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[GENERAL] Sorting when * is the initial character
I encountered what looks like unusually sorting behavior, and I'm wondering if anyone can tell me if this is supposted to happen (and then if so, why) or if this is a bug: CREATE TABLE sample_table ( account_id varchar(4), account_name varchar(25) ) WITHOUT OIDS; INSERT INTO sample_table VALUES ('100', 'First account'); INSERT INTO sample_table VALUES ('110', 'Second account'); INSERT INTO sample_table VALUES ('120', 'Third account'); INSERT INTO sample_table VALUES ('*125', 'Fourth account'); INSERT INTO sample_table VALUES ('*115', 'Fifth account'); SELECT * FROM sample_table ORDER BY 1; account_id,account_name 100,First account 110,Second account *115,Fifth account 120,Third account *125,Fourth account I would expect to see account_id,account_name *115,Fifth account *125,Fourth account 100,First account 110,Second account 120,Third account ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Sorting when * is the initial character
On Mon, Feb 07, 2005 at 16:20:36 -0500, Berend Tober [EMAIL PROTECTED] wrote: SELECT * FROM sample_table ORDER BY 1; account_id,account_name 100,First account 110,Second account *115,Fifth account 120,Third account *125,Fourth account I would expect to see account_id,account_name *115,Fifth account *125,Fourth account 100,First account 110,Second account 120,Third account This depends on your locale. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Sorting when * is the initial character
On Mon, 7 Feb 2005, Berend Tober wrote: I encountered what looks like unusually sorting behavior, and I'm wondering if anyone can tell me if this is supposted to happen (and then if so, why) or if this is a bug: If you ran initdb with a locale such as en_US, a result like what you got is expected. AFAIR, the collation rules for the locale are defined to not use symbols and spaces in the first pass comparison so '110' '*115' '120'. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Sorting when * is the initial character
hi, Berend Tober wrote, On 2/7/2005 22:20: I encountered what looks like unusually sorting behavior, and I'm wondering if anyone can tell me if this is supposted to happen (and then if so, why) or if this is a bug: CREATE TABLE sample_table ( account_id varchar(4), account_name varchar(25) ) WITHOUT OIDS; INSERT INTO sample_table VALUES ('100', 'First account'); INSERT INTO sample_table VALUES ('110', 'Second account'); INSERT INTO sample_table VALUES ('120', 'Third account'); INSERT INTO sample_table VALUES ('*125', 'Fourth account'); INSERT INTO sample_table VALUES ('*115', 'Fifth account'); SELECT * FROM sample_table ORDER BY 1; account_id,account_name 100,First account 110,Second account *115,Fifth account 120,Third account *125,Fourth account I would expect to see account_id,account_name *115,Fifth account *125,Fourth account 100,First account 110,Second account 120,Third account order by case when account_id like '*%' then 0 else 1 end C. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly