Re: [GENERAL] Sorting CSV string and removing Duplicates

2015-07-27 Thread Yves Dorfsman
 
 I have a csv string in a text field that is unsorted and contains duplicates.
 Is there a simple way to remove these and sort the string.
 
 E.g 
 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 
 

Do you need to eventually load the data in Postgres?

I'd personally use python to deal with this, we're talking 4 or 5 lines here,
if even. I suspect you can do the same with perl or ruby or whatever is your
weapon of choice.

How columns does your csv file has? Is it a one-line file?

-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting CSV string and removing Duplicates

2015-07-27 Thread dinesh kumar
On Mon, Jul 27, 2015 at 12:57 PM, dinesh kumar dineshkuma...@gmail.com
wrote:

 On Mon, Jul 27, 2015 at 12:53 PM, Alex Magnum magnum11...@gmail.com
 wrote:

 Hello,

 I have a csv string in a text field that is unsorted and contains
 duplicates.
 Is there a simple way to remove these and sort the string.

 E.g
 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27

 i tried string to array and unique but that did not work...
 Any suggestions on how to do this without writing a function?

 Any help is appreciated.


 Are you looking for this.

 postgres=# SELECT unnest(string_to_array(t, ',')) from test group by 1;
  unnest
 
  2
  18
  8
  20
  22
  16
  27
  17
  23
  1
 (10 rows)


OR

Might be something like this

postgres=# WITH sortedstring as
postgres-# (
postgres(# SELECT unnest(string_to_array(t, ','))::int from test group by 1
ORDER BY 1
postgres(# ) SELECT array_agg(unnest) FROM sortedstring;
  array_agg
--
 {1,2,8,16,17,18,20,22,23,27}
(1 row)


Regards,
Dinesh
manojadinesh.blogspot.com



 Regards,
 Dinesh
 manojadinesh.blogspot.com

 Thanks
 A





Re: [GENERAL] Sorting CSV string and removing Duplicates

2015-07-27 Thread Chris Mair
 Hello,
 
 I have a csv string in a text field that is unsorted and contains
 duplicates.
 Is there a simple way to remove these and sort the string.
 
 E.g 
 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27 
 
 i tried string to array and unique but that did not work...
 Any suggestions on how to do this without writing a function?
 
 Any help is appreciated.
 
 Thanks
 A


chris=# SELECT distinct x::int from
unnest(string_to_array('2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27',
',')) x order by x::int;

 x

  1
  2
  8
 16
 17
 18
 20
 22
 23
 27
(10 rows)

Bye,
Chris.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting CSV string and removing Duplicates

2015-07-27 Thread dinesh kumar
On Mon, Jul 27, 2015 at 12:53 PM, Alex Magnum magnum11...@gmail.com wrote:

 Hello,

 I have a csv string in a text field that is unsorted and contains
 duplicates.
 Is there a simple way to remove these and sort the string.

 E.g
 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27

 i tried string to array and unique but that did not work...
 Any suggestions on how to do this without writing a function?

 Any help is appreciated.


Are you looking for this.

postgres=# SELECT unnest(string_to_array(t, ',')) from test group by 1;
 unnest

 2
 18
 8
 20
 22
 16
 27
 17
 23
 1
(10 rows)


Regards,
Dinesh
manojadinesh.blogspot.com

Thanks
 A



[GENERAL] Sorting CSV string and removing Duplicates

2015-07-27 Thread Alex Magnum
Hello,

I have a csv string in a text field that is unsorted and contains
duplicates.
Is there a simple way to remove these and sort the string.

E.g
2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27

i tried string to array and unique but that did not work...
Any suggestions on how to do this without writing a function?

Any help is appreciated.

Thanks
A


Re: [GENERAL] Sorting CSV string and removing Duplicates

2015-07-27 Thread Alex Magnum
Hi Danish, yes thats the one I was looking for. Thanks a lot!!!


On Tue, Jul 28, 2015 at 1:32 AM, dinesh kumar dineshkuma...@gmail.com
wrote:



 On Mon, Jul 27, 2015 at 12:57 PM, dinesh kumar dineshkuma...@gmail.com
 wrote:

 On Mon, Jul 27, 2015 at 12:53 PM, Alex Magnum magnum11...@gmail.com
 wrote:

 Hello,

 I have a csv string in a text field that is unsorted and contains
 duplicates.
 Is there a simple way to remove these and sort the string.

 E.g
 2,18,20,23,1,27,1,2,8,16,17,18,20,22,23,27

 i tried string to array and unique but that did not work...
 Any suggestions on how to do this without writing a function?

 Any help is appreciated.


 Are you looking for this.

 postgres=# SELECT unnest(string_to_array(t, ',')) from test group by 1;
  unnest
 
  2
  18
  8
  20
  22
  16
  27
  17
  23
  1
 (10 rows)


 OR

 Might be something like this

 postgres=# WITH sortedstring as
 postgres-# (
 postgres(# SELECT unnest(string_to_array(t, ','))::int from test group by
 1 ORDER BY 1
 postgres(# ) SELECT array_agg(unnest) FROM sortedstring;
   array_agg
 --
  {1,2,8,16,17,18,20,22,23,27}
 (1 row)


 Regards,
 Dinesh
 manojadinesh.blogspot.com



 Regards,
 Dinesh
 manojadinesh.blogspot.com

 Thanks
 A






[GENERAL] Sorting rows by a column and storing a row number

2010-10-31 Thread Alexander Farber
Hello,

I have a card game for each I'd like to introduce weekly tournaments.
I'm going to save the score (virtual money) won by each player into:

create table pref_money (
id varchar(32) references pref_users,
yw char(7) default to_char(current_timestamp, '-WW'),
money real
);
create index pref_money_yw_index on pref_money(yw);

This way I don't have to perform any special calculations at the end
of a week to find the weekly winner - just select all records for the
current year-week, sort them by money column and take the 1st one.

But I wonder, if there is a nice way in SQL to tell an interested user
his current rank in the table? I.e. given a user id, sort all records
by the money column and then let him know his rank.

I'm sure I can implement this in Perl, but then this will have to be
a cronjob, because I'll have to perform somewhat complex
calculations for each user id and store them into another table.

But a cronjob isn't nice, because it won't show the rank in real time.

Is there maybe an elegant and quick way for this in SQL?

Thank you for any ideas
Alex

P.S.: Using postgresql-8.4.5-1PGDG.rhel5 with CentOS 5.5

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting rows by a column and storing a row number

2010-10-31 Thread Darren Duncan

Use the Postgres window functions like rank(); this is what they're for.

http://www.postgresql.org/docs/8.4/interactive/queries-table-expressions.html#QUERIES-WINDOW

-- Darren Duncan

Alexander Farber wrote:

Hello,

I have a card game for each I'd like to introduce weekly tournaments.
I'm going to save the score (virtual money) won by each player into:

create table pref_money (
id varchar(32) references pref_users,
yw char(7) default to_char(current_timestamp, '-WW'),
money real
);
create index pref_money_yw_index on pref_money(yw);

This way I don't have to perform any special calculations at the end
of a week to find the weekly winner - just select all records for the
current year-week, sort them by money column and take the 1st one.

But I wonder, if there is a nice way in SQL to tell an interested user
his current rank in the table? I.e. given a user id, sort all records
by the money column and then let him know his rank.

I'm sure I can implement this in Perl, but then this will have to be
a cronjob, because I'll have to perform somewhat complex
calculations for each user id and store them into another table.

But a cronjob isn't nice, because it won't show the rank in real time.

Is there maybe an elegant and quick way for this in SQL?

Thank you for any ideas
Alex

P.S.: Using postgresql-8.4.5-1PGDG.rhel5 with CentOS 5.5



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting with materialized paths

2010-05-10 Thread Peter Hunsberger
On Sun, May 9, 2010 at 8:33 AM, Ovid curtis_ovid_...@yahoo.com wrote:
 My apologies. This isn't PG-specific, but since this is running on PostgreSQL 
 8.4, maybe there are specific features which might help.

 I have a tree structure in a table and it uses materialized paths to allow me 
 to find children quickly. However, I also need to sort the results 
 depth-first, as one would expect with threaded forum replies.

  id | parent_id | matpath |          created
 +---+-+
  2 |         1 | 1       | 2010-05-08 15:18:37.987544
  3 |         1 | 1       | 2010-05-08 17:38:14.125377
  4 |         1 | 1       | 2010-05-08 17:38:57.26743
  5 |         1 | 1       | 2010-05-08 17:43:28.211708
  7 |         1 | 1       | 2010-05-08 18:18:11.849735
  6 |         2 | 1.2     | 2010-05-08 17:50:43.288759
  9 |         5 | 1.5     | 2010-05-09 14:02:43.818646
  8 |         6 | 1.2.6   | 2010-05-09 14:01:17.632695

 So the final results should actually be sorted like this:

  id | parent_id | matpath |          created
 +---+-+
  2 |         1 | 1       | 2010-05-08 15:18:37.987544
  6 |         2 | 1.2     | 2010-05-08 17:50:43.288759
  8 |         6 | 1.2.6   | 2010-05-09 14:01:17.632695
  3 |         1 | 1       | 2010-05-08 17:38:14.125377
  4 |         1 | 1       | 2010-05-08 17:38:57.26743
  5 |         1 | 1       | 2010-05-08 17:43:28.211708
  9 |         5 | 1.5     | 2010-05-09 14:02:43.818646
  7 |         1 | 1       | 2010-05-08 18:18:11.849735

 Rationale:  this is for a threaded forum and id 6 is a reply to id 2, so it 
 needs to show up after that one.  Here's the rough structure of what the 
 output would look like (imagine an HTML forum):

 * id 1 (root post)
    * id 2
        * id 6
            * id 8
    * id 3
    * id 4
    * id 5
        * id 9
    * id 7

 How would I work that out? Can I do that in straight SQL or should additional 
 information be added to this table?


This is (once more) a flat query if you use a set / subset tree
implementation.  Joe Celko's book Trees and Hierarchies in SQL for
Smarties might be the fastest way to get up to speed on this, but you
can also figure it out if you spend a bit of time with Google
Basically, every node in the tree is a table row with two columns, say
left and right. All children are contained within the left and right
of the parent.  Pre-order tree traversal gives the algorithm for
assigning left and right.  Once done, your problem is solved by
ordering on left.



-- 
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting with materialized paths

2010-05-10 Thread Greg Stark
On Sun, May 9, 2010 at 4:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ovid curtis_ovid_...@yahoo.com writes:
 My apologies. This isn't PG-specific, but since this is running on 
 PostgreSQL 8.4, maybe there are specific features which might help.
 I have a tree structure in a table and it uses materialized paths to allow 
 me to find children quickly. However, I also need to sort the results 
 depth-first, as one would expect with threaded forum replies.

 I think contrib/ltree might help you here.  However, it seems to sort
 node names textually rather than numerically, so you might need to
 change it a bit for your own purposes.


That's rather unfortunate. Ltree is awfully convenient and it would be
nice to be able to use it.

If you just used plain Postgres arrays of integers you would get the
sorting you want. But you lose all the useful ltree operators for
trees.

-- 
greg

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting with materialized paths

2010-05-10 Thread Alban Hertroys
On 10 May 2010, at 20:06, Greg Stark wrote:

 On Sun, May 9, 2010 at 4:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ovid curtis_ovid_...@yahoo.com writes:
 My apologies. This isn't PG-specific, but since this is running on 
 PostgreSQL 8.4, maybe there are specific features which might help.
 I have a tree structure in a table and it uses materialized paths to allow 
 me to find children quickly. However, I also need to sort the results 
 depth-first, as one would expect with threaded forum replies.
 
 I think contrib/ltree might help you here.  However, it seems to sort
 node names textually rather than numerically, so you might need to
 change it a bit for your own purposes.
 
 
 That's rather unfortunate. Ltree is awfully convenient and it would be
 nice to be able to use it.
 
 If you just used plain Postgres arrays of integers you would get the
 sorting you want. But you lose all the useful ltree operators for
 trees.


I recall from the docs that you can create arrays of ltrees. It has some 
special operators for that. I couldn't figure out what the use case for those 
ltree-arrays was (the docs are rather sparse), but it might just be what you're 
looking for.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4be8791c10411720337464!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting with materialized paths

2010-05-10 Thread Thomas Kellerer

Ovid wrote on 09.05.2010 15:33:

My apologies. This isn't PG-specific, but since this is running on
PostgreSQL 8.4, maybe there are specific features which might help.

I have a tree structure in a table and it uses materialized paths to
allow me to find children quickly. However, I also need to sort the
results depth-first, as one would expect with threaded forum
replies.

  id | parent_id | matpath |  created
+---+-+
   2 | 1 | 1   | 2010-05-08 15:18:37.987544
   3 | 1 | 1   | 2010-05-08 17:38:14.125377
   4 | 1 | 1   | 2010-05-08 17:38:57.26743
   5 | 1 | 1   | 2010-05-08 17:43:28.211708
   7 | 1 | 1   | 2010-05-08 18:18:11.849735
   6 | 2 | 1.2 | 2010-05-08 17:50:43.288759
   9 | 5 | 1.5 | 2010-05-09 14:02:43.818646
   8 | 6 | 1.2.6   | 2010-05-09 14:01:17.632695

So the final results should actually be sorted like this:

  id | parent_id | matpath |  created
+---+-+
   2 | 1 | 1   | 2010-05-08 15:18:37.987544
   6 | 2 | 1.2 | 2010-05-08 17:50:43.288759
   8 | 6 | 1.2.6   | 2010-05-09 14:01:17.632695
   3 | 1 | 1   | 2010-05-08 17:38:14.125377
   4 | 1 | 1   | 2010-05-08 17:38:57.26743
   5 | 1 | 1   | 2010-05-08 17:43:28.211708
   9 | 5 | 1.5 | 2010-05-09 14:02:43.818646
   7 | 1 | 1   | 2010-05-08 18:18:11.849735



Try this:

with recursive thread_display (id, parent_id, matpath, created, sort_key)
as
(
   select id, parent_id, matpath, created, array[id] as sort_key
   from threads
   where id = 1
   union all
   select c.id, c.parent_id, c.matpath, c.created, p.sort_key||array[c.id]
   from threads c
 join thread_display p on c.parent_id = p.id
)
select id, parent_id, matpath, created
from thread_display
order by sort_key;

Thomas




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting with materialized paths

2010-05-09 Thread Ovid
My apologies. This isn't PG-specific, but since this is running on PostgreSQL 
8.4, maybe there are specific features which might help.

I have a tree structure in a table and it uses materialized paths to allow me 
to find children quickly. However, I also need to sort the results depth-first, 
as one would expect with threaded forum replies.

 id | parent_id | matpath |  created   
+---+-+
  2 | 1 | 1   | 2010-05-08 15:18:37.987544
  3 | 1 | 1   | 2010-05-08 17:38:14.125377
  4 | 1 | 1   | 2010-05-08 17:38:57.26743
  5 | 1 | 1   | 2010-05-08 17:43:28.211708
  7 | 1 | 1   | 2010-05-08 18:18:11.849735
  6 | 2 | 1.2 | 2010-05-08 17:50:43.288759
  9 | 5 | 1.5 | 2010-05-09 14:02:43.818646
  8 | 6 | 1.2.6   | 2010-05-09 14:01:17.632695

So the final results should actually be sorted like this:

 id | parent_id | matpath |  created
+---+-+
  2 | 1 | 1   | 2010-05-08 15:18:37.987544
  6 | 2 | 1.2 | 2010-05-08 17:50:43.288759
  8 | 6 | 1.2.6   | 2010-05-09 14:01:17.632695
  3 | 1 | 1   | 2010-05-08 17:38:14.125377
  4 | 1 | 1   | 2010-05-08 17:38:57.26743
  5 | 1 | 1   | 2010-05-08 17:43:28.211708
  9 | 5 | 1.5 | 2010-05-09 14:02:43.818646
  7 | 1 | 1   | 2010-05-08 18:18:11.849735

Rationale:  this is for a threaded forum and id 6 is a reply to id 2, so it 
needs to show up after that one.  Here's the rough structure of what the output 
would look like (imagine an HTML forum):

* id 1 (root post)
* id 2
* id 6
* id 8
* id 3
* id 4
* id 5
* id 9
* id 7

How would I work that out? Can I do that in straight SQL or should additional 
information be added to this table?

Cheers,
Ovid
--
Buy the book - http://www.oreilly.com/catalog/perlhks/
Tech blog- http://blogs.perl.org/users/ovid/
Twitter  - http://twitter.com/OvidPerl
Official Perl 6 Wiki - http://www.perlfoundation.org/perl6


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting with materialized paths

2010-05-09 Thread Tom Lane
Ovid curtis_ovid_...@yahoo.com writes:
 My apologies. This isn't PG-specific, but since this is running on PostgreSQL 
 8.4, maybe there are specific features which might help.
 I have a tree structure in a table and it uses materialized paths to allow me 
 to find children quickly. However, I also need to sort the results 
 depth-first, as one would expect with threaded forum replies.

I think contrib/ltree might help you here.  However, it seems to sort
node names textually rather than numerically, so you might need to
change it a bit for your own purposes.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Greg Stark
On Tue, Feb 23, 2010 at 6:48 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

 It's best when working with big sets to do so with a cursor and fetch
 a few thousand rows at a time.  It's how we handle really big sets at
 work and it works like a charm in keeping the client from bogging down
 with a huge memory footprint.


You can do \set FETCH_COUNT to have psql use a cursor automatically.


-- 
greg

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread John Gage
I am under the impression that MySQL does not have anything resembling  
Postgres' support for regular expressions.  Though some might think  
that regular expressions are a sort of poor man's SQL, in any  
application which manages large amounts of text they are crucial.   
Postgres definitely does not suck.


Is this the authoratative webpage for Snowball (which I never  
realized was a play on Snobol)?


http://snowball.tartarus.org/

Thanks,

John


On Feb 23, 2010, at 6:51 AM, Yang Zhang wrote:


I'm relieved that Postgresql itself does not, in fact, suck,



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Yeb Havinga

Greg Stark wrote:


You can do \set FETCH_COUNT to have psql use a cursor automatically.
  
It seems like a big win in this case. What would be the downside of 
having a fetch_count set default in psql?


regards
Yeb Havinga





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Greg Stark
On Tue, Feb 23, 2010 at 9:48 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 Greg Stark wrote:

 You can do \set FETCH_COUNT to have psql use a cursor automatically.


 It seems like a big win in this case. What would be the downside of having a
 fetch_count set default in psql?

They were mentioned previously in this thread:

1) If an error occurs partway through the execution of the query you
might receive part of the result set.

2) psql won't be able to align the columns properly

-- 
greg

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Alex Hunsaker
On Tue, Feb 23, 2010 at 00:02, Yang Zhang yanghates...@gmail.com wrote:
 Thing is, this is how I got here:

 - ran complex query that does SELECT INTO.
 - that never terminated, so killed it and tried a simpler SELECT (the
 subject of this thread) from psql to see how long that would take.

You might have better luck if you paste the EXPLAIN ANALYZE of the
SELECT INTO here (Maybe a new thread? Maybe on -performance? use your
judgement...).  But I bet if its doing something with transactionid
like your straight select was, an index would help.  If you are just
using SELECT INTO to copy all of the data into a new table... COPY
might be faster or CREATE TABLE AS.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Baron Schwartz
Hi,

On Tue, Feb 23, 2010 at 12:51 AM, Yang Zhang yanghates...@gmail.com wrote:
 When running the query in MySQL InnoDB:

 $ vmstat 10
 procs ---memory-- ---swap-- -io --system--
 -cpu--
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa 
 st
  0 13 13733604  83020   5648 2193884    3    3   936   168    2    1
 4  2 89  5  0
  1 12 13749952  80164   5600 2178032    0 4354   908  4379 3586 2638
 0  1 38 60  0
  0 19 13762228  80576   5556 2145220  208 3527  1280  3690 3668 2635
 1  1 39 59  0
  0 19 13778632  79420   5560 2135228   52 4186  1046  4191 3682 2418
 0  1 37 62  0
[snip]
 I'm guessing the swap numbers are because MySQL uses mmap?

InnoDB doesn't use mmap.

Baron

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

tpcc=# \d metarelcloud_transactionlog
   Table
public.metarelcloud_transactionlog
   Column| Type  |
   Modifiers
-+---+--
 id  | integer   | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
 transactionid   | integer   | not null
 queryid | smallint  | not null
 tableid | character varying(30) | not null
 tupleid | integer   | not null
 querytype   | character varying | not null
 graphpartition  | smallint  |
 replicatedpartition | smallint  |
 justifiedpartition  | smallint  |
 hashpartition   | smallint  |
 nodeid  | integer   |
 manualpartition | smallint  |
Indexes:
metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
Check constraints:
metarelcloud_transactionlog_graphpartition_check CHECK
(graphpartition = 0)
metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 0)
metarelcloud_transactionlog_justifiedpartition_check CHECK
(justifiedpartition = 0)
metarelcloud_transactionlog_manualpartition_check CHECK
(manualpartition = 0)
metarelcloud_transactionlog_querytype_check CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
metarelcloud_transactionlog_replicatedpartition_check CHECK
(replicatedpartition = 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transactionid` int(11) NOT NULL,
  `queryid` tinyint(4) NOT NULL,
  `tableid` varchar(30) NOT NULL,
  `tupleid` int(11) NOT NULL,
  `querytype` enum('select','insert','delete','update') NOT NULL,
  `graphpartition` tinyint(3) unsigned DEFAULT NULL,
  `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
  `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
  `hashpartition` tinyint(3) unsigned DEFAULT NULL,
  `nodeid` int(11) DEFAULT NULL,
  `manualpartition` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `transactionid` (`transactionid`),
  KEY `tableid` (`tableid`,`tupleid`),
  KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

  select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Pavel Stehule
hello

the speed depends on setting of working_memory. Try to increase a working_memory

set working_memory to '10MB';

Regards
Pavel Stehule

2010/2/22 Yang Zhang yanghates...@gmail.com:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 tpcc=# \d metarelcloud_transactionlog
                                       Table
 public.metarelcloud_transactionlog
       Column        |         Type          |
       Modifiers
 -+---+--
  id                  | integer               | not null default
 nextval('metarelcloud_transactionlog_id_seq'::regclass)
  transactionid       | integer               | not null
  queryid             | smallint              | not null
  tableid             | character varying(30) | not null
  tupleid             | integer               | not null
  querytype           | character varying     | not null
  graphpartition      | smallint              |
  replicatedpartition | smallint              |
  justifiedpartition  | smallint              |
  hashpartition       | smallint              |
  nodeid              | integer               |
  manualpartition     | smallint              |
 Indexes:
    metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
 Check constraints:
    metarelcloud_transactionlog_graphpartition_check CHECK
 (graphpartition = 0)
    metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 
 0)
    metarelcloud_transactionlog_justifiedpartition_check CHECK
 (justifiedpartition = 0)
    metarelcloud_transactionlog_manualpartition_check CHECK
 (manualpartition = 0)
    metarelcloud_transactionlog_querytype_check CHECK
 (querytype::text = ANY (ARRAY['select'::character varying,
 'insert'::character varying, 'delete'::character varying,
 'update'::character varying]::text[]))
    metarelcloud_transactionlog_replicatedpartition_check CHECK
 (replicatedpartition = 0)

 In MySQL:

 CREATE TABLE `metarelcloud_transactionlog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transactionid` int(11) NOT NULL,
  `queryid` tinyint(4) NOT NULL,
  `tableid` varchar(30) NOT NULL,
  `tupleid` int(11) NOT NULL,
  `querytype` enum('select','insert','delete','update') NOT NULL,
  `graphpartition` tinyint(3) unsigned DEFAULT NULL,
  `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
  `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
  `hashpartition` tinyint(3) unsigned DEFAULT NULL,
  `nodeid` int(11) DEFAULT NULL,
  `manualpartition` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `transactionid` (`transactionid`),
  KEY `tableid` (`tableid`,`tupleid`),
  KEY `nodeid` (`nodeid`)
 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

 I'm running:

  select * from metarelcloud_transactionlog order by transactionid;

 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.
 --
 Yang Zhang
 http://www.mit.edu/~y_z/

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens
There is no index on the column transactionid in your PostgreSQL- 
table, as there is in your MySQL-table. This explains the difference.


CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog  
(transactionid);



Op 22 feb 2010, om 19:10 heeft Yang Zhang het volgende geschreven:

I have the exact same table of data in both MySQL and Postgresql. In  
Postgresql:


tpcc=# \d metarelcloud_transactionlog
  Table
public.metarelcloud_transactionlog
  Column| Type  |
  Modifiers
-+--- 
+ 
--

id  | integer   | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid   | integer   | not null
queryid | smallint  | not null
tableid | character varying(30) | not null
tupleid | integer   | not null
querytype   | character varying | not null
graphpartition  | smallint  |
replicatedpartition | smallint  |
justifiedpartition  | smallint  |
hashpartition   | smallint  |
nodeid  | integer   |
manualpartition | smallint  |
Indexes:
   metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
Check constraints:
   metarelcloud_transactionlog_graphpartition_check CHECK
(graphpartition = 0)
   metarelcloud_transactionlog_hashpartition_check CHECK  
(hashpartition = 0)

   metarelcloud_transactionlog_justifiedpartition_check CHECK
(justifiedpartition = 0)
   metarelcloud_transactionlog_manualpartition_check CHECK
(manualpartition = 0)
   metarelcloud_transactionlog_querytype_check CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
   metarelcloud_transactionlog_replicatedpartition_check CHECK
(replicatedpartition = 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `transactionid` int(11) NOT NULL,
 `queryid` tinyint(4) NOT NULL,
 `tableid` varchar(30) NOT NULL,
 `tupleid` int(11) NOT NULL,
 `querytype` enum('select','insert','delete','update') NOT NULL,
 `graphpartition` tinyint(3) unsigned DEFAULT NULL,
 `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
 `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
 `hashpartition` tinyint(3) unsigned DEFAULT NULL,
 `nodeid` int(11) DEFAULT NULL,
 `manualpartition` tinyint(3) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `transactionid` (`transactionid`),
 KEY `tableid` (`tableid`,`tupleid`),
 KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

 select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Frank Heikens
frankheik...@mac.com




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Richard Broersma
On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote:

 There is no index on the column transactionid in your PostgreSQL-table, as
 there is in your MySQL-table. This explains the difference.

 CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
 (transactionid);

Does an index help a sort operation in PostgreSQL?


-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 hello

 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 set working_memory to '10MB';

It's already at

tpcc=# show work_mem;
 work_mem
--
 2kB
(1 row)

I also wouldn't have imagined an external merge-sort as being very
memory-intensive--wouldn't it only enough buffer space to read 2x and
write 1x in big-enough chunks for mostly-sequential access?
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
richard.broer...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote:

 There is no index on the column transactionid in your PostgreSQL-table, as
 there is in your MySQL-table. This explains the difference.

 CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
 (transactionid);

 Does an index help a sort operation in PostgreSQL?

I also share the same doubt. An external merge-sort needs to make
complete passes over the entire dataset, with no index-directed
accesses.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens


Op 22 feb 2010, om 19:30 heeft Richard Broersma het volgende geschreven:

On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens  
frankheik...@mac.com wrote:


There is no index on the column transactionid in your PostgreSQL- 
table, as

there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);


Does an index help a sort operation in PostgreSQL?


Yes it does, see the manual: 
http://www.postgresql.org/docs/8.4/interactive/indexes-ordering.html

Example without index:
Sort  (cost=804.39..829.39 rows=1 width=4) (actual  
time=16.006..17.171 rows=1 loops=1)

  Sort Key: bar
  Sort Method:  quicksort  Memory: 491kB
  -  Seq Scan on bla  (cost=0.00..140.00 rows=1 width=4) (actual  
time=0.015..2.236 rows=1 loops=1)

Total runtime: 18.098 ms

Same query with index (btree):
Index Scan Backward using i_bar on bla  (cost=0.00..406.25 rows=1  
width=4) (actual time=0.093..4.408 rows=1 loops=1)

Total runtime: 5.381 ms




--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug



Regards,
Frank Heikens




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alban Hertroys
On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


Where's that external merge-sort coming from? Can you show an explain analyze?

If your work-mem is too low there's a good chance that Postgres has to use your 
disks for sorting, which will obviously be quite slow.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b82d18510442035320951!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alvaro Herrera
Yang Zhang escribió:

 I'm running:
 
   select * from metarelcloud_transactionlog order by transactionid;
 
 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.

How large is the table, and have you vacuumed it?  Did you analyze it?
What Pg version is this?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
dal...@solfertje.student.utwente.nl wrote:
 On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


 Where's that external merge-sort coming from? Can you show an explain analyze?

I just assumed that the Sort in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
   QUERY PLAN
-
 Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
   Sort Key: a.transactionid
   -  Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
   QUERY PLAN
-
 Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

 If your work-mem is too low there's a good chance that Postgres has to use 
 your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yang Zhang escribió:

 I'm running:

   select * from metarelcloud_transactionlog order by transactionid;

 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.

 How large is the table, and have you vacuumed it?  Did you analyze it?
 What Pg version is this?

The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using
PG 8.3.8 on Fedora 10 x86_64.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens


Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:


On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
dal...@solfertje.student.utwente.nl wrote:

On 22 Feb 2010, at 19:35, Yang Zhang wrote:


I also wouldn't have imagined an external merge-sort as being very



Where's that external merge-sort coming from? Can you show an  
explain analyze?


I just assumed that the Sort in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by  
transactionid;

  QUERY PLAN
-
Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
  Sort Key: a.transactionid
  -  Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by  
transactionid;

  QUERY PLAN
-
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)



Use EXPLAIN ANALYZE to see how the query is executed, gives you more  
details.



If your work-mem is too low there's a good chance that Postgres has  
to use your disks for sorting, which will obviously be quite slow.


Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).


Make sure your index does fit into memory, what's the size of the index?



--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Frank Heikens




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens frankheik...@mac.com wrote:

 Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:

 On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
 dal...@solfertje.student.utwente.nl wrote:

 On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


 Where's that external merge-sort coming from? Can you show an explain
 analyze?

 I just assumed that the Sort in the EXPLAIN output meant an external
 merge-sort, given that the table has over 50 million tuples and is
 over 3GB, *and* there is no index on the sort key:

 tpcc=# explain select * from metarelcloud_transactionlog order by
 transactionid;
                                                  QUERY PLAN

 -
 Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
  Sort Key: a.transactionid
  -  Seq Scan on metarelcloud_transactionlog a
 (cost=0.00..925543.44 rows=50410244 width=17)
 (3 rows)

 Anyway, I added the INDEX as suggested by Frank, but it's been 20
 minutes and it's still running. With the index, EXPLAIN says:

 tpcc=# explain select * from metarelcloud_transactionlog order by
 transactionid;
                                                  QUERY PLAN

 -
 Index Scan using i_transactionid on metarelcloud_transactionlog
 (cost=0.00..4453076.81 rows=50410164 width=44)
 (1 row)


 Use EXPLAIN ANALYZE to see how the query is executed, gives you more
 details.

As mentioned, this would take a very long time to complete running --
I have not yet seen one successful query completion yet. I'd need to
let it run overnight.



 If your work-mem is too low there's a good chance that Postgres has to
 use your disks for sorting, which will obviously be quite slow.

 Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
 much faster (on the order of several minutes).

 Make sure your index does fit into memory, what's the size of the index?

How might I find out the size and whether it's being fit in memory?
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alvaro Herrera
Yang Zhang escribió:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

I just noticed two things:

[snip lots of stuff]

1.

 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

You're doing a comparison to MyISAM.


2.

   select * from metarelcloud_transactionlog order by transactionid;

You're reading the whole table.

This is unlikely to fly very far.  I suggest you try some query that's
actually going to be used in the real world.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yang Zhang escribió:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 I just noticed two things:

 [snip lots of stuff]

 1.

 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

 You're doing a comparison to MyISAM.

We've actually been using innodb as well; it exhibits similar
execution times to MyISAM.



 2.

   select * from metarelcloud_transactionlog order by transactionid;

 You're reading the whole table.

 This is unlikely to fly very far.  I suggest you try some query that's
 actually going to be used in the real world.

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.

Also, can you cluster the table on transactionid ?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens


Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:






If your work-mem is too low there's a good chance that Postgres  
has to

use your disks for sorting, which will obviously be quite slow.


Relative to the non-terminating 80-minute-so-far sort, Unix sort  
runs

much faster (on the order of several minutes).


Make sure your index does fit into memory, what's the size of the  
index?


How might I find out the size and whether it's being fit in memory?


SELECT pg_size_pretty(pg_relation_size('i_transactionid'));



--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Frank Heikens




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

 It's important to see how it runs if you can fit more / most of the
 data set into memory by cranking up work_mem to something really big
 (like a gigabyte or two) and if the query planner can switch to some
 sort of hash algorithm.

We're actually using a very small dataset right now. Being bounded by
memory capacity is not a scalable approach for our application.


 Also, can you cluster the table on transactionid ?


We can, but that's not really addressing the core issue, which matters
to us since the sort itself is only for performing a self merge join
on transactionid, and the *very next step* is a group by a.tableid,
a.tupleid, b.tableid, b.tupleid (i.e. requiring another sort for the
group-agg).
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

Just wondering, are these on the same exact machine?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


Yes, on the same disk.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 12:50 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

 It's important to see how it runs if you can fit more / most of the
 data set into memory by cranking up work_mem to something really big
 (like a gigabyte or two) and if the query planner can switch to some
 sort of hash algorithm.

 We're actually using a very small dataset right now. Being bounded by
 memory capacity is not a scalable approach for our application.

But the more you can fit into work_mem the faster it will go anyway.
So it's still worth a try.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:41 PM, Frank Heikens frankheik...@mac.com wrote:

 Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:




 If your work-mem is too low there's a good chance that Postgres has to
 use your disks for sorting, which will obviously be quite slow.

 Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
 much faster (on the order of several minutes).

 Make sure your index does fit into memory, what's the size of the index?

 How might I find out the size and whether it's being fit in memory?

 SELECT pg_size_pretty(pg_relation_size('i_transactionid'));

 pg_size_pretty

 1080 MB
(1 row)
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yeb Havinga

Scott Marlowe wrote:

On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
  

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:



Just wondering, are these on the same exact machine?

  
Just reading up on this interesting thread. WFIW, 2 years ago I and a 
collegue of mine did a hardware comparison of early Intel and AMD 
desktop quadcore processors to run postgres database, with most other 
parts comparable. The intel processor was 20 to 30 % faster in cpu 
operations to the (first generation) Phenom at almost everything, except 
at index creation. The test that the AMD finished in a few minutes, we 
had to stop on the Intel because it simply didn't finish. We double 
checked configuration settings and could not find explainable 
differences. I hesitate to post this information here, because its hard 
to believe that an actual big difference between the processors exists, 
and it more likely was something in our test setup. Still: the 
difference was *only* in index creation, which is kindoff like a qsort yes?


egards
Yeb Havinga





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 It's already at
  2kB

According to your original posting, you're trying to sort something like
a gigabyte of data.  20MB is peanuts.  I wouldn't recommend increasing
the value across-the-board, but setting it to several hundred meg for
this particular query might help.  How much RAM in your machine anyway?

Also, the fact that mysql is faster suggests that having an index does help.
Possibly the data is nearly ordered by transactionid, in which case an
indexscan would not have random-access problems and would be much faster
than an explicit sort.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


 Yes, on the same disk.

I'm wondering how much of this could be caching effects.  Is the MySQL
database warmed up before you started, and the pgsql database is
cold and no caching has taken place?

What do things like vmstat 10 say while the query is running on each
db?  First time, second time, things like that.

Also, just curios, what's shared_buffers set to on the pgsql instance?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 It's already at
  2kB

 According to your original posting, you're trying to sort something like
 a gigabyte of data.  20MB is peanuts.  I wouldn't recommend increasing
 the value across-the-board, but setting it to several hundred meg for
 this particular query might help.  How much RAM in your machine anyway?

We have 16GB of RAM, but again, Unix sort (and even our own
hand-rolled merge-sort) can operate zippily while avoiding consuming
additional memory.

All the same, we increased work_mem to 1GB, and still the query is not
completing.


 Also, the fact that mysql is faster suggests that having an index does help.
 Possibly the data is nearly ordered by transactionid, in which case an
 indexscan would not have random-access problems and would be much faster
 than an explicit sort.

Note that earlier in the thread I tried running this query with an
index scan, but it's still much slower.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 Just reading up on this interesting thread. WFIW, 2 years ago I and a 
 collegue of mine did a hardware comparison of early Intel and AMD 
 desktop quadcore processors to run postgres database, with most other 
 parts comparable. The intel processor was 20 to 30 % faster in cpu 
 operations to the (first generation) Phenom at almost everything, except 
 at index creation. The test that the AMD finished in a few minutes, we 
 had to stop on the Intel because it simply didn't finish. We double 
 checked configuration settings and could not find explainable 
 differences. I hesitate to post this information here, because its hard 
 to believe that an actual big difference between the processors exists, 
 and it more likely was something in our test setup. Still: the 
 difference was *only* in index creation, which is kindoff like a qsort yes?

Interesting.  Yes, btree index creation is essentially a sort ...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Igor Neyman
When in doubt - test.
Why not remove index in MySQL (or create index in PostgreSQL) and see
what happens.
Why trying compare apples and oranges?

Igor Neyman 

 -Original Message-
 From: Yang Zhang [mailto:yanghates...@gmail.com] 
 Sent: Monday, February 22, 2010 1:37 PM
 To: Richard Broersma
 Cc: Frank Heikens; pgsql-general@postgresql.org
 Subject: Re: Sorting performance vs. MySQL
 
 On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma 
 richard.broer...@gmail.com wrote:
  On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens 
 frankheik...@mac.com wrote:
 
  There is no index on the column transactionid in your 
  PostgreSQL-table, as there is in your MySQL-table. This 
 explains the difference.
 
  CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
  (transactionid);
 
  Does an index help a sort operation in PostgreSQL?
 
 I also share the same doubt. An external merge-sort needs to 
 make complete passes over the entire dataset, with no 
 index-directed accesses.
 --
 Yang Zhang
 http://www.mit.edu/~y_z/
 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why trying compare apples and oranges?

Continue reading this thread -- I also tried using an index in Postgresql.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why trying compare apples and oranges?

 Continue reading this thread -- I also tried using an index in Postgresql.

But have you tried cranking up work_mem to say 1G?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why trying compare apples and oranges?

 Continue reading this thread -- I also tried using an index in Postgresql.

And oh yeah, what was shared_buffers set to?  I'm not sure we ever got
an answer to that.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alex Hunsaker
On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In ? 
 Postgresql:

FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
Now yes it goes a lot faster because im skipping all the overhead of
sending the data to the client...  But still that means it has almost
nothing with the sort or indexs.  Leaves pretty much your cpu, disk,
filesystem and network...  BTW the first time I did it it had to write
out the hint bits so that took (a bit) longer... Is this freshly
loaded data?

---

# create table metarelcould_transactionlog (
id serial primary key,
transactionid integer not null,
queryid smallint not null,
tableid varchar(30) not null,
tupleid integer not null,
querytype varchar not null,
graphpartition smallint,
replicatedpartition smallint,
justifiedpartition smallint,
hashpartition smallint,
modeid integer,
manualpartition smallint
);

# insert into metarelcould_transactionlog (transactionid, queryid,
tableid, tupleid, querytype, graphpartition, replicatedpartition,
justifiedpartition, hashpartition, modeid, manualpartition) select
foo,  1, 'sometable', 1, 's', 1, 1, 1, 1, 1, 1 from generate_series(1,
5000) as foo;

# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
  count
--
 5000
(1 row)

Time: 79017.186 ms

# create index idx on metarelcould_transactionlog (transactionid);
# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
  count
--
 5000
(1 row)

Time: 26230.534 ms

# cluster metarelcould_transactionlog USING  idx;
CLUSTER
Time: 342381.535 ms

# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
  count
--
 5000
(1 row)

Time: 27704.794 ms

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


 Yes, on the same disk.

 I'm wondering how much of this could be caching effects.  Is the MySQL
 database warmed up before you started, and the pgsql database is
 cold and no caching has taken place?

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

Awesome -- this actually led me to discover the problem.

When running the query in MySQL InnoDB:

$ vmstat 10
procs ---memory-- ---swap-- -io --system--
-cpu--
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
 0 13 13733604  83020   5648 219388433   936   16821
4  2 89  5  0
 1 12 13749952  80164   5600 21780320 4354   908  4379 3586 2638
0  1 38 60  0
 0 19 13762228  80576   5556 2145220  208 3527  1280  3690 3668 2635
1  1 39 59  0
 0 19 13778632  79420   5560 2135228   52 4186  1046  4191 3682 2418
0  1 37 62  0
 0 19 13792964  77336   5592 2082520   41 3731  1698  3804 4102 2686
1  1 53 45  0
 0 14 13810356  84036   5556 2049836   36 4241   797  4246 3913 2603
0  1 68 31  0
 1 14 13825640  81336   5520 20019200 4212   958  4220 3848 2736
1  1 73 25  0
 0 17 13844952  78036   5476 19769568 4685   923  4689 3832 2547
0  1 69 29  0
 2 13 13863828  79812   5448 19549523 4627   692  4634 3744 2505
0  1 70 28  0
 0 15 13883828  77764   5440 1920528  249 4544   972  4548 4345 2506
0  1 70 28  0
 1 20 13898900  79132   5456 1890192   28 4341   723  4438 4982 3030
0  3 64 33  0
 0 11 13915252  85184   5624 1865260   79 3668   752  3764 4472 2765
0  3 57 40  0
 0 12 13933964  78448   5700 1832640  120 4327  1066  4434 4484 2777
1  3 52 45  0
 0 19 13951748  77640   5816 1795720   94 4005  1159  4091 4580 2762
1  3 48 49  0
 0 16 13972748  79884   5780 17536760 4737   787  4746 4385 2766
1  3 51 45  0
 0 25 13988108  78936   5884 1726068  547 3954  1468  4116 4976 3502
0  4 44 52  0
 1 20 14011500  77676   5868 1689136  161 4980   843  5506 5218 3131
0  3 34 62  0
 0 22 14032472  81348   5816 1647884  270 4198   943  4369 4521 2826
1  3 40 56  0
 0 23 14055220  81712   5804 1626872  193 4774  1408  4856 4617 2754
1  3 38 58  0
 0 21 14075848  81844   5696 15768360 4738   974  4742 4528 2704
1  3 40 56  0
 0 25 14097260  79788   5628 1536336  213 4512   922  4639 4726 2929
1  3 27 69  0
 0 24 14123900  80820   5616 1488460  319 5033  1059  5128 4895 2780
2  3 17 78  0
 1 26 14142632  77276   5660 1445592  445 4605  1434  4727 5401 3364
1  3 16 79  0
 0 31 14165668  83736   5976 1387048  362 4288  1444  4428 4739 2963
2  3 17 78  0
 1 28 14180104  77564   6324 1369232  387 4526    4677 5748 3559
1  3 16 80  0

I'm guessing the swap numbers are because MySQL uses mmap?

Anyway, when running the query in Postgresql:

$ vmstat 10
procs ---memory-- ---swap-- -io --system--
-cpu--
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
 1  2 13866336 1574540  25024 787898033   936   16821
4  2 89  5  0
 1  3 13861520 1163596  25328 8128360 10460 25429   433 4368 4175
4  2 80 14  0
 0  3 13856484 803024  25600 8343220 11170 22062   688 4492 4590
4  2 73 20  0
 0  6 13855304 678868  26052 8435540  1600  9239   598 5195 7141
1  5 70 24  0
 0  6 13853644 513568  26332 8563680  4010 12480  7100 4775 4248
3  3 68 26  0
 2  2 13851804 166228  26624 8775304  6340 21466  1497 4680 4550
6  2 64 28  0
 0  5 13861556  81896  26740 8825360  860 3547  6100  3847 5142 3386
6  2 57 35  0
 0  6 13867688  91368  26808 8832712  653 3326  1835  3604 4738 2762
5  2 61 32  0
 0  5 13870676  88524  26872 8849392  638 3272  2578  3517 4864 2909
4  2 55 39  0
 0  5 13872748  79512  27004 8864456  629 1788  2086  2949 4337 2921
1  3 55 41  0
 0  7 13876760  83124  27136 8867272 1018 2253  1713  2409 4321 2889
0  3 63 33  0
 0  6 13878964  82876  27240 8874540  792 2119  1854  2314 4288 2813
2  2 72 24  0
 3  4 13883204  81224  27280 8887068  661 3067  2995  3385 4558 2899
4  2 72 22  0
 0  6 13886636  82036  27352 8905628  594 3726  2628  4013 4744 2765
4  2 69 25  0
 0  8 13899852  85604  27400 8925800  638 4423  2689  4658 4903 2808
4  2 55 40  0
 1  4 13905184  80544  27484 8940040  676 3501  3006  3799 4805 2932
4  2 66 28  0
 0  9 13908480  80100  27516 8948476  668 2996  1720  3192 4594 2799
4  2 60 35  0

vmstat showed no swapping-out for a while, and then suddenly it
started spilling a lot. Checking psql's memory stats showed that it
was huge -- apparently, it's trying to 

Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 9:30 PM, Alex Hunsaker bada...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In ? 
 Postgresql:

 FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
 runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
 Now yes it goes a lot faster because im skipping all the overhead of
 sending the data to the client...

Last sentence also contributed to my realizing the problem (the client
I was using was psql), but there's one oddity

 # select count(1) from (SELECT * from metarelcould_transactionlog
 order by transactionid) as foo;

Does it strike anyone else that the query optimizer/rewriter should be
able to toss out the sort from such a query altogether?
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 # select count(1) from (SELECT * from metarelcould_transactionlog
 order by transactionid) as foo;

 Does it strike anyone else that the query optimizer/rewriter should be
 able to toss out the sort from such a query altogether?

It could, if it knew that the aggregate function didn't care about input
row order.  We don't have that knowledge about aggregates ATM.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

That's half of it, and the other half is not wanting to present a
portion of query output if the query fails partway through.  You could
certainly write a client that disregarded these issues (as I suppose
mysql must be doing).

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote:
 nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
 scott.marl...@gmail.com wrote:

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

 Awesome -- this actually led me to discover the problem.

 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

It's best when working with big sets to do so with a cursor and fetch
a few thousand rows at a time.  It's how we handle really big sets at
work and it works like a charm in keeping the client from bogging down
with a huge memory footprint.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alex Hunsaker
On Mon, Feb 22, 2010 at 22:51, Yang Zhang yanghates...@gmail.com wrote:
 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

Well im half surprised no one has recommend using a cursor.  Have you
looked in to that?  I bet that would fix most of your problems here.

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

Well AFAIK it will dump everything you asked for.  So if you said
select * from 1G table;  It should take at least 1G and potentially
quite a bit more formatting and overhead.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Tue, Feb 23, 2010 at 1:48 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote:
 nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
 scott.marl...@gmail.com wrote:

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

 Awesome -- this actually led me to discover the problem.

 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

 It's best when working with big sets to do so with a cursor and fetch
 a few thousand rows at a time.  It's how we handle really big sets at
 work and it works like a charm in keeping the client from bogging down
 with a huge memory footprint.


Thing is, this is how I got here:

- ran complex query that does SELECT INTO.
- that never terminated, so killed it and tried a simpler SELECT (the
subject of this thread) from psql to see how long that would take.

I.e., my original application doesn't receive the entire dataset.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting JTA survey results

2008-11-20 Thread wstrzalka
http://www.postgresqlcertification.org/jta/2008/results

Having point 4 as an example:

For how long have you been a PostgreSQL database administrator?
Less than 1 year36
I wish. 15
3 to 5 years42
1 to 3 years78
Over 10 years 7
5 to 7 years33
7 to 10 years   12
--

I think many people would appreciate if this list could be sorted by
the
a) topic - in this case administrative experience
or
b) percentage

Currently it's really hard to interpret the results. Please put an
ORDER BY there :)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting nulls and empty strings together

2008-04-29 Thread Andrus
User interface need to show nulls as empty strings.
PostgreSQL sorts nulls after all data.

create temp table test ( testcol char(10) );
insert into test values ( null);
insert into test values ( 'test');
insert into test values ( '');
select * from test order by testcol;

This confuses users who expect that all empty columns are together in sorted
data.

Select statements are generated dynamically by driver and it is not easy
to change them to generate order by coalesce( testcol,'').
If there is no other way I can change driver to generate coalesce( 
testcol,'') as order by expressions.
However I'm afraid that those order by expression cannot use regular index 
like

create index test_inx on test(testcol)

in it thus too slow for large data.

How to force PostgreSQL to sort data so that nulls and empty strings appear
together ?


Andrus. 



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting nulls and empty strings together

2008-04-29 Thread Martijn van Oosterhout
On Mon, Apr 28, 2008 at 08:05:45PM +0300, Andrus wrote:
 User interface need to show nulls as empty strings.
 PostgreSQL sorts nulls after all data.
 
 create temp table test ( testcol char(10) );
 insert into test values ( null);
 insert into test values ( 'test');
 insert into test values ( '');
 select * from test order by testcol;
 
 This confuses users who expect that all empty columns are together in sorted
 data.

I'd say users are being confused by the assumption the nulls and empty
strings are the same when they clearly aren't. Perhaps you should think
which of the two you actually want to mean empty and then get rid of
the other possibility.

 If there is no other way I can change driver to generate coalesce( 
 testcol,'') as order by expressions.
 However I'm afraid that those order by expression cannot use regular index 
 like
 
 create index test_inx on test(testcol)

You could do: create index test_inx on test(coalesce(testcol,''))

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [GENERAL] Sorting nulls and empty strings together

2008-04-29 Thread Gregory Stark
Andrus [EMAIL PROTECTED] writes:

 User interface need to show nulls as empty strings.
 PostgreSQL sorts nulls after all data.
...
 Select statements are generated dynamically by driver and it is not easy
 to change them to generate order by coalesce( testcol,'').

You could use NULLS FIRST (assuming your collation has '' sorted at the
beginning which I think is normally true). But you would have to switch it to
NULLS LAST if you sort descending...

 If there is no other way I can change driver to generate coalesce( 
 testcol,'') as order by expressions.

 However I'm afraid that those order by expression cannot use regular index 
 like

 create index test_inx on test(testcol)

create index test_inx on test(coalesce(testcol,''))

But I bet you'll have trouble using an index at all for the order by. You'll
either be searching on other columns which would have to be leading columns of
every index or you'll be reading the whole table anyways and postgres will
prefer to sort since it's faster.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting nulls and empty strings together

2008-04-28 Thread Dennis Muhlestein

Andrus wrote:

User interface need to show nulls as empty strings.
PostgreSQL sorts nulls after all data.

create temp table test ( testcol char(10) );
insert into test values ( null);
insert into test values ( 'test');
insert into test values ( '');
select * from test order by testcol;

This confuses users who expect that all empty columns are together in sorted
data.

Select statements are generated dynamically by driver and it is not easy
to change them to generate order by coalesce( testcol,'').
If there is no other way I can change driver to generate coalesce( 
testcol,'') as order by expressions.
However I'm afraid that those order by expression cannot use regular index 
like


create index test_inx on test(testcol)

in it thus too slow for large data.

How to force PostgreSQL to sort data so that nulls and empty strings appear
together ?



Well, you could use a case statement to change empty strings to NULL in 
your select:


select case when testcol='' then NULL else testcol end as testcol from 
test order by testcol;


There may be a better way, like a rule or something, but this seemed 
like a quick easy thing to do.


-Dennis

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting with DISTINCT ON

2007-01-08 Thread Nico Grubert

Hi there,

I have a problem sorting a SQL result if I use DISTINCT ON.

I have a table tblcomment with these columns:
  id (serial)
  path (varchar)
  created (timestamp)
  title (varchar)

These records are in the table tblcomment:

id  pathcreated title

11  /var/black  2007-01-07 22:17:03.001837  Any title
17  /var/blue   2007-01-07 20:35:55.289713  Any title
13  /var/blue   2007-01-15 15:52:58.438375  Any title
12  /var/green  2007-01-08 19:03:50.470073  Any title
18  /var/red2007-01-07 08:41:47.152676  Any title

Now, I want to get all results from this table and if there are 
duplicates, I want the row whose created column has the latest date.

In this example, I want to have this result:

id  pathcreated title

11  /var/black  2007-01-07 22:17:03.001837  Any title
13  /var/blue   2007-01-15 15:52:58.438375  Any title
12  /var/green  2007-01-08 19:03:50.470073  Any title
18  /var/red2007-01-07 08:41:47.152676  Any title


My first try was this SQL query:

  SELECT DISTINCT ON (path) path, comment_id, created, title
FROM bewertungen.tblcomments

This does not allow me to append ORDER BY created since I can only 
sort on path because of  DISTINCT ON (path).


My second try was a sub query like this:

SELECT comment_id, path, created, title
 FROM
  ( SELECT DISTINCT ON (path) path, comment_id, created, title
FROM bewertungen.tblcomments
  ) foo_alias
ORDER BY created DESC

But this results into:

id  pathcreated title

11  /var/black  2007-01-07 22:17:03.001837  Any title
17  /var/blue   2007-01-07 20:35:55.289713  Any title
12  /var/green  2007-01-08 19:03:50.470073  Any title
18  /var/red2007-01-07 08:41:47.152676  Any title

No matter, if I user  ORDER BY created DESC or
ORDER BY created ASC. It seems that postgres always takes the first row 
of the duplicates. In this example:

  17  /var/blue   2007-01-07 20:35:55.289713  Any title.


Any idea, how I can solve my problem?


Regards,
   Nico

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Sorting with DISTINCT ON

2007-01-08 Thread Jorge Godoy
Nico Grubert [EMAIL PROTECTED] writes:

 My first try was this SQL query:

   SELECT DISTINCT ON (path) path, comment_id, created, title
 FROM bewertungen.tblcomments

 This does not allow me to append ORDER BY created since I can only sort on
 path because of  DISTINCT ON (path).

It does allow you to sort on both columns.  

   SELECT DISTINCT ON (path) path, comment_id, created, title
 FROM bewertungen.tblcomments ORDER BY path, created

Maybe you need a more complex query to get what you want (and use
max(created)...). 

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Sorting with DISTINCT ON

2007-01-08 Thread Nico Grubert


It does allow you to sort on both columns.  


   SELECT DISTINCT ON (path) path, comment_id, created, title
 FROM bewertungen.tblcomments ORDER BY path, created


Thank you very much. Works perfect! :-)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[GENERAL] Sorting

2007-01-08 Thread Bart McFarling
I have a column that is a varchar(6) I need to sort it by the rows that
are integers 1st then the character ones or vice versa, I just need the
values that can be converted to integer to sort by their numeric value.
 
i.e
1, 2, 3, 4, 5, 10, 11, A, B, C
instead of
1, 10, 11, 2, 3, 4, 5, A, B, C
 
Any suggestions?


Re: [GENERAL] Sorting

2007-01-08 Thread A. Kretschmer
am  Mon, dem 08.01.2007, um 10:21:38 -0600 mailte Bart McFarling folgendes:
 I have a column that is a varchar(6) I need to sort it by the rows that are
 integers 1st then the character ones or vice versa, I just need the values 
 that
 can be converted to integer to sort by their numeric value.
  
 i.e
 1, 2, 3, 4, 5, 10, 11, A, B, C
 instead of
 1, 10, 11, 2, 3, 4, 5, A, B, C
  
 Any suggestions?

perhaps something like this:

test=*# select * from foo;
 w

 10
 1
 A
 3
 C
(5 rows)

Time: 1.349 ms
test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from foo 
order by 2,1;
 w  | case
+---
 1  | 1
 3  | 3
 10 |10
 A  | 1
 C  | 1
(5 rows)


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Sorting

2007-01-08 Thread Ragnar
On mán, 2007-01-08 at 17:59 +0100, A. Kretschmer wrote:
 am  Mon, dem 08.01.2007, um 10:21:38 -0600 mailte Bart McFarling folgendes:
  I have a column that is a varchar(6) I need to sort it by the rows that are
  integers 1st then the character ones or vice versa, I just need the values 
  that
  can be converted to integer to sort by their numeric value.
   
  i.e
  1, 2, 3, 4, 5, 10, 11, A, B, C
  instead of
  1, 10, 11, 2, 3, 4, 5, A, B, C
   
  Any suggestions?
 
 perhaps something like this:
 
 test=*# select * from foo;
  w
 
  10
  1
  A
  3
  C
 (5 rows)
 
 Time: 1.349 ms
 test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from 
 foo order by 2,1;

possible improvements:
  a) w ~ '^[0-9]+$'
  b) use NULL instead of 1


gnari



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Sorting

2007-01-08 Thread Andreas Kretschmer
Ragnar [EMAIL PROTECTED] schrieb:
  test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 end from 
  foo order by 2,1;
 
 possible improvements:
   a) w ~ '^[0-9]+$'
   b) use NULL instead of 1

Thanks, right.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly.(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Sorting

2007-01-08 Thread Bart McFarling
Thanks, 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andreas 
Kretschmer
Sent: Monday, January 08, 2007 11:19 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Sorting

Ragnar [EMAIL PROTECTED] schrieb:
  test=*# select w, case when w ~ '^[0-9]*$' then w::int else 1 
  end from foo order by 2,1;
 
 possible improvements:
   a) w ~ '^[0-9]+$'
   b) use NULL instead of 1

Thanks, right.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly.(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

---(end of broadcast)---
TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] Sorting array field

2005-12-22 Thread Pete Deffendol
Hi,

Can anyone point me toward an SQL function (whether built-in or an
add-on) that will allow me to sort the contents of an array datatype in
an SQL query?

Something like this:

select sort(my_array_field) from my_table;

Thanks!

Pete


Re: [GENERAL] Sorting array field

2005-12-22 Thread Michael Fuhr
On Thu, Dec 22, 2005 at 08:38:46AM -0700, Pete Deffendol wrote:
 Can anyone point me toward an SQL function (whether built-in or an add-on)
 that will allow me to sort the contents of an array datatype in an SQL
 query?

For integer arrays see contrib/intarray.

SELECT sort('{5,2,3,1,9,7}'::int[]);
 sort  
---
 {1,2,3,5,7,9}
(1 row)

I don't recall if any of the contrib modules can sort arrays of
other types; if not then look for something at a site like pgfoundry
or GBorg.  If you have PL/Ruby then it couldn't get much easier:

CREATE FUNCTION sort(arg text[]) RETURNS text[] AS $$
arg.sort
$$ LANGUAGE plruby IMMUTABLE STRICT;

SELECT sort('{zz,xx yy,cc,aa,bb}'::text[]);
 sort  
---
 {aa,bb,cc,xx yy,zz}
(1 row)

Another way would be to write a set-returning function that returns
each item in the array as a separate row, and another function that
uses an array constructor to put the rows back together in order
(this example should work in 7.4 and later):

CREATE FUNCTION array2rows(anyarray) RETURNS SETOF anyelement AS '
BEGIN
FOR i IN array_lower($1, 1) .. array_upper($1, 1) LOOP
RETURN NEXT $1[i];
END LOOP;
RETURN;
END;
' LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE FUNCTION sort(anyarray) RETURNS anyarray AS '
SELECT array(SELECT * FROM array2rows($1) ORDER BY 1)
' LANGUAGE sql IMMUTABLE STRICT;

SELECT data, sort(data) FROM foo;
 data  | sort  
---+---
 {dd,cc,bb,aa} | {aa,bb,cc,dd}
 {zz,xx yy,cc,aa,bb} | {aa,bb,cc,xx yy,zz}
(2 rows)

I'm not sure if there are easier ways; these are what first came
to mind.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Sorting array field

2005-12-22 Thread David Fetter
On Thu, Dec 22, 2005 at 08:38:46AM -0700, Pete Deffendol wrote:
 Hi,
 
 Can anyone point me toward an SQL function (whether built-in or an
 add-on) that will allow me to sort the contents of an array datatype
 in an SQL query?
 
 Something like this:
 
 select sort(my_array_field) from my_table;

Here's one way using only SQL.  I do not make any guarantees about its
performance, though ;)

CREATE TABLE my_table (my_array text[]);
INSERT INTO my_table VALUES('{r,e,d,q}');
INSERT INTO my_table VALUES('{c,b,a}');
INSERT INTO my_table VALUES('{one,two,three,four}');

SELECT
ARRAY(
SELECT t.my_array[s.i]
FROM generate_series(
array_lower(my_array,1), /* usually 1 */
array_upper(my_array,1)
) AS s(i)
ORDER BY t.my_array[s.i]
) AS sorted_array
FROM my_table t
ORDER BY sorted_array DESC;

HTH :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Re: [GENERAL] Sorting problems with SELECT * FROM table WHERE name LIKE 'Ö %'

2005-10-31 Thread Nico Grubert



I have a problem when sorting records with:
SELECT * FROM table WHERE name LIKE 'Ö%'

I am running Postgres 8.02 with a database whose character encoding is 
UNICODE.


The SQL Query

  SELECT *
FROM member
WHERE name LIKE 'O%'
  OR
  name like 'Ö%'
ORDER BY name


returns this:
 Öhlmann
 Öhmann
 Obenaus
 Ochoa
 O'Donovan
 Oehme
 Oklant
 Oltub
 Oltüch
 Oltutz
 Oltüwer

According to german sorting rules the result is fine except the both 
first entries Öhlmann and Öhmann.

Why do appear these records at the beginning of the list?
The proper result should read like this:
 Obenaus
 Ochoa
 O'Donovan
 Oehme
 Öhlmann
 Öhmann
 Oklant
 Oltub
 Oltüch
 Oltutz
 Oltüwer



The same problem accours when using E where my result is this:
  Élie de Beaumont
  Eberer
  Ecü
  Edding
  Emmer

The proper result should be:
  Eberer
  Ecü
  Edding
  Élie de Beaumont
  Emmer


Any idea how I can solve this problem?


Thank you very much in advance,
Nico


To complete the missing information, here are the variables set for the 
databases:

add_missing_fromon
archive_command unset
australian_timezonesoff
authentication_timeout  60
bgwriter_delay  200
bgwriter_maxpages   100
bgwriter_percent1
block_size  8192
check_function_bodies   on
checkpoint_segments 3
checkpoint_timeout  300
checkpoint_warning  30
client_encoding UNICODE
client_min_messages notice
commit_delay0
commit_siblings 5
cpu_index_tuple_cost0.001
cpu_operator_cost   0.0025
cpu_tuple_cost  0.01
custom_variable_classes unset
DateStyle   ISO, MDY
db_user_namespace   off
deadlock_timeout1000
debug_pretty_print  off
debug_print_parse   off
debug_print_planoff
debug_print_rewritten   off
debug_shared_buffers0
default_statistics_target   10
default_tablespace  unset
default_transaction_isolation   read committed
default_transaction_read_only   off
default_with_oids   on
effective_cache_size1000
enable_hashagg  on
enable_hashjoin on
enable_indexscanon
enable_mergejoinon
enable_nestloop on
enable_seqscan  on
enable_sort on
enable_tidscan  on
explain_pretty_printon
extra_float_digits  0
from_collapse_limit 8
fsync   on
geqoon
geqo_effort 5
geqo_generations0
geqo_pool_size  0
geqo_selection_bias 2
geqo_threshold  12
integer_datetimes   on
join_collapse_limit 8
lc_collate  [EMAIL PROTECTED]
lc_ctype[EMAIL PROTECTED]
lc_messages [EMAIL PROTECTED]
lc_monetary [EMAIL PROTECTED]
lc_numeric  [EMAIL PROTECTED]
lc_time [EMAIL PROTECTED]
listen_addresseslocalhost
log_connections off
log_destination stderr
log_disconnections  off
log_durationoff
log_error_verbosity default
log_executor_stats  off
log_hostnameoff
log_line_prefix unset
log_min_duration_statement  -1
log_min_error_statement panic
log_min_messagesnotice
log_parser_statsoff
log_planner_stats   off
log_rotation_age1440
log_rotation_size   10240
log_statement   none
log_statement_stats off
log_truncate_on_rotationoff
maintenance_work_mem16384
max_connections 100
max_files_per_process   1000
max_fsm_pages   2
max_fsm_relations   1000
max_function_args   32
max_identifier_length   63
max_index_keys  32
max_locks_per_transaction   64
max_stack_depth 2048
password_encryption on
port5432
pre_auth_delay  0
random_page_cost4
redirect_stderr off
regex_flavoradvanced
rendezvous_name unset
search_path $user,public
server_encoding UNICODE
server_version  8.0.2
shared_buffers  1000
silent_mode off
sql_inheritance on
ssl off
statement_timeout   0
stats_block_level   off
stats_command_stringoff
stats_reset_on_server_start on
stats_row_level off
stats_start_collector   on
superuser_reserved_connections  2
syslog_facility LOCAL0
syslog_identpostgres
TimeZoneEurope/Berlin
trace_notifyoff
transaction_isolation   read committed
transaction_read_only   off
transform_null_equals   off
unix_socket_group   unset
unix_socket_permissions 511
vacuum_cost_delay   0
vacuum_cost_limit   200
vacuum_cost_page_dirty  20
vacuum_cost_page_hit1
vacuum_cost_page_miss   10
wal_buffers 8
wal_sync_method fdatasync
work_mem1024
zero_damaged_pages  off


Is there any explaination why the result is not sorted properly?

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Re: [GENERAL] Sorting problems with SELECT * FROM

2005-10-31 Thread Daniel Verite
Nico Grubert wrote:

 Ah, I found it:
 
 lc_collate:   [EMAIL PROTECTED]
 lc_ctype: [EMAIL PROTECTED]

This is an iso-8859-15 locale, isn't it?
If your database encoding is UNICODE, I believe you'd have more success
using an  UTF8 locale, such as de_DE.UTF-8 in your case.

-- 
 Daniel
 PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Sorting problems with SELECT * FROM t able WHERE name LIKE 'Ö%'

2005-10-30 Thread Nico Grubert

Hi there,

I have a problem when sorting records with:
SELECT * FROM table WHERE name LIKE 'Ö%'

I am running Postgres 8.02 with a database whose character encoding is 
UNICODE.


The SQL Query

  SELECT *
FROM member
WHERE name LIKE 'O%'
  OR
  name like 'Ö%'
ORDER BY name


returns this:
 Öhlmann
 Öhmann
 Obenaus
 Ochoa
 O'Donovan
 Oehme
 Oklant
 Oltub
 Oltüch
 Oltutz
 Oltüwer

According to german sorting rules the result is fine except the both 
first entries Öhlmann and Öhmann.

Why do appear these records at the beginning of the list?
The proper result should read like this:
 Obenaus
 Ochoa
 O'Donovan
 Oehme
 Öhlmann
 Öhmann
 Oklant
 Oltub
 Oltüch
 Oltutz
 Oltüwer



The same problem accours when using E where my result is this:
  Élie de Beaumont
  Eberer
  Ecü
  Edding
  Emmer

The proper result should be:
  Eberer
  Ecü
  Edding
  Élie de Beaumont
  Emmer


Any idea how I can solve this problem?


Thank you very much in advance,
Nico

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Sorting problems with SELECT * FROM table WHERE name LIKE 'Ö%'

2005-10-30 Thread Tom Lane
Nico Grubert [EMAIL PROTECTED] writes:
 I have a problem when sorting records with:
 SELECT * FROM table WHERE name LIKE 'Ö%'

 I am running Postgres 8.02 with a database whose character encoding is 
 UNICODE.

... but what locale is it using?  (See LC_COLLATE and LC_CTYPE.)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Re: [GENERAL] Sorting problems with SELECT * FROM table WHERE name LIKE 'Ö %'

2005-10-30 Thread Nico Grubert

Ah, I found it:

lc_collate: [EMAIL PROTECTED]
lc_ctype:   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Sorting by related tables

2005-08-15 Thread Bill Moseley
On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote:
  3) Oh, and I have also this for checking IF there are items in
  region that are above the item in question -- to see IF an item
  can or cannot be moved up in the sort order relative to others.
 
  SELECT id FROM __TABLE__
  WHERE
  sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?)
   AND id != ?;
 
  If that returns any rows then I know I can call the UPDATE to move the
  item up.
 
 I guess you want a boolean value here? SELECT EXISTS around your above
 query as a subselect should do the trick. You also want to use LIMIT 1
 in the statement, to avoid fetching unnecessary records.

Is there much of a difference between using LIMIT 1 and using an
EXISTS subselect?  Frankly, I'm not clear what you are specifically
suggestion with EXISTS.  I'm using Perl's Class::DBI object mapping module so
returning a single row is an easy way to check this as a boolean
result in Perl.

  Again, a very basic question: What method should be used to be sure
  that nothing changes between the SELECT and the UPDATE?
 
 You can achieve that using transactions. Concurrency control is
 explained here: http://www.postgresql.org/docs/8.0/static/mvcc.html.

My comment was that I want to do the above SELECT and then *only* do
an UPDATE if the SELECT returns at least one row.

So, I should do:

  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Before the SELECT.   And when I UPDATE I need to be prepared to do a
ROLLBACK if I get an error and repeat the process.  (And, I assume,
take some precaution to give up after some number of tries.)

Does that seem reasonable?

-- 
Bill Moseley
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] Sorting by related tables

2005-08-15 Thread Andreas Seltenreich
Bill Moseley schrob:

 On Sat, Aug 13, 2005 at 06:44:09PM +0200, Andreas Seltenreich wrote:
  3) Oh, and I have also this for checking IF there are items in
  region that are above the item in question -- to see IF an item
  can or cannot be moved up in the sort order relative to others.
 
  SELECT id FROM __TABLE__
  WHERE
  sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?)
   AND id != ?;
 
  If that returns any rows then I know I can call the UPDATE to move the
  item up.
 
 I guess you want a boolean value here? SELECT EXISTS around your above
 query as a subselect should do the trick. You also want to use LIMIT 1
 in the statement, to avoid fetching unnecessary records.

 Is there much of a difference between using LIMIT 1 and using an
 EXISTS subselect?

LIMIT 1 does reduce the cost, EXISTS AFAIK only makes the result
boolean and doesn't stop the execution of the subselect by itself when
the first record is found.

 Frankly, I'm not clear what you are specifically
 suggestion with EXISTS.  I'm using Perl's Class::DBI object mapping module so
 returning a single row is an easy way to check this as a boolean
 result in Perl.

Uups, this wasn't question number three yet, and I wrongly inferred
from your uppercase-ifs that you wanted a boolean result here :-/

  Again, a very basic question: What method should be used to be sure
  that nothing changes between the SELECT and the UPDATE?
 
 You can achieve that using transactions. Concurrency control is
 explained here: http://www.postgresql.org/docs/8.0/static/mvcc.html.

 My comment was that I want to do the above SELECT and then *only* do
 an UPDATE if the SELECT returns at least one row.

 So, I should do:

   SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

 Before the SELECT.   And when I UPDATE I need to be prepared to do a
 ROLLBACK if I get an error and repeat the process.  (And, I assume,
 take some precaution to give up after some number of tries.)

 Does that seem reasonable?

This would be one possibility. If you don't want your application to
deal with transactions being aborted because of non-serializable
transactions, you could alternatively use explicit locking (SELECT ...
FOR UPDATE) combined with the Read Committed isolation level (the
default).

Explicit locking is documented here:
http://www.postgresql.org/docs/8.0/static/explicit-locking.html#LOCKING-ROWS

regards
Andreas
-- 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Sorting by related tables

2005-08-15 Thread Bill Moseley
On Mon, Aug 15, 2005 at 11:30:32PM +0200, Andreas Seltenreich wrote:
 
 This would be one possibility. If you don't want your application to
 deal with transactions being aborted because of non-serializable
 transactions, you could alternatively use explicit locking (SELECT ...
 FOR UPDATE) combined with the Read Committed isolation level (the
 default).

SELECT FOR UPDATE locks just the rows that are selected, right?  If I
understand correctly, that would not work for my case because I'm
updating different rows than I'm selecting.

My tables are small, so I'm thinking of just manually updating all the
rows in sequence to adjust the order when needed -- to make things a
bit more simple.  But it is a problem that I am curious about how best
to solve in a scalable way.

Thanks very much for your feedback.

-- 
Bill Moseley
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Sorting by related tables

2005-08-13 Thread Bill Moseley
I have a few beginner questions about using related tables for
sorting.

create table region {
id  SERIAL PRIMARY KEY,
nametext,
-- order this table should be sorted in
-- a 1 is the top sort level
sort_order  integer
);

create table city {
id  SERIAL PRIMARY KEY,
nametext,
region  integer REFERENCES region
);

I want a way to adjust the sort order of the region table (move
item up or move item down in a web interface) without requiring
knowledge of the existing sort_order for the rows in the region
table.  (i.e.  requiring them to already be in an order).

Here's my move up (which is a lower sort_order value) statement.
(__TABLE__ is region and ? are $\d bind parameters)

UPDATE __TABLE__
SET sort_order =
CASE
-- subtract one from the item's sort, unless it's already 1
WHEN id = ? AND sort_order  1 THEN sort_order-1

-- for other items that are greater or equal to sort-1
WHEN id != ? AND sort_order = (select sort_order from __TABLE__ 
where id = ?)-1
THEN sort_order+1

-- all others, leave alone
ELSE sort_order
END;

This works reasonably well for small tables, but doesn't scale and the
logic likely has holes.  And behavior when adding new rows to the
region table is not defined.

1) How do most people do this?  Use linked lists?

create table region {
id  SERIAL PRIMARY KEY
nametext,
list_head   boolean, -- flag if this is the head of the linked list
nextinteger REFERENCES region
);

2) As a SQL beginner, I'm not seeing how to display rows from city
sorted in order based on the order in the region table.

3) Oh, and I have also this for checking IF there are items in
region that are above the item in question -- to see IF an item
can or cannot be moved up in the sort order relative to others.

SELECT id FROM __TABLE__
WHERE
sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?)
 AND id != ?;

If that returns any rows then I know I can call the UPDATE to move the
item up.

Again, a very basic question: What method should be used to be sure
that nothing changes between the SELECT and the UPDATE?



-- 
Bill Moseley
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Sorting by related tables

2005-08-13 Thread Andreas Seltenreich
Bill Moseley schrob:

 create table region {
 id  SERIAL PRIMARY KEY,
 nametext,
 -- order this table should be sorted in
 -- a 1 is the top sort level
 sort_order  integer
 );

 create table city {
 id  SERIAL PRIMARY KEY,
 nametext,
 region  integer REFERENCES region
 );

 I want a way to adjust the sort order of the region table (move
 item up or move item down in a web interface) without requiring
 knowledge of the existing sort_order for the rows in the region
 table.  (i.e.  requiring them to already be in an order).

 Here's my move up (which is a lower sort_order value) statement.
 (__TABLE__ is region and ? are $\d bind parameters)

 UPDATE __TABLE__
 SET sort_order =
 CASE
 -- subtract one from the item's sort, unless it's already 1
 WHEN id = ? AND sort_order  1 THEN sort_order-1

 -- for other items that are greater or equal to sort-1
 WHEN id != ? AND sort_order = (select sort_order from __TABLE__ 
 where id = ?)-1
 THEN sort_order+1

 -- all others, leave alone
 ELSE sort_order
 END;

 This works reasonably well for small tables, but doesn't scale and the
 logic likely has holes.  And behavior when adding new rows to the
 region table is not defined.

I guess your approach of maintaining a special attribute for the
custom sort order could be quite fast when using floating point
numbers instead of integers. You then could easily move a record
around without having to update _every_ other record by using proper
fractions. E.g. to move a record A between B and C, just update its
sort_order to (B.sort_order + C.sort_order) / 2.

However, with IEEE754-floats this is only guaranteed to work 1023
times in the worst case when using double precision and seeding the
sort_order with integers. So one would have to be careful and
normalize the column from time to time.

 1) How do most people do this?  Use linked lists?

I haven't had this problem yet, so I don't know if there's a standard
answer.

 create table region {
 id  SERIAL PRIMARY KEY
 nametext,
 list_head   boolean, -- flag if this is the head of the linked list
 nextinteger REFERENCES region
 );

The problem with linked lists is that they don't fit into the
relational model well, and since SQL isn't turing-complete by design,
you'd have to write a proper function with one of the procedural
languages PostgreSQL offers to iterate the list. Searching the list
archives should yield some examples.

 2) As a SQL beginner, I'm not seeing how to display rows from city
 sorted in order based on the order in the region table.

IMHO a simple join should do here, e.g.

  select city.name from city, region
 where region.id = city.region
 order by region.sort_order

Joining is explained here:
http://www.postgresql.org/docs/8.0/static/queries-table-expressions.html

 3) Oh, and I have also this for checking IF there are items in
 region that are above the item in question -- to see IF an item
 can or cannot be moved up in the sort order relative to others.

 SELECT id FROM __TABLE__
 WHERE
 sort_order = (SELECT sort_order FROM __TABLE__ WHERE id = ?)
  AND id != ?;

 If that returns any rows then I know I can call the UPDATE to move the
 item up.

I guess you want a boolean value here? SELECT EXISTS around your above
query as a subselect should do the trick. You also want to use LIMIT 1
in the statement, to avoid fetching unnecessary records.

 Again, a very basic question: What method should be used to be sure
 that nothing changes between the SELECT and the UPDATE?

You can achieve that using transactions. Concurrency control is
explained here: http://www.postgresql.org/docs/8.0/static/mvcc.html.

regards
Andreas
-- 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Sorting by constant values

2005-05-03 Thread Robert Fitzpatrick
I have a column that I want to sort by certain values. The values are
Unit, Exterior and Common. I want all the records with Unit first,
Common second and Exterior last in the sort order. These are the only 3
possible values, is there a way to sort manually like that with the
alphanumeric values?

-- 
Robert


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Sorting by constant values

2005-05-03 Thread Franco Bruno Borghesi
You can order by conditions, lets say column='Unit'. The evaluation of
a conditions will give you 't' or 'f', and alfabetically 'f' 
't'... you should use DESC to get the matches first. So, it would be
more or less like this:

ORDER BY
 column='Unit' DESC,
 column='Exterior' DESC,
 column='Common' DESC

I don't think this is performant though. If you have many rows to evaluate, you could create a funtion like this:
CREATE FUNCTION evaluate(TEXT) RETURNS TEXT LANGUAGE 'sql' AS '
 SELECT $1='Unit' || $1='Exterior' || $1='Common';
'
This function would return something like 'tff', 'ftf', 'fft', and you
should be able to create an index on that function. Then you can use
the index to order your rows.

Hope it helps ;)
2005/5/3, Robert Fitzpatrick [EMAIL PROTECTED]:
I have a column that I want to sort by certain values. The values areUnit, Exterior and Common. I want all the records with Unit first,Common second and Exterior last in the sort order. These are the only 3possible values, is there a way to sort manually like that with the
alphanumeric values?--Robert---(end of broadcast)---TIP 4: Don't 'kill -9' the postmaster

Re: [GENERAL] Sorting by constant values

2005-05-03 Thread Scott Marlowe
On Tue, 2005-05-03 at 12:29, Robert Fitzpatrick wrote:
 I have a column that I want to sort by certain values. The values are
 Unit, Exterior and Common. I want all the records with Unit first,
 Common second and Exterior last in the sort order. These are the only 3
 possible values, is there a way to sort manually like that with the
 alphanumeric values?

case statement should work.

http://www.postgresql.org/docs/8.0/static/functions-conditional.html

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] Sorting by constant values

2005-05-03 Thread Ragnar Hafstað
On Tue, 2005-05-03 at 13:29 -0400, Robert Fitzpatrick wrote:
 I have a column that I want to sort by certain values. The values are
 Unit, Exterior and Common. I want all the records with Unit first,
 Common second and Exterior last in the sort order. These are the only 3
 possible values, is there a way to sort manually like that with the
 alphanumeric values?

... ORDER BY length(col);

:-)
gnari




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[GENERAL] sorting Chinese varchar field

2005-03-28 Thread jian chen
Hi,

I installed postgres 8.0 for windows on my win xp (Simplified Chinese
version). The encoding is unicode. When I set pgsql client encoding to
gb18030, I could insert Chinese text from the command line to
postgres.

However, I could not get the sort order of Chinese varchar field to
work properly.

What I tried are as follows:

1)  installed postgres for windows and used the C locale. 
2)  installed postgres for windows and used the Chinese, PRC locale.

Again, in both cases, the backend encoding is unicode.

The other interesting thing I observed was, when setting to the C
locale, the following sql worked fine:
select * from user where  name = 'x';
x is a Chinese text
However, if I set the locale to Chinese, PRC during installation,
the above select did not get the any matching rows, where it should
have got. In this case, the following worked fine:
select * from user where name ilike 'x%';

Could anyone let me know the best practice for using postgres to store
Chinese text? (This should be the same problem I guess, for using
postgres to store other languages than English.)

Thanks a lot,

Jian

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] sorting Chinese varchar field

2005-03-28 Thread Magnus Hagander
Hi,

I installed postgres 8.0 for windows on my win xp (Simplified Chinese
version). The encoding is unicode. When I set pgsql client encoding to
gb18030, I could insert Chinese text from the command line to
postgres.

However, I could not get the sort order of Chinese varchar field to
work properly.

What I tried are as follows:

1)  installed postgres for windows and used the C locale. 
2)  installed postgres for windows and used the Chinese, PRC locale.

Again, in both cases, the backend encoding is unicode.

Sorting is not currently supported in the UNICODE/UTF-8 encoding on
Win32. Sorry. You need to pick a specific non-unicode encoding/locale
combination. (Or run the server on a platform that supports it, if that
is an option)

//Magnus

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] sorting Chinese varchar field

2005-03-28 Thread Tatsuo Ishii
 Hi,
 
 I installed postgres 8.0 for windows on my win xp (Simplified Chinese
 version). The encoding is unicode. When I set pgsql client encoding to
 gb18030, I could insert Chinese text from the command line to
 postgres.
 
 However, I could not get the sort order of Chinese varchar field to
 work properly.
 
 What I tried are as follows:
 
 1)  installed postgres for windows and used the C locale. 
 2)  installed postgres for windows and used the Chinese, PRC locale.
 
 Again, in both cases, the backend encoding is unicode.
 
 The other interesting thing I observed was, when setting to the C
 locale, the following sql worked fine:
 select * from user where  name = 'x';
 x is a Chinese text
 However, if I set the locale to Chinese, PRC during installation,
 the above select did not get the any matching rows, where it should
 have got. In this case, the following worked fine:
 select * from user where name ilike 'x%';
 
 Could anyone let me know the best practice for using postgres to store
 Chinese text? (This should be the same problem I guess, for using
 postgres to store other languages than English.)

I would suggest to use UNICODE/C locale combo. On most systems the
locale database for multibyte encodings are broken as far as I
know. For the sorting problem, probably you could get the right sort
order by using convert. i.e.

SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, 
utf_8_to_gb_18030);

If above does not work, you cannot get the right sort order even if
you use GB18030 anyway.
--
Tatsuo Ishii

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] sorting Chinese varchar field

2005-03-28 Thread jian chen
Great, that works out fine!

So, the SQL I tested with is:
select * from mytable order by convert(name, 'utf8', 'gb18030');

It produces the correct output.

Thanks Tatsuo!

Jian

On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii
[EMAIL PROTECTED] wrote:
  Hi,
 
  I installed postgres 8.0 for windows on my win xp (Simplified Chinese
  version). The encoding is unicode. When I set pgsql client encoding to
  gb18030, I could insert Chinese text from the command line to
  postgres.
 
  However, I could not get the sort order of Chinese varchar field to
  work properly.
 
  What I tried are as follows:
 
  1)  installed postgres for windows and used the C locale.
  2)  installed postgres for windows and used the Chinese, PRC locale.
 
  Again, in both cases, the backend encoding is unicode.
 
  The other interesting thing I observed was, when setting to the C
  locale, the following sql worked fine:
  select * from user where  name = 'x';
  x is a Chinese text
  However, if I set the locale to Chinese, PRC during installation,
  the above select did not get the any matching rows, where it should
  have got. In this case, the following worked fine:
  select * from user where name ilike 'x%';
 
  Could anyone let me know the best practice for using postgres to store
  Chinese text? (This should be the same problem I guess, for using
  postgres to store other languages than English.)
 
 I would suggest to use UNICODE/C locale combo. On most systems the
 locale database for multibyte encodings are broken as far as I
 know. For the sorting problem, probably you could get the right sort
 order by using convert. i.e.
 
 SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, 
 utf_8_to_gb_18030);
 
 If above does not work, you cannot get the right sort order even if
 you use GB18030 anyway.
 --
 Tatsuo Ishii


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] sorting Chinese varchar field

2005-03-28 Thread Tatsuo Ishii
 Great, that works out fine!
 
 So, the SQL I tested with is:
 select * from mytable order by convert(name, 'utf8', 'gb18030');

Sorry, what I wanted to say was:

SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character using 
utf_8_to_gb_18030);

Of course your example is fine too (actually they are internally
identical).
--
Tatsuo Ishii

 It produces the correct output.
 
 Thanks Tatsuo!
 
 Jian
 
 On Tue, 29 Mar 2005 10:25:58 +0900 (JST), Tatsuo Ishii
 [EMAIL PROTECTED] wrote:
   Hi,
  
   I installed postgres 8.0 for windows on my win xp (Simplified Chinese
   version). The encoding is unicode. When I set pgsql client encoding to
   gb18030, I could insert Chinese text from the command line to
   postgres.
  
   However, I could not get the sort order of Chinese varchar field to
   work properly.
  
   What I tried are as follows:
  
   1)  installed postgres for windows and used the C locale.
   2)  installed postgres for windows and used the Chinese, PRC locale.
  
   Again, in both cases, the backend encoding is unicode.
  
   The other interesting thing I observed was, when setting to the C
   locale, the following sql worked fine:
   select * from user where  name = 'x';
   x is a Chinese text
   However, if I set the locale to Chinese, PRC during installation,
   the above select did not get the any matching rows, where it should
   have got. In this case, the following worked fine:
   select * from user where name ilike 'x%';
  
   Could anyone let me know the best practice for using postgres to store
   Chinese text? (This should be the same problem I guess, for using
   postgres to store other languages than English.)
  
  I would suggest to use UNICODE/C locale combo. On most systems the
  locale database for multibyte encodings are broken as far as I
  know. For the sorting problem, probably you could get the right sort
  order by using convert. i.e.
  
  SELECT * FROM t1 WHERE ... ORDER BY CONVERT(your_chinese_character, 
  utf_8_to_gb_18030);
  
  If above does not work, you cannot get the right sort order even if
  you use GB18030 anyway.
  --
  Tatsuo Ishii
 
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Sorting when * is the initial character

2005-02-08 Thread Russell Smith
On Tue, 8 Feb 2005 01:10 pm, CoL wrote:
 hi,
 
 Berend Tober wrote, On 2/7/2005 22:20:
  I encountered what looks like unusually sorting behavior, and I'm wondering 
  if
  anyone can tell me if this is supposted to happen (and then if so, why) or 
  if
  this is a bug:
  
  
  SELECT * FROM sample_table ORDER BY 1;
  
  account_id,account_name
  100,First account
  110,Second account
  *115,Fifth account
  120,Third account
  *125,Fourth account
  
  I would expect to see
  
  account_id,account_name
  *115,Fifth account
  *125,Fourth account
  100,First account
  110,Second account
  120,Third account

With 8.0.0  C local, SQL_ASCII Database, I get the expected output.

Regards

Russell Smith

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Sorting when '*' is the initial character - solved

2005-02-08 Thread Berend Tober
 On Tue, 8 Feb 2005 01:10 pm, CoL wrote:
 hi,

 Berend Tober wrote, On 2/7/2005 22:20:
  I encountered what looks like unusually sorting behavior, and I'm
 wondering if
  anyone can tell me if this is supposted to happen (and then if so, why) or
 if
  this is a bug:

--
 With 8.0.0  C local, SQL_ASCII Database, I get the expected output.
 Russell Smith
--
 order by case when account_id like '*%' then 0 else 1 end
 C.

Thanks. It was pointed out to me that this behavior is normal and is dependent
on the locale setting.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[GENERAL] Sorting when * is the initial character

2005-02-07 Thread Berend Tober
I encountered what looks like unusually sorting behavior, and I'm wondering if
anyone can tell me if this is supposted to happen (and then if so, why) or if
this is a bug:

CREATE TABLE sample_table
(
  account_id varchar(4),
  account_name varchar(25)
)
WITHOUT OIDS;

INSERT INTO sample_table VALUES ('100', 'First account');
INSERT INTO sample_table VALUES ('110', 'Second account');
INSERT INTO sample_table VALUES ('120', 'Third account');
INSERT INTO sample_table VALUES ('*125', 'Fourth account');
INSERT INTO sample_table VALUES ('*115', 'Fifth account');

SELECT * FROM sample_table ORDER BY 1;

account_id,account_name
100,First account
110,Second account
*115,Fifth account
120,Third account
*125,Fourth account

I would expect to see

account_id,account_name
*115,Fifth account
*125,Fourth account
100,First account
110,Second account
120,Third account




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Sorting when * is the initial character

2005-02-07 Thread Bruno Wolff III
On Mon, Feb 07, 2005 at 16:20:36 -0500,
  Berend Tober [EMAIL PROTECTED] wrote:
 
 SELECT * FROM sample_table ORDER BY 1;
 
 account_id,account_name
 100,First account
 110,Second account
 *115,Fifth account
 120,Third account
 *125,Fourth account
 
 I would expect to see
 
 account_id,account_name
 *115,Fifth account
 *125,Fourth account
 100,First account
 110,Second account
 120,Third account

This depends on your locale.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Sorting when * is the initial character

2005-02-07 Thread Stephan Szabo
On Mon, 7 Feb 2005, Berend Tober wrote:

 I encountered what looks like unusually sorting behavior, and I'm wondering if
 anyone can tell me if this is supposted to happen (and then if so, why) or if
 this is a bug:

If you ran initdb with a locale such as en_US, a result like what you got
is expected.  AFAIR, the collation rules for the locale are defined to not
use symbols and spaces in the first pass comparison so '110'  '*115' 
'120'.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Sorting when * is the initial character

2005-02-07 Thread CoL
hi,
Berend Tober wrote, On 2/7/2005 22:20:
I encountered what looks like unusually sorting behavior, and I'm wondering if
anyone can tell me if this is supposted to happen (and then if so, why) or if
this is a bug:
CREATE TABLE sample_table
(
  account_id varchar(4),
  account_name varchar(25)
)
WITHOUT OIDS;
INSERT INTO sample_table VALUES ('100', 'First account');
INSERT INTO sample_table VALUES ('110', 'Second account');
INSERT INTO sample_table VALUES ('120', 'Third account');
INSERT INTO sample_table VALUES ('*125', 'Fourth account');
INSERT INTO sample_table VALUES ('*115', 'Fifth account');
SELECT * FROM sample_table ORDER BY 1;
account_id,account_name
100,First account
110,Second account
*115,Fifth account
120,Third account
*125,Fourth account
I would expect to see
account_id,account_name
*115,Fifth account
*125,Fourth account
100,First account
110,Second account
120,Third account
order by case when account_id like '*%' then 0 else 1 end
C.
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


  1   2   >