Re: [GENERAL] is this a bug or I am blind?

2005-12-22 Thread Bruce Momjian
Tom has applied a patch to fix this and backpatched it to all relivant branches. He might be preparing a summary email about this. --- Mage wrote: Martijn van Oosterhout wrote: On Sat, Dec 17, 2005 at 05:01:15PM -0500,

Re: [GENERAL] is this a bug or I am blind?

2005-12-21 Thread Mage
Martijn van Oosterhout wrote: On Sat, Dec 17, 2005 at 05:01:15PM -0500, Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: I think the real solution is to implement COLLATE support. Maybe so, but we still need to figure out what we're doing for the back branches,

Re: [GENERAL] is this a bug or I am blind?

2005-12-20 Thread Karsten Hilbert
On Mon, Dec 19, 2005 at 08:37:26PM +0100, Martijn van Oosterhout wrote: I don't know which locales are affected. It just can't be that widespread because we're not getting similar reports for 99% of the locales out there. Not getting reports doesn't mean the problem is rare. Perhaps people

Re: [GENERAL] is this a bug or I am blind?

2005-12-20 Thread Tom Lane
Karsten Hilbert [EMAIL PROTECTED] writes: On Mon, Dec 19, 2005 at 08:37:26PM +0100, Martijn van Oosterhout wrote: I don't know which locales are affected. It just can't be that widespread because we're not getting similar reports for 99% of the locales out there. Not getting reports doesn't

Re: [GENERAL] is this a bug or I am blind?

2005-12-19 Thread Csaba Nagy
[snip] Trouble is, you can never guarantee that you're dealing with actual words. What of you're comparing someone's password that happens to contain combination of letters that act in this way? Well, in this case why would you care about how passwords are sorted ? :-) I think if the

Re: [GENERAL] is this a bug or I am blind?

2005-12-19 Thread Martijn van Oosterhout
On Sat, Dec 17, 2005 at 05:01:15PM -0500, Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: I think the real solution is to implement COLLATE support. Maybe so, but we still need to figure out what we're doing for the back branches, and that won't be it ... To be honest,

Re: [GENERAL] is this a bug or I am blind?

2005-12-19 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes: Trouble is, you can never guarantee that you're dealing with actual words. What of you're comparing someone's password that happens to contain combination of letters that act in this way? Well, in this case why would you care about how passwords are

Re: [GENERAL] is this a bug or I am blind?

2005-12-19 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: On Sat, Dec 17, 2005 at 05:01:15PM -0500, Tom Lane wrote: Maybe so, but we still need to figure out what we're doing for the back branches, and that won't be it ... To be honest, there are really only a handful of locales that suffer from this

Re: [GENERAL] is this a bug or I am blind?

2005-12-19 Thread Martijn van Oosterhout
On Mon, Dec 19, 2005 at 10:08:12AM -0500, Tom Lane wrote: Let's not fix it is really not an acceptable answer, because the behavior in the affected locales is entirely broken (inconsistent, etc). And how do you know which locales are affected, anyway? I don't know which locales are affected.

Re: [GENERAL] is this a bug or I am blind?

2005-12-18 Thread Karsten Hilbert
On Sat, Dec 17, 2005 at 10:30:04PM +0100, Martijn van Oosterhout wrote: I think the real solution is to implement COLLATE support. Then we can define all sorts of collations, like: ... My patch was halfway there (the grammer was sorted, as were the SQL propgation rules) but there was some

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Karsten Hilbert
At 01:40 PM 12/16/2005 -0500, Tom Lane wrote: Nobody's said anything about giving up locale-sensitive sorting. The question is about locale-sensitive equality: does it really make sense that 'tty' = 'tyty'? Would your answer change in the context '/dev/tty' = '/dev/tyty'? Are you willing

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Lincoln Yeoh
At 11:49 AM 12/17/2005 +0800, Lincoln Yeoh wrote: But in a column for license owner names, one might want tty and tyty to be the same - one might have to have a multicolumn index depending on the owner's locale of choice. To make myself clear, one might want to store a person's name in one

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Bruce Momjian
Where are we on this? Given the original report: online=# select * from common_logins where username = 'potyty'; uid | username | password | lastlogin | status | usertype | loginnum -+--+--+---++--+-- (0 rows)

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Where are we on this? Waiting to see if there's any input on what the behavior needs to be. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Peter Eisentraut
Tom Lane wrote: Perhaps the fast-path check is a bad idea, but fixing this is not just a matter of removing that. If we subscribe to strcoll's worldview then we have to conclude that *text strings are not hashable*, because strings that should be equal may have different hash codes. By the

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes: By the way, I have always been concerned about the feature of Unicode that you can write logically equivalent strings using different code-point sequences. Namely, you often have the option of writing an accented letter using the legacy single

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Mage
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Where are we on this? Waiting to see if there's any input on what the behavior needs to be. Actually, potyty and potty are not equal in Hungarian language. We use the tyty form in complex words. Only data sorting

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Martijn van Oosterhout
On Sat, Dec 17, 2005 at 02:08:06PM -0500, Tom Lane wrote: The weight of opinion later in the thread seems to be leaning towards the idea that we do not want to accept the word of strcoll/strxfrm about whether two strings are equal: there are too many scenarios where lax equality behavior would

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: I think the real solution is to implement COLLATE support. Maybe so, but we still need to figure out what we're doing for the back branches, and that won't be it ... regards, tom lane ---(end of

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Peter Eisentraut [EMAIL PROTECTED] writes: By the way, I have always been concerned about the feature of Unicode that you can write logically equivalent strings using different code-point sequences. Namely, you often have the option of writing an

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Peter Eisentraut
Greg Stark wrote: Using iso-8859-1 to encode é as a single byte versus using UTF8 which would take two bytes to encode it is an issue of using two *different* encodings. But that's not what we are discussing. There is a separate issue that some characters could theoretically have multiple

Re: [GENERAL] is this a bug or I am blind?

2005-12-17 Thread Greg Stark
Peter Eisentraut [EMAIL PROTECTED] writes: Greg Stark wrote: Using iso-8859-1 to encode é as a single byte versus using UTF8 which would take two bytes to encode it is an issue of using two *different* encodings. But that's not what we are discussing. The poster to which Tom was

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Tom Lane
Mage [EMAIL PROTECTED] writes: Tom, I can send you the data I dumped and you can try it out with same settings. I trust you. Thanks. After digging through it, I can exhibit the problem: in hu_HU locale and LATIN2 encoding, on a Linux box (Fedora Core 4, but probably all glibc installations

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Michael Fuhr
On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote: Thanks. After digging through it, I can exhibit the problem: in hu_HU locale and LATIN2 encoding, on a Linux box (Fedora Core 4, but probably all glibc installations will do this), I get I don't know if this is related or not, but did

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Csaba Nagy
On Fri, 2005-12-16 at 18:12, Tom Lane wrote: [snip] I don't know anything about hu_HU comparison rules, but it appears that strcoll() believes that these two strings should be considered equal. Is that sane? It is sane in a way, as the ty combination is pronounced together as a single

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes: I don't know if this is related or not, but did the following issue from a year ago ever get resolved? We were wondering then if the Hungarian locale on some platforms might be causing problems. AFAIR we never did figure that one out. I wasn't able to

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes: ... So tyty and tty could be arguably both taken as double ty, except that the official form is tty... but from a pronunciation point of view they ARE equivalent in hungarian. That's fair enough, but the question is should they be taken as equivalent for

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Martijn van Oosterhout
On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote: Perhaps the fast-path check is a bad idea, but fixing this is not just a matter of removing that. If we subscribe to strcoll's worldview then we have to conclude that *text strings are not hashable*, because strings that should be

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Csaba Nagy
On Fri, 2005-12-16 at 18:52, Tom Lane wrote: Csaba Nagy [EMAIL PROTECTED] writes: ... So tyty and tty could be arguably both taken as double ty, except that the official form is tty... but from a pronunciation point of view they ARE equivalent in hungarian. That's fair enough, but the

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote: Perhaps the fast-path check is a bad idea, but fixing this is not just a matter of removing that. If we subscribe to strcoll's worldview then we have to conclude that *text strings are

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Martijn van Oosterhout
On Fri, Dec 16, 2005 at 06:54:15PM +0100, Martijn van Oosterhout wrote: That's true, in the sense that unconverted strings are not hashable. This is what strxfrm was created for, to return the sorting key for a string. A quick C program demonstrates that indeed in that locale these two strings

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Martijn van Oosterhout
On Fri, Dec 16, 2005 at 01:06:58PM -0500, Tom Lane wrote: Ah. So we could redefine hashtext() to return the hash of the strxfrm value. Slow, but a lot better than giving up hash join and hash aggregation altogether... Not to put too fine a point on it, but either you want locale-sensetive

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Russ Brown
On Fri, 16 Dec 2005 11:59:48 -0600, Csaba Nagy [EMAIL PROTECTED] wrote: On Fri, 2005-12-16 at 18:52, Tom Lane wrote: Csaba Nagy [EMAIL PROTECTED] writes: ... So tyty and tty could be arguably both taken as double ty, except that the official form is tty... but from a pronunciation point

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: FWIW, here's some links to Microsoft and MySQL dealing with the same issue, so we're not alone here. Hungarian seems to be a complex language to sort, but it seems that glibc is right in this case. The mysql bug link has a fairly detailed

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: On Fri, Dec 16, 2005 at 01:06:58PM -0500, Tom Lane wrote: Ah. So we could redefine hashtext() to return the hash of the strxfrm value. Slow, but a lot better than giving up hash join and hash aggregation altogether... Not to put too fine a

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: The other approach we could take is to define text comparison as yielding equality only for bitwise-equal strings. If strcoll() returns zero then ask strcmp() for a second opinion. Fwiw this has come up before on this list and it was discovered this is

Re: [GENERAL] is this a bug or I am blind?

2005-12-16 Thread Lincoln Yeoh
At 01:40 PM 12/16/2005 -0500, Tom Lane wrote: Nobody's said anything about giving up locale-sensitive sorting. The question is about locale-sensitive equality: does it really make sense that 'tty' = 'tyty'? Would your answer change in the context '/dev/tty' = '/dev/tyty'? Are you willing to

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread A. Kretschmer
am 15.12.2005, um 14:12:23 +0100 mailte Mage folgendes: online=# select * from common_logins where username = 'potyty'; uid | username | password | lastlogin | status | usertype | loginnum -+--+--+---++--+-- (0 rows) online=# select *

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Jaime Casanova
On 12/15/05, A. Kretschmer [EMAIL PROTECTED] wrote: am 15.12.2005, um 14:12:23 +0100 mailte Mage folgendes: online=# select * from common_logins where username = 'potyty'; uid | username | password | lastlogin | status | usertype | loginnum

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Csaba Nagy
[snip] even is that is true i think you need the comodin characters (ie: %. _) to make like behave different from simple comparisons Not entirely true, if the database was initialized in a different locale than C, then the direct comparison will probably go for an index on username, while like

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Richard Huxton
Mage wrote: online=# select * from common_logins where username = 'potyty'; online=# select * from common_logins where username like 'potyty'; It's probably worth seeing whether these have different plans (EXPLAIN ANALYSE...) and if the = is using an index but like isn't. If so, try issuing

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Jaime Casanova
On 12/15/05, Csaba Nagy [EMAIL PROTECTED] wrote: [snip] even is that is true i think you need the comodin characters (ie: %. _) to make like behave different from simple comparisons Not entirely true, if the database was initialized in a different locale than C, then the direct comparison

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Csaba Nagy
[snip] If that makes a difference then I'd guess you have one of two things: 1. A corrupt index (check the REINDEX command) 2. (perhaps more likely) Some localisation issues. What encoding/locale settings are you using? Based on the 3rd query of the OP, where the direct comparison results

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Mage
A. Kretschmer wrote: select *, length(username), length('potyty') from common_logins where username like 'potyty'; My guess: select length(username) from common_logins where username like 'potyty'; is _NOT_ 6, there is a SPACE like 'potyty '. If you look my 3rd query, you will see

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Mage
Csaba Nagy wrote: [snip] even is that is true i think you need the comodin characters (ie: %. _) to make like behave different from simple comparisons Not entirely true, if the database was initialized in a different locale than C, then the direct comparison will probably go for an

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Gregory S. Williamson
If you look my 3rd query, you will see that there are no spaces, however: select *, length(username), length('potyty') from common_logins where username like 'potyty'; uid | username | password | lastlogin | status | usertype | loginnum | length | length

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Jaime Casanova
I tried it in two databases (dump and load to another one), so I don't think that we have corrupted indexes. the problem persist after a dump a reload? then there is not because an index corruption... can you send part of the data that reproduces the bug? I can try on a newer version of

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Richard Huxton
Csaba Nagy wrote: [snip] If that makes a difference then I'd guess you have one of two things: 1. A corrupt index (check the REINDEX command) 2. (perhaps more likely) Some localisation issues. What encoding/locale settings are you using? Based on the 3rd query of the OP, where the direct

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Richard Huxton
Gregory S. Williamson wrote: I am puzzled by the lack of a % in the LIKE query. When I try this on postgres 7.4 and 8.0 I get no rows when I am missing it; including it works as expected. The names have been changed to protect the guilty ;-} but the core of it is true -- no % means

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Mage
Jaime Casanova wrote: I tried it in two databases (dump and load to another one), so I don't think that we have corrupted indexes. the problem persist after a dump a reload? then there is not because an index corruption... can you send part of the data that reproduces the bug? I

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Gregory S. Williamson
Well, then I have the disease. The database is UNICODE: gex_runtime=# select gex_clientname from gex_clients where gex_clientname = 'HomeHappinesses'; gex_clientname -- HomeGain (1 row) gex_runtime=# select

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Mage
Mage wrote: ./configure --prefix=/usr/local/pgsql --with-python client_encoding| LATIN2 lc_collate | hu_HU lc_ctype | hu_HU lc_messages| en_US lc_monetary| en_US lc_numeric |

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Tom Lane
Richard Huxton dev@archonet.com writes: Csaba Nagy wrote: Based on the 3rd query of the OP, where the direct comparison results in true for all the rows which matched the like, I would exclude the localisation issues variant... unless = is not equals in all cases ;-) Well spotted Csaba -

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Tom Lane
Richard Huxton dev@archonet.com writes: But like without any wildcards should be the same as =, but it isn't in the original post. I'm too lazy to go look at the code right now, but I think that the reduction of x LIKE constant-pattern to x = constant-pattern is part of the LIKE index

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Csaba Nagy
Ok, that explains then the problem... but the index is arguably corrupt in this case, with the important difference that it's not even fixable by reindex... I guess what the OP really wants is a solution to his problem. If the table is not too big, a quick fix would be to just drop the index.

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Jaime Casanova
On 12/15/05, Csaba Nagy [EMAIL PROTECTED] wrote: Ok, that explains then the problem... but the index is arguably corrupt in this case, with the important difference that it's not even fixable by reindex... I guess what the OP really wants is a solution to his problem. MAGE was reproducing

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Tom Lane
Jaime Casanova [EMAIL PROTECTED] writes: On 12/15/05, Csaba Nagy [EMAIL PROTECTED] wrote: Ok, that explains then the problem... but the index is arguably corrupt in this case, with the important difference that it's not even fixable by reindex... I guess what the OP really wants is a

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Mage
UPDATE I was trying to create a demo table, because I cannot send our confidental data. I have found weird result. # drop table common_logins; DROP TABLE $ psql ../cl.sql SET SET SET SET SET SET CREATE TABLE setval 203650 (1 row) ALTER TABLE CREATE INDEX CREATE

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Non-self-consistent comparison results can lead to an index that is either actually or effectively corrupt (because index searches proceed down the wrong tree path and thus fail to find items that should be found). So the observation that only index

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: But then I thought of another idea. What if Postgres just used strxfrm() instead of strcoll everywhere? Then it ought to never produce inconsistent results. At least if strxfrm() doesn't just return randomly varying

Re: [GENERAL] is this a bug or I am blind?

2005-12-15 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: What runtime penalty? It seems likely that strcoll is implemented by the equivalent of calling strxfrm twice internally anyways. Only by a very incompetent implementor. regards, tom lane ---(end of