Tom has applied a patch to fix this and backpatched it to all relivant
branches. He might be preparing a summary email about this.
---
Mage wrote:
Martijn van Oosterhout wrote:
On Sat, Dec 17, 2005 at 05:01:15PM -0500,
Martijn van Oosterhout wrote:
On Sat, Dec 17, 2005 at 05:01:15PM -0500, Tom Lane wrote:
Martijn van Oosterhout kleptog@svana.org writes:
I think the real solution is to implement COLLATE support.
Maybe so, but we still need to figure out what we're doing for the back
branches,
On Mon, Dec 19, 2005 at 08:37:26PM +0100, Martijn van Oosterhout wrote:
I don't know which locales are affected. It just can't be that
widespread because we're not getting similar reports for 99% of the
locales out there.
Not getting reports doesn't mean the problem is rare.
Perhaps people
Karsten Hilbert [EMAIL PROTECTED] writes:
On Mon, Dec 19, 2005 at 08:37:26PM +0100, Martijn van Oosterhout wrote:
I don't know which locales are affected. It just can't be that
widespread because we're not getting similar reports for 99% of the
locales out there.
Not getting reports doesn't
[snip]
Trouble is, you can never guarantee that you're dealing with actual words.
What of you're comparing someone's password that happens to contain
combination of letters that act in this way?
Well, in this case why would you care about how passwords are sorted ?
:-)
I think if the
On Sat, Dec 17, 2005 at 05:01:15PM -0500, Tom Lane wrote:
Martijn van Oosterhout kleptog@svana.org writes:
I think the real solution is to implement COLLATE support.
Maybe so, but we still need to figure out what we're doing for the back
branches, and that won't be it ...
To be honest,
Csaba Nagy [EMAIL PROTECTED] writes:
Trouble is, you can never guarantee that you're dealing with actual words.
What of you're comparing someone's password that happens to contain
combination of letters that act in this way?
Well, in this case why would you care about how passwords are
Martijn van Oosterhout kleptog@svana.org writes:
On Sat, Dec 17, 2005 at 05:01:15PM -0500, Tom Lane wrote:
Maybe so, but we still need to figure out what we're doing for the back
branches, and that won't be it ...
To be honest, there are really only a handful of locales that suffer
from this
On Mon, Dec 19, 2005 at 10:08:12AM -0500, Tom Lane wrote:
Let's not fix it is really not an acceptable answer, because the
behavior in the affected locales is entirely broken (inconsistent,
etc). And how do you know which locales are affected, anyway?
I don't know which locales are affected.
On Sat, Dec 17, 2005 at 10:30:04PM +0100, Martijn van Oosterhout wrote:
I think the real solution is to implement COLLATE support. Then we can
define all sorts of collations, like:
...
My patch was halfway there (the grammer was sorted, as were the SQL
propgation rules) but there was some
At 01:40 PM 12/16/2005 -0500, Tom Lane wrote:
Nobody's said anything about giving up locale-sensitive sorting. The
question is about locale-sensitive equality: does it really make sense
that 'tty' = 'tyty'? Would your answer change in the context
'/dev/tty' = '/dev/tyty'? Are you willing
At 11:49 AM 12/17/2005 +0800, Lincoln Yeoh wrote:
But in a column for license owner names, one might want tty and tyty
to be the same - one might have to have a multicolumn index depending on
the owner's locale of choice.
To make myself clear, one might want to store a person's name in one
Where are we on this? Given the original report:
online=# select * from common_logins where username = 'potyty';
uid | username | password | lastlogin | status | usertype | loginnum
-+--+--+---++--+--
(0 rows)
Bruce Momjian pgman@candle.pha.pa.us writes:
Where are we on this?
Waiting to see if there's any input on what the behavior needs to be.
regards, tom lane
---(end of broadcast)---
TIP 9: In versions below 8.0, the planner
Tom Lane wrote:
Perhaps the fast-path check is a bad idea, but fixing this is not
just a matter of removing that. If we subscribe to strcoll's
worldview then we have to conclude that *text strings are not
hashable*, because strings that should be equal may have different
hash codes.
By the
Peter Eisentraut [EMAIL PROTECTED] writes:
By the way, I have always been concerned about the feature of Unicode
that you can write logically equivalent strings using different
code-point sequences. Namely, you often have the option of writing an
accented letter using the legacy single
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
Where are we on this?
Waiting to see if there's any input on what the behavior needs to be.
Actually, potyty and potty are not equal in Hungarian language. We
use the tyty form in complex words. Only data sorting
On Sat, Dec 17, 2005 at 02:08:06PM -0500, Tom Lane wrote:
The weight of opinion later in the thread seems to be leaning towards
the idea that we do not want to accept the word of strcoll/strxfrm about
whether two strings are equal: there are too many scenarios where lax
equality behavior would
Martijn van Oosterhout kleptog@svana.org writes:
I think the real solution is to implement COLLATE support.
Maybe so, but we still need to figure out what we're doing for the back
branches, and that won't be it ...
regards, tom lane
---(end of
Tom Lane [EMAIL PROTECTED] writes:
Peter Eisentraut [EMAIL PROTECTED] writes:
By the way, I have always been concerned about the feature of Unicode
that you can write logically equivalent strings using different
code-point sequences. Namely, you often have the option of writing an
Greg Stark wrote:
Using iso-8859-1 to encode é as a single byte versus using UTF8
which would take two bytes to encode it is an issue of using two
*different* encodings.
But that's not what we are discussing.
There is a separate issue that some characters could theoretically
have multiple
Peter Eisentraut [EMAIL PROTECTED] writes:
Greg Stark wrote:
Using iso-8859-1 to encode é as a single byte versus using UTF8
which would take two bytes to encode it is an issue of using two
*different* encodings.
But that's not what we are discussing.
The poster to which Tom was
Mage [EMAIL PROTECTED] writes:
Tom, I can send you the data I dumped and you can try it out with same
settings. I trust you.
Thanks. After digging through it, I can exhibit the problem: in hu_HU
locale and LATIN2 encoding, on a Linux box (Fedora Core 4, but probably
all glibc installations
On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote:
Thanks. After digging through it, I can exhibit the problem: in hu_HU
locale and LATIN2 encoding, on a Linux box (Fedora Core 4, but probably
all glibc installations will do this), I get
I don't know if this is related or not, but did
On Fri, 2005-12-16 at 18:12, Tom Lane wrote:
[snip]
I don't know anything about hu_HU comparison rules, but it appears that
strcoll() believes that these two strings should be considered equal.
Is that sane?
It is sane in a way, as the ty combination is pronounced together as a
single
Michael Fuhr [EMAIL PROTECTED] writes:
I don't know if this is related or not, but did the following issue
from a year ago ever get resolved? We were wondering then if the
Hungarian locale on some platforms might be causing problems.
AFAIR we never did figure that one out. I wasn't able to
Csaba Nagy [EMAIL PROTECTED] writes:
... So tyty and tty could be arguably both taken as double ty,
except that the official form is tty... but from a pronunciation point
of view they ARE equivalent in hungarian.
That's fair enough, but the question is should they be taken as
equivalent for
On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote:
Perhaps the fast-path check is a bad idea, but fixing this is not just
a matter of removing that. If we subscribe to strcoll's worldview then
we have to conclude that *text strings are not hashable*, because
strings that should be
On Fri, 2005-12-16 at 18:52, Tom Lane wrote:
Csaba Nagy [EMAIL PROTECTED] writes:
... So tyty and tty could be arguably both taken as double ty,
except that the official form is tty... but from a pronunciation point
of view they ARE equivalent in hungarian.
That's fair enough, but the
Martijn van Oosterhout kleptog@svana.org writes:
On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote:
Perhaps the fast-path check is a bad idea, but fixing this is not just
a matter of removing that. If we subscribe to strcoll's worldview then
we have to conclude that *text strings are
On Fri, Dec 16, 2005 at 06:54:15PM +0100, Martijn van Oosterhout wrote:
That's true, in the sense that unconverted strings are not hashable.
This is what strxfrm was created for, to return the sorting key for a
string. A quick C program demonstrates that indeed in that locale these
two strings
On Fri, Dec 16, 2005 at 01:06:58PM -0500, Tom Lane wrote:
Ah. So we could redefine hashtext() to return the hash of the strxfrm
value. Slow, but a lot better than giving up hash join and hash
aggregation altogether...
Not to put too fine a point on it, but either you want locale-sensetive
On Fri, 16 Dec 2005 11:59:48 -0600, Csaba Nagy [EMAIL PROTECTED] wrote:
On Fri, 2005-12-16 at 18:52, Tom Lane wrote:
Csaba Nagy [EMAIL PROTECTED] writes:
... So tyty and tty could be arguably both taken as double ty,
except that the official form is tty... but from a pronunciation
point
Martijn van Oosterhout kleptog@svana.org writes:
FWIW, here's some links to Microsoft and MySQL dealing with the same
issue, so we're not alone here. Hungarian seems to be a complex
language to sort, but it seems that glibc is right in this case.
The mysql bug link has a fairly detailed
Martijn van Oosterhout kleptog@svana.org writes:
On Fri, Dec 16, 2005 at 01:06:58PM -0500, Tom Lane wrote:
Ah. So we could redefine hashtext() to return the hash of the strxfrm
value. Slow, but a lot better than giving up hash join and hash
aggregation altogether...
Not to put too fine a
Tom Lane [EMAIL PROTECTED] writes:
The other approach we could take is to define text comparison as
yielding equality only for bitwise-equal strings. If strcoll() returns
zero then ask strcmp() for a second opinion.
Fwiw this has come up before on this list and it was discovered this is
At 01:40 PM 12/16/2005 -0500, Tom Lane wrote:
Nobody's said anything about giving up locale-sensitive sorting. The
question is about locale-sensitive equality: does it really make sense
that 'tty' = 'tyty'? Would your answer change in the context
'/dev/tty' = '/dev/tyty'? Are you willing to
am 15.12.2005, um 14:12:23 +0100 mailte Mage folgendes:
online=# select * from common_logins where username = 'potyty';
uid | username | password | lastlogin | status | usertype | loginnum
-+--+--+---++--+--
(0 rows)
online=# select *
On 12/15/05, A. Kretschmer [EMAIL PROTECTED] wrote:
am 15.12.2005, um 14:12:23 +0100 mailte Mage folgendes:
online=# select * from common_logins where username = 'potyty';
uid | username | password | lastlogin | status | usertype | loginnum
[snip]
even is that is true i think you need the comodin characters (ie: %.
_) to make like behave different from simple comparisons
Not entirely true, if the database was initialized in a different locale
than C, then the direct comparison will probably go for an index on
username, while like
Mage wrote:
online=# select * from common_logins where username = 'potyty';
online=# select * from common_logins where username like 'potyty';
It's probably worth seeing whether these have different plans (EXPLAIN
ANALYSE...) and if the = is using an index but like isn't.
If so, try issuing
On 12/15/05, Csaba Nagy [EMAIL PROTECTED] wrote:
[snip]
even is that is true i think you need the comodin characters (ie: %.
_) to make like behave different from simple comparisons
Not entirely true, if the database was initialized in a different locale
than C, then the direct comparison
[snip]
If that makes a difference then I'd guess you have one of two things:
1. A corrupt index (check the REINDEX command)
2. (perhaps more likely) Some localisation issues.
What encoding/locale settings are you using?
Based on the 3rd query of the OP, where the direct comparison results
A. Kretschmer wrote:
select *, length(username), length('potyty') from common_logins where username
like 'potyty';
My guess:
select length(username) from common_logins where username like 'potyty';
is _NOT_ 6, there is a SPACE like 'potyty '.
If you look my 3rd query, you will see
Csaba Nagy wrote:
[snip]
even is that is true i think you need the comodin characters (ie: %.
_) to make like behave different from simple comparisons
Not entirely true, if the database was initialized in a different locale
than C, then the direct comparison will probably go for an
If you look my 3rd query, you will see that there are no spaces, however:
select *, length(username), length('potyty') from common_logins where
username like 'potyty';
uid | username | password | lastlogin | status |
usertype | loginnum | length | length
I tried it in two databases (dump and load to another one), so I don't
think that we have corrupted indexes.
the problem persist after a dump a reload? then there is not because
an index corruption... can you send part of the data that reproduces
the bug?
I can try on a newer version of
Csaba Nagy wrote:
[snip]
If that makes a difference then I'd guess you have one of two things:
1. A corrupt index (check the REINDEX command)
2. (perhaps more likely) Some localisation issues.
What encoding/locale settings are you using?
Based on the 3rd query of the OP, where the direct
Gregory S. Williamson wrote:
I am puzzled by the lack of a % in the LIKE query. When I try this on
postgres 7.4 and 8.0 I get no rows when I am missing it; including it works as expected.
The names have been changed to protect the guilty ;-} but the core of it is true -- no
% means
Jaime Casanova wrote:
I tried it in two databases (dump and load to another one), so I don't
think that we have corrupted indexes.
the problem persist after a dump a reload? then there is not because
an index corruption... can you send part of the data that reproduces
the bug?
I
Well, then I have the disease. The database is UNICODE:
gex_runtime=# select gex_clientname from gex_clients where gex_clientname =
'HomeHappinesses';
gex_clientname
--
HomeGain
(1 row)
gex_runtime=# select
Mage wrote:
./configure --prefix=/usr/local/pgsql --with-python
client_encoding| LATIN2
lc_collate | hu_HU
lc_ctype | hu_HU
lc_messages| en_US
lc_monetary| en_US
lc_numeric |
Richard Huxton dev@archonet.com writes:
Csaba Nagy wrote:
Based on the 3rd query of the OP, where the direct comparison results in
true for all the rows which matched the like, I would exclude the
localisation issues variant... unless = is not equals in all cases ;-)
Well spotted Csaba -
Richard Huxton dev@archonet.com writes:
But like without any wildcards should be the same as =, but it isn't
in the original post.
I'm too lazy to go look at the code right now, but I think that the
reduction of x LIKE constant-pattern to x = constant-pattern is
part of the LIKE index
Ok, that explains then the problem... but the index is arguably corrupt
in this case, with the important difference that it's not even fixable
by reindex...
I guess what the OP really wants is a solution to his problem. If the
table is not too big, a quick fix would be to just drop the index.
On 12/15/05, Csaba Nagy [EMAIL PROTECTED] wrote:
Ok, that explains then the problem... but the index is arguably corrupt
in this case, with the important difference that it's not even fixable
by reindex...
I guess what the OP really wants is a solution to his problem.
MAGE was reproducing
Jaime Casanova [EMAIL PROTECTED] writes:
On 12/15/05, Csaba Nagy [EMAIL PROTECTED] wrote:
Ok, that explains then the problem... but the index is arguably corrupt
in this case, with the important difference that it's not even fixable
by reindex...
I guess what the OP really wants is a
UPDATE
I was trying to create a demo table, because I cannot send our
confidental data. I have found weird result.
# drop table common_logins;
DROP TABLE
$ psql ../cl.sql
SET
SET
SET
SET
SET
SET
CREATE TABLE
setval
203650
(1 row)
ALTER TABLE
CREATE INDEX
CREATE
Tom Lane [EMAIL PROTECTED] writes:
Non-self-consistent comparison results can lead to an index that is
either actually or effectively corrupt (because index searches proceed
down the wrong tree path and thus fail to find items that should be
found). So the observation that only index
Tom Lane [EMAIL PROTECTED] writes:
Greg Stark [EMAIL PROTECTED] writes:
But then I thought of another idea. What if Postgres just used strxfrm()
instead of strcoll everywhere? Then it ought to never produce inconsistent
results. At least if strxfrm() doesn't just return randomly varying
Greg Stark [EMAIL PROTECTED] writes:
What runtime penalty? It seems likely that strcoll is implemented by the
equivalent of calling strxfrm twice internally anyways.
Only by a very incompetent implementor.
regards, tom lane
---(end of
61 matches
Mail list logo