Re: [GENERAL] Accent insensitive search?

2009-03-30 Thread Stuart Bishop
On Tue, Mar 24, 2009 at 4:53 PM, Jasen Betts  wrote:
> On 2009-03-18, cifroes  wrote:
>> This is a multi-part message in MIME format.
>>
>> --_=_NextPart_001_01C9A7E6.B32BBA87
>> Content-Type: text/plain;
>>       charset="iso-8859-1"
>> Content-Transfer-Encoding: quoted-printable
>>
>> Hi,
>>
>> I have a DB in utf-8 and postgres 8.3.x.=20
>>
>> How can I do an accent insensitive search (like ...) ?
>
> use a posix regular expression that matches the string you want:
>
> select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$';
>
> you could write a function to do the translation.



Unicode normalization is an interesting way to strip accents I've found:

>>> import unicodedata
>>>
>>> s = u'Björn'
>>> s = unicodedata.normalize("NFKD", s)
>>> s = ''.join(c for c in s if ord(c) < 127)
>>> print s
Bjorn

You can also use the character names to map many more characters to
the ascii equivalent. A large number of these can me smashed into
ASCII using regular expressions and some manual mappings to map LETTER
THORN -> th, LETTER LATERAL CLICK -> X etc. Just mapping CAPITAL
LETTER XX -> XX and SMALL LETTER XX -> xx seems to get you most of
europe if you special case SHARP S -> ss and THORN -> th.

>>> s = u'ァ'
>>> print unicodedata.name(s)
KATAKANA LETTER SMALL A


-- 
Stuart Bishop 
http://www.stuartbishop.net/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-24 Thread Jasen Betts
On 2009-03-18, cifroes  wrote:
> This is a multi-part message in MIME format.
>
> --_=_NextPart_001_01C9A7E6.B32BBA87
> Content-Type: text/plain;
>   charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
>
> Hi,
>
> I have a DB in utf-8 and postgres 8.3.x.=20
>
> How can I do an accent insensitive search (like ...) ?

use a posix regular expression that matches the string you want:

select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$';

you could write a function to do the translation.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Christophe


On Mar 18, 2009, at 11:24 AM, Alvaro Herrera wrote:

Hmm, if to_ascii() doesn't work, that's something worth some research.
Maybe the encoding config is broken, for example.


The docs say to_ascii() only works with LATIN1, LATIN2, LATIN9, and  
WIN1250; maybe convert('string', 'UTF-8', 'SQL_ASCII')?


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Alvaro Herrera
Thom Brown escribió:
> Here's an example of a function I might use (although I haven't actually got
> plperl installed, so can't test it myself, but you'll get the idea:
> 
> CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
> my ($input_string) = @_;
> $input_string =~ s/[âãäåāăą]/a;
> $input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A;

Hmm, if to_ascii() doesn't work, that's something worth some research.
Maybe the encoding config is broken, for example.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Oleg Bartunov

On Wed, 18 Mar 2009, cifroes wrote:


Hi,

I have a DB in utf-8 and postgres 8.3.x.

How can I do an accent insensitive search (like ...) ?


Take a look on text search capability and 
http://www.sai.msu.su/~megera/wiki/unaccent

We have patches for CVS HEAD, but unfortunately they will likely
go to the 8.5 release

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Christophe
What I've done in the past in this situation is to create a separate  
field with the text normalized to whatever the search form is (all  
lower case, accents stripped, etc.), and then index and search that  
from the application.


Although I've not tried it, a functional index that did the same thing  
might work, as long as you were careful in how the queries were  
written to ensure that it was used.


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread John R Pierce



how about...

select    where translate(lower(myfield), 
'âãäåāăąèééêëēĕėęěìíîïìĩīĭóôõöōŏőùúûüũūŭů', 
'aaaeeooouuu') = 'stringiwannamatch';



or something like that.   I may have miscounted the vowells in the 'to' 
string :)




-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thomas Kellerer

Sam Mason wrote on 18.03.2009 18:15:

On Wed, Mar 18, 2009 at 04:29:24PM -, cifroes wrote:
I have a DB in utf-8 and postgres 8.3.x. 


How can I do an accent insensitive search (like ...) ?


No good idea at the moment; I'd somehow expect to find this sort of
normalization in the functionality provided by the "text search" code.
My reasoning being that searching for a word should find it if you're
being accurate or lazy with your accents.  I've not been able to coax it
into doing anything reliable though.

If you can't find anything better in PG; the translate[1] function would
be my best suggestion.  Performance should be better than using regular
expressions.


Are you aware of any plans to add accent-insesitive collations to PG?

Would be great if PG offered a similar flexibility here as other databases.

Thomas


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thom Brown
2009/3/18 Sam Mason 

>
> If you can't find anything better in PG; the translate[1] function would
> be my best suggestion.  Performance should be better than using regular
> expressions.
>
>
Yeah, that does appear to perform better.  I tried the following at it
worked for me:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
DECLARE
input_string text := $1;
BEGIN

input_string := translate(input_string, 'âãäåāăąÁÂÃÄÅĀĂĄ',
'aaa');
input_string := translate(input_string, 'èééêëēĕėęěĒĔĖĘĚ',
'eee');
input_string := translate(input_string, 'ìíîïìĩīĭÌÍÎÏÌĨĪĬ',
'');
input_string := translate(input_string, 'óôõöōŏőÒÓÔÕÖŌŎŐ',
'ooo');
input_string := translate(input_string, 'ùúûüũūŭůÙÚÛÜŨŪŬŮ',
'');

return input_string;
END;
$$ LANGUAGE plpgsql;


Regards

Thom


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Pedro Doria Meunier
Ooops!
Silly me! I should have read more carefully ... (blush) sorry!

Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam

On Wednesday 18 March 2009 04:46:16 pm Pedro Doria Meunier wrote:
> Hi,
>
> use ILIKE
>
> HTH,
>
> Pedro Doria Meunier
> GSM: +351961720188
> Skype: pdoriam
>
> On Wednesday 18 March 2009 04:29:24 pm cifroes wrote:
> > Hi,
> >
> > I have a DB in utf-8 and postgres 8.3.x.
> >
> > How can I do an accent insensitive search (like ...) ?
> >
> >
> > TIA




signature.asc
Description: This is a digitally signed message part.


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Sam Mason
On Wed, Mar 18, 2009 at 04:29:24PM -, cifroes wrote:
> I have a DB in utf-8 and postgres 8.3.x. 
> 
> How can I do an accent insensitive search (like ...) ?

No good idea at the moment; I'd somehow expect to find this sort of
normalization in the functionality provided by the "text search" code.
My reasoning being that searching for a word should find it if you're
being accurate or lazy with your accents.  I've not been able to coax it
into doing anything reliable though.

If you can't find anything better in PG; the translate[1] function would
be my best suggestion.  Performance should be better than using regular
expressions.

-- 
  Sam  http://samason.me.uk/
 
 [1] http://www.postgresql.org/docs/current/static/functions-string.html

-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thom Brown
Here's an example of a function I might use (although I haven't actually got
plperl installed, so can't test it myself, but you'll get the idea:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
my ($input_string) = @_;
$input_string =~ s/[âãäåāăą]/a;
$input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A;
$input_string =~ s/[èééêëēĕėęě]/e;
$input_string =~ s/[ĒĔĖĘĚ]/E;
$input_string =~ s/[ìíîïìĩīĭ]/i;
$input_string =~ s/[ÌÍÎÏÌĨĪĬ]/I;
$input_string =~ s/[óôõöōŏő]/o;
$input_string =~ s/[ÒÓÔÕÖŌŎŐ]/O;
$input_string =~ s/[ùúûüũūŭů]/u;
$input_string =~ s/[ÙÚÛÜŨŪŬŮ]/U;
return $input_string;
$$ LANGUAGE plperl;


Regards

Thom


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thom Brown
2009/3/18 Pedro Doria Meunier 

> Hi,
>
> use ILIKE
>
> HTH,
>

ILIKE is only case-insensitive, and won't match accented characters.  The
only thing I can think of doing is to create a function which will replace
characters with their equivalent non-accented counterparts and use that in
the query.

Example: SELECT name FROM people WHERE unaccent_string(name) ~* 'Pédro';

The function would perform a reg-ex replace similar to: s/[èééêë]/e

Just an idea

Thom


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread cifroes
Not case insensitive but accent insensitive :)

And I tried select to_ascii('capo','LATIN1'), to_ascii('çapo','LATIN1') and the 
results are different


-Original Message-
From: pgsql-general-ow...@postgresql.org on behalf of Pedro Doria Meunier
Sent: Wed 3/18/2009 4:46 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Accent insensitive search?
 
Hi,

use ILIKE

HTH,

Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam

On Wednesday 18 March 2009 04:29:24 pm cifroes wrote:
> Hi,
>
> I have a DB in utf-8 and postgres 8.3.x.
>
> How can I do an accent insensitive search (like ...) ?
>
>
> TIA






Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Pedro Doria Meunier
Hi,

use ILIKE

HTH,

Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam

On Wednesday 18 March 2009 04:29:24 pm cifroes wrote:
> Hi,
>
> I have a DB in utf-8 and postgres 8.3.x.
>
> How can I do an accent insensitive search (like ...) ?
>
>
> TIA




signature.asc
Description: This is a digitally signed message part.


Re: [GENERAL] Accent-insensitive search

2007-07-13 Thread Alvaro Herrera
turbovince escribió:
> Hello, I would like to perform some accent-insensitive searches on my
> database, which means that a select query with condition, say, WHERE
> NAME = 'HELLÔ' would return records where name is 'HELLO' as well.
> 
> My data is encoded in Unicode (UTF8) and therefore I cannot use
> Postgre's to_ascii() trick to achieve accent-insensitive searches.

Use the convert() function to turn it into Latin1 (or whatever encoding
you prefer), then to_ascii.

Note that if you have strange chars it will fail anyway (for example
there was this guy not long ago complaining in pgsql-es-ayuda that it
failed when he had the "mu" greek letter in a product description).

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Accent-insensitive search

2007-07-12 Thread Jorge Godoy
On Monday 09 July 2007 18:33:49 turbovince wrote:
> Hello, I would like to perform some accent-insensitive searches on my
> database, which means that a select query with condition, say, WHERE
> NAME = 'HELLÔ' would return records where name is 'HELLO' as well.
>
> My data is encoded in Unicode (UTF8) and therefore I cannot use
> Postgre's to_ascii() trick to achieve accent-insensitive searches.
>
> Is there any way I could perform such searches with an UTF8 encoding ?

Check the translate() documentation.  It isn't the best thing in the world to 
have to use it and code things by yourself, but it works.

-- 
Jorge Godoy  <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Accent insensitive search

2007-06-22 Thread Alvaro Herrera
Diego Manilla Suárez wrote:
> Hi. I have a few databases created with UNICODE encoding, and I would 
> like to be able to search with accent insensitivity. There's something 
> in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do 
> this, but I found nothing in PostgreSQL, just the 'to_ascii' function, 
> which AFAIK, doesn't work with UNICODE.

to_ascii() doesn't work with UTF-8, but you can use convert() to turn
the UTF8 text into Latin-1 and then use to_ascii() to remove the funny
bits.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Accent insensitive search

2007-06-21 Thread Albe Laurenz
PFC wrote:
>> Hi. I have a few databases created with UNICODE encoding, and I would  
>> like to be able to search with accent insensitivity. There's something  
>> in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  
>> this, but I found nothing in PostgreSQL, just the 'to_ascii' function,  
>> which AFAIK, doesn't work with UNICODE.
> 
>   The easiest way is to create an extra column which will hold a copy of  
> your text, with all accents removed. You can also convert it to lowercase  
> and remove apostrophes, punctuation etc. Said column is kept up to date  
> with a trigger.

Creating an extra column which holds a copy of the data is certainly not
a very good idea, as it will create unnecessary redundancy (breaking
normal forms), bloat your table, and the trigger will have a certain
performance impact.

My suggestion is to write a function that removes the accents in a string
for your language, let's call it noaccents(text).

Then you can index the column atext with

CREATE INDEX atable_atext_idx ON atable ((noaccents(atext)))

Then every query of the form

... WHERE noaccents(atext) = noaccents('SOMÉTHING')

can use the index.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Accent insensitive search

2007-06-21 Thread Gregory Stark

"PFC" <[EMAIL PROTECTED]> writes:

>> Hi. I have a few databases created with UNICODE encoding, and I would like to
>> be able to search with accent insensitivity. There's something  in Oracle
>> (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  this, but I found
>> nothing in PostgreSQL, just the 'to_ascii' function,  which AFAIK, doesn't
>> work with UNICODE.

Postgres supports localized collation orders but what it doesn't currently
support is having multiple collation orders within the same server.

So when you initialize the server with initdb it takes the setting of
LC_COLLATE (usually from LC_ALL) and stores that permanently. If you initdb
with LC_COLLATE set to a locale like en_GB.UTF-8 or something like that you
may find one that has the behaviour you want. I think they won't be entirely
accent insensitive but they'll consider accents only if the rest of the string
is identical.

You can test the sort order of a locale by writing a file with sample words
and sorting with something like:

LC_ALL=en_GB.UTF-8 sort /tmp/data

>   The easiest way is to create an extra column which will hold a copy of
> your text, with all accents removed. You can also convert it to lowercase  and
> remove apostrophes, punctuation etc. Said column is kept up to date  with a
> trigger.

That's another alternative which is useful if you need multiple collations in
your database. This gives you control over which collation is used when and
exactly what the rules are. The downside is that you have to reinvent the
collation rules which the localized collations already provide.

You don't necessarily have to keep a column in your table with the normalized
strings. You can normalize "on-the-fly" using an expression index as long as
your function always returns the same data given the same inputs (and is
therefore marked "immutable").

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Accent insensitive search

2007-06-21 Thread PFC


Hi. I have a few databases created with UNICODE encoding, and I would  
like to be able to search with accent insensitivity. There's something  
in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  
this, but I found nothing in PostgreSQL, just the 'to_ascii' function,  
which AFAIK, doesn't work with UNICODE.


	The easiest way is to create an extra column which will hold a copy of  
your text, with all accents removed. You can also convert it to lowercase  
and remove apostrophes, punctuation etc. Said column is kept up to date  
with a trigger.

Python is suitable for this (use unicodedata.normalize).
	Keeping a copy of the processed data will speed up search versus WHERE  
remove_accents( blah ) = 'text', even with a function index.
	Note that this function could be written in C and use a table on the  
first 64K unicode symbols for speedup.


See attached file.

create_ft_functions.sql
Description: Binary data

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Accent insensitive search

2003-07-01 Thread Alvaro Herrera
On Tue, Jul 01, 2003 at 03:11:54PM +, Alejandro Javier Pomeraniec wrote:

> Hi ! Does anyone knows how to make accent insensitive searches??

Convert both the pattern and the column to ASCII with to_ascii

> SELECT * FROM testtable WHERE testfield like '%olo%';

SELECT * FROM testtable WHERE to_ascii(testfield) like '%olo%';

Do you know the pgsql-ayuda mailing list?  You may like it.
http://tlali.iztacala.unam.mx/mailman/listinfo/pgsql-ayuda

-- 
Alvaro Herrera ()
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, CSLewis)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] Accent insensitive search

2003-07-01 Thread Ian Barwick
On Tuesday 01 July 2003 17:11, Alejandro Javier Pomeraniec wrote:
> Hi ! Does anyone knows how to make accent insensitive searches??
>
> For example
>
> i have this data in a table
>
> 
> Colón
> Polo
> 
>
> I need that this query
>
> SELECT * FROM testtable WHERE testfield like '%olo%';
>
> brings both results instead of only showing Polo.

One solution[*]:

SELECT * FROM testtable WHERE to_ascii(testfield,'LATIN1') LIKE '%olo%'

Note this might not work with all database encodings, especially UNICODE.

[*] no doubt someone will be along in a moment with another.


Ian Barwick
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings