Re: [GENERAL] Accent insensitive search?
On Tue, Mar 24, 2009 at 4:53 PM, Jasen Betts wrote: > On 2009-03-18, cifroes wrote: >> This is a multi-part message in MIME format. >> >> --_=_NextPart_001_01C9A7E6.B32BBA87 >> Content-Type: text/plain; >> charset="iso-8859-1" >> Content-Transfer-Encoding: quoted-printable >> >> Hi, >> >> I have a DB in utf-8 and postgres 8.3.x.=20 >> >> How can I do an accent insensitive search (like ...) ? > > use a posix regular expression that matches the string you want: > > select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$'; > > you could write a function to do the translation. Unicode normalization is an interesting way to strip accents I've found: >>> import unicodedata >>> >>> s = u'Björn' >>> s = unicodedata.normalize("NFKD", s) >>> s = ''.join(c for c in s if ord(c) < 127) >>> print s Bjorn You can also use the character names to map many more characters to the ascii equivalent. A large number of these can me smashed into ASCII using regular expressions and some manual mappings to map LETTER THORN -> th, LETTER LATERAL CLICK -> X etc. Just mapping CAPITAL LETTER XX -> XX and SMALL LETTER XX -> xx seems to get you most of europe if you special case SHARP S -> ss and THORN -> th. >>> s = u'ァ' >>> print unicodedata.name(s) KATAKANA LETTER SMALL A -- Stuart Bishop http://www.stuartbishop.net/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
On 2009-03-18, cifroes wrote: > This is a multi-part message in MIME format. > > --_=_NextPart_001_01C9A7E6.B32BBA87 > Content-Type: text/plain; > charset="iso-8859-1" > Content-Transfer-Encoding: quoted-printable > > Hi, > > I have a DB in utf-8 and postgres 8.3.x.=20 > > How can I do an accent insensitive search (like ...) ? use a posix regular expression that matches the string you want: select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$'; you could write a function to do the translation. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
On Mar 18, 2009, at 11:24 AM, Alvaro Herrera wrote: Hmm, if to_ascii() doesn't work, that's something worth some research. Maybe the encoding config is broken, for example. The docs say to_ascii() only works with LATIN1, LATIN2, LATIN9, and WIN1250; maybe convert('string', 'UTF-8', 'SQL_ASCII')? - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
Thom Brown escribió: > Here's an example of a function I might use (although I haven't actually got > plperl installed, so can't test it myself, but you'll get the idea: > > CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$ > my ($input_string) = @_; > $input_string =~ s/[âãäåāăą]/a; > $input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A; Hmm, if to_ascii() doesn't work, that's something worth some research. Maybe the encoding config is broken, for example. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
On Wed, 18 Mar 2009, cifroes wrote: Hi, I have a DB in utf-8 and postgres 8.3.x. How can I do an accent insensitive search (like ...) ? Take a look on text search capability and http://www.sai.msu.su/~megera/wiki/unaccent We have patches for CVS HEAD, but unfortunately they will likely go to the 8.5 release Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
What I've done in the past in this situation is to create a separate field with the text normalized to whatever the search form is (all lower case, accents stripped, etc.), and then index and search that from the application. Although I've not tried it, a functional index that did the same thing might work, as long as you were careful in how the queries were written to ensure that it was used. - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
how about... select where translate(lower(myfield), 'âãäåāăąèééêëēĕėęěìíîïìĩīĭóôõöōŏőùúûüũūŭů', 'aaaeeooouuu') = 'stringiwannamatch'; or something like that. I may have miscounted the vowells in the 'to' string :) - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
Sam Mason wrote on 18.03.2009 18:15: On Wed, Mar 18, 2009 at 04:29:24PM -, cifroes wrote: I have a DB in utf-8 and postgres 8.3.x. How can I do an accent insensitive search (like ...) ? No good idea at the moment; I'd somehow expect to find this sort of normalization in the functionality provided by the "text search" code. My reasoning being that searching for a word should find it if you're being accurate or lazy with your accents. I've not been able to coax it into doing anything reliable though. If you can't find anything better in PG; the translate[1] function would be my best suggestion. Performance should be better than using regular expressions. Are you aware of any plans to add accent-insesitive collations to PG? Would be great if PG offered a similar flexibility here as other databases. Thomas - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
2009/3/18 Sam Mason > > If you can't find anything better in PG; the translate[1] function would > be my best suggestion. Performance should be better than using regular > expressions. > > Yeah, that does appear to perform better. I tried the following at it worked for me: CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$ DECLARE input_string text := $1; BEGIN input_string := translate(input_string, 'âãäåāăąÁÂÃÄÅĀĂĄ', 'aaa'); input_string := translate(input_string, 'èééêëēĕėęěĒĔĖĘĚ', 'eee'); input_string := translate(input_string, 'ìíîïìĩīĭÌÍÎÏÌĨĪĬ', ''); input_string := translate(input_string, 'óôõöōŏőÒÓÔÕÖŌŎŐ', 'ooo'); input_string := translate(input_string, 'ùúûüũūŭůÙÚÛÜŨŪŬŮ', ''); return input_string; END; $$ LANGUAGE plpgsql; Regards Thom
Re: [GENERAL] Accent insensitive search?
Ooops! Silly me! I should have read more carefully ... (blush) sorry! Pedro Doria Meunier GSM: +351961720188 Skype: pdoriam On Wednesday 18 March 2009 04:46:16 pm Pedro Doria Meunier wrote: > Hi, > > use ILIKE > > HTH, > > Pedro Doria Meunier > GSM: +351961720188 > Skype: pdoriam > > On Wednesday 18 March 2009 04:29:24 pm cifroes wrote: > > Hi, > > > > I have a DB in utf-8 and postgres 8.3.x. > > > > How can I do an accent insensitive search (like ...) ? > > > > > > TIA signature.asc Description: This is a digitally signed message part.
Re: [GENERAL] Accent insensitive search?
On Wed, Mar 18, 2009 at 04:29:24PM -, cifroes wrote: > I have a DB in utf-8 and postgres 8.3.x. > > How can I do an accent insensitive search (like ...) ? No good idea at the moment; I'd somehow expect to find this sort of normalization in the functionality provided by the "text search" code. My reasoning being that searching for a word should find it if you're being accurate or lazy with your accents. I've not been able to coax it into doing anything reliable though. If you can't find anything better in PG; the translate[1] function would be my best suggestion. Performance should be better than using regular expressions. -- Sam http://samason.me.uk/ [1] http://www.postgresql.org/docs/current/static/functions-string.html - Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Accent insensitive search?
Here's an example of a function I might use (although I haven't actually got plperl installed, so can't test it myself, but you'll get the idea: CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$ my ($input_string) = @_; $input_string =~ s/[âãäåāăą]/a; $input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A; $input_string =~ s/[èééêëēĕėęě]/e; $input_string =~ s/[ĒĔĖĘĚ]/E; $input_string =~ s/[ìíîïìĩīĭ]/i; $input_string =~ s/[ÌÍÎÏÌĨĪĬ]/I; $input_string =~ s/[óôõöōŏő]/o; $input_string =~ s/[ÒÓÔÕÖŌŎŐ]/O; $input_string =~ s/[ùúûüũūŭů]/u; $input_string =~ s/[ÙÚÛÜŨŪŬŮ]/U; return $input_string; $$ LANGUAGE plperl; Regards Thom
Re: [GENERAL] Accent insensitive search?
2009/3/18 Pedro Doria Meunier > Hi, > > use ILIKE > > HTH, > ILIKE is only case-insensitive, and won't match accented characters. The only thing I can think of doing is to create a function which will replace characters with their equivalent non-accented counterparts and use that in the query. Example: SELECT name FROM people WHERE unaccent_string(name) ~* 'Pédro'; The function would perform a reg-ex replace similar to: s/[èééêë]/e Just an idea Thom
Re: [GENERAL] Accent insensitive search?
Not case insensitive but accent insensitive :) And I tried select to_ascii('capo','LATIN1'), to_ascii('çapo','LATIN1') and the results are different -Original Message- From: pgsql-general-ow...@postgresql.org on behalf of Pedro Doria Meunier Sent: Wed 3/18/2009 4:46 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] Accent insensitive search? Hi, use ILIKE HTH, Pedro Doria Meunier GSM: +351961720188 Skype: pdoriam On Wednesday 18 March 2009 04:29:24 pm cifroes wrote: > Hi, > > I have a DB in utf-8 and postgres 8.3.x. > > How can I do an accent insensitive search (like ...) ? > > > TIA
Re: [GENERAL] Accent insensitive search?
Hi, use ILIKE HTH, Pedro Doria Meunier GSM: +351961720188 Skype: pdoriam On Wednesday 18 March 2009 04:29:24 pm cifroes wrote: > Hi, > > I have a DB in utf-8 and postgres 8.3.x. > > How can I do an accent insensitive search (like ...) ? > > > TIA signature.asc Description: This is a digitally signed message part.
Re: [GENERAL] Accent-insensitive search
turbovince escribió: > Hello, I would like to perform some accent-insensitive searches on my > database, which means that a select query with condition, say, WHERE > NAME = 'HELLÔ' would return records where name is 'HELLO' as well. > > My data is encoded in Unicode (UTF8) and therefore I cannot use > Postgre's to_ascii() trick to achieve accent-insensitive searches. Use the convert() function to turn it into Latin1 (or whatever encoding you prefer), then to_ascii. Note that if you have strange chars it will fail anyway (for example there was this guy not long ago complaining in pgsql-es-ayuda that it failed when he had the "mu" greek letter in a product description). -- Alvaro Herrera http://www.flickr.com/photos/alvherre/ "El sabio habla porque tiene algo que decir; el tonto, porque tiene que decir algo" (Platon). ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Accent-insensitive search
On Monday 09 July 2007 18:33:49 turbovince wrote: > Hello, I would like to perform some accent-insensitive searches on my > database, which means that a select query with condition, say, WHERE > NAME = 'HELLÔ' would return records where name is 'HELLO' as well. > > My data is encoded in Unicode (UTF8) and therefore I cannot use > Postgre's to_ascii() trick to achieve accent-insensitive searches. > > Is there any way I could perform such searches with an UTF8 encoding ? Check the translate() documentation. It isn't the best thing in the world to have to use it and code things by yourself, but it works. -- Jorge Godoy <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Accent insensitive search
Diego Manilla Suárez wrote: > Hi. I have a few databases created with UNICODE encoding, and I would > like to be able to search with accent insensitivity. There's something > in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do > this, but I found nothing in PostgreSQL, just the 'to_ascii' function, > which AFAIK, doesn't work with UNICODE. to_ascii() doesn't work with UTF-8, but you can use convert() to turn the UTF8 text into Latin-1 and then use to_ascii() to remove the funny bits. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Accent insensitive search
PFC wrote: >> Hi. I have a few databases created with UNICODE encoding, and I would >> like to be able to search with accent insensitivity. There's something >> in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do >> this, but I found nothing in PostgreSQL, just the 'to_ascii' function, >> which AFAIK, doesn't work with UNICODE. > > The easiest way is to create an extra column which will hold a copy of > your text, with all accents removed. You can also convert it to lowercase > and remove apostrophes, punctuation etc. Said column is kept up to date > with a trigger. Creating an extra column which holds a copy of the data is certainly not a very good idea, as it will create unnecessary redundancy (breaking normal forms), bloat your table, and the trigger will have a certain performance impact. My suggestion is to write a function that removes the accents in a string for your language, let's call it noaccents(text). Then you can index the column atext with CREATE INDEX atable_atext_idx ON atable ((noaccents(atext))) Then every query of the form ... WHERE noaccents(atext) = noaccents('SOMÉTHING') can use the index. Yours, Laurenz Albe ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Accent insensitive search
"PFC" <[EMAIL PROTECTED]> writes: >> Hi. I have a few databases created with UNICODE encoding, and I would like to >> be able to search with accent insensitivity. There's something in Oracle >> (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do this, but I found >> nothing in PostgreSQL, just the 'to_ascii' function, which AFAIK, doesn't >> work with UNICODE. Postgres supports localized collation orders but what it doesn't currently support is having multiple collation orders within the same server. So when you initialize the server with initdb it takes the setting of LC_COLLATE (usually from LC_ALL) and stores that permanently. If you initdb with LC_COLLATE set to a locale like en_GB.UTF-8 or something like that you may find one that has the behaviour you want. I think they won't be entirely accent insensitive but they'll consider accents only if the rest of the string is identical. You can test the sort order of a locale by writing a file with sample words and sorting with something like: LC_ALL=en_GB.UTF-8 sort /tmp/data > The easiest way is to create an extra column which will hold a copy of > your text, with all accents removed. You can also convert it to lowercase and > remove apostrophes, punctuation etc. Said column is kept up to date with a > trigger. That's another alternative which is useful if you need multiple collations in your database. This gives you control over which collation is used when and exactly what the rules are. The downside is that you have to reinvent the collation rules which the localized collations already provide. You don't necessarily have to keep a column in your table with the normalized strings. You can normalize "on-the-fly" using an expression index as long as your function always returns the same data given the same inputs (and is therefore marked "immutable"). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Accent insensitive search
Hi. I have a few databases created with UNICODE encoding, and I would like to be able to search with accent insensitivity. There's something in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do this, but I found nothing in PostgreSQL, just the 'to_ascii' function, which AFAIK, doesn't work with UNICODE. The easiest way is to create an extra column which will hold a copy of your text, with all accents removed. You can also convert it to lowercase and remove apostrophes, punctuation etc. Said column is kept up to date with a trigger. Python is suitable for this (use unicodedata.normalize). Keeping a copy of the processed data will speed up search versus WHERE remove_accents( blah ) = 'text', even with a function index. Note that this function could be written in C and use a table on the first 64K unicode symbols for speedup. See attached file. create_ft_functions.sql Description: Binary data ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Accent insensitive search
On Tue, Jul 01, 2003 at 03:11:54PM +, Alejandro Javier Pomeraniec wrote: > Hi ! Does anyone knows how to make accent insensitive searches?? Convert both the pattern and the column to ASCII with to_ascii > SELECT * FROM testtable WHERE testfield like '%olo%'; SELECT * FROM testtable WHERE to_ascii(testfield) like '%olo%'; Do you know the pgsql-ayuda mailing list? You may like it. http://tlali.iztacala.unam.mx/mailman/listinfo/pgsql-ayuda -- Alvaro Herrera () "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, CSLewis) ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] Accent insensitive search
On Tuesday 01 July 2003 17:11, Alejandro Javier Pomeraniec wrote: > Hi ! Does anyone knows how to make accent insensitive searches?? > > For example > > i have this data in a table > > > Colón > Polo > > > I need that this query > > SELECT * FROM testtable WHERE testfield like '%olo%'; > > brings both results instead of only showing Polo. One solution[*]: SELECT * FROM testtable WHERE to_ascii(testfield,'LATIN1') LIKE '%olo%' Note this might not work with all database encodings, especially UNICODE. [*] no doubt someone will be along in a moment with another. Ian Barwick [EMAIL PROTECTED] ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings