Plain-text search algorithms: normalization, decomposition, case mapping, word breaks

2003-06-27 Thread Philippe Verdy
In order to implement a plain-text search algorithm in a language-neutral way that would still work with all scripts, I am looking for advice on how this can be done safely (notably for automated search engines), to allow searching for text matching some basic encoding styles. My first

RE: Plain-text search algorithms: normalization, decomposition, case mapping, word breaks

2003-06-27 Thread Jony Rosenne
From: [EMAIL PROTECTED] On Behalf Of Philippe Verdy; Sent: Friday, June 27, 2003 1:46 PM; To: [EMAIL PROTECTED]; Subject: SPAM: Plain-text search algorithms: normalization, decomposition, case mapping, word breaks. "In order to implement a plain-text search algorithm

Re: Plain-text search algorithms: normalization, decomposition, case mapping, word breaks

2003-06-27 Thread Philippe Verdy
On Friday, June 27, 2003 3:36 PM, Jony Rosenne [EMAIL PROTECTED] wrote: "For Hebrew and Arabic, add a step: Find the root, remove prefixes, suffixes and other grammatical artifacts and obtain the base form of the word." Removing common suffixes is a separate issue (this requires unification of

Re: Plain-text search algorithms: normalization, decomposition, case mapping, word breaks

2003-06-27 Thread Ben Dougall
i'm a bit confused. i thought that this type of thing was already pretty well covered by the various unicode resources? (i guess there's a strong chance not, if you're asking this question). this is the way i see it: it's for you to decide which format you internally normalise to (i'm not
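The point above about picking one internal normalization form can be sketched roughly as follows. This is only an illustration, not something taken from the thread: the helper names `search_key` and `contains` are hypothetical, the choice of NFKD is an assumption (any single form applied consistently to both the indexed text and the query would serve), and word breaks and language-specific root extraction are not handled at all.

```python
import unicodedata


def search_key(text: str, form: str = "NFKD") -> str:
    """Hypothetical helper: reduce text to a comparison key.

    Assumption: NFKD is the internal form. The text is normalized,
    case-folded, then normalized again, because case folding can
    produce sequences that are no longer in normalized form.
    """
    normalized = unicodedata.normalize(form, text)
    folded = normalized.casefold()
    return unicodedata.normalize(form, folded)


def contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: naive substring match on the comparison keys."""
    return search_key(needle) in search_key(haystack)
```

With this sketch, a precomposed "É" (U+00C9) and the sequence "E" plus U+0301 reduce to the same key, so a query typed in one form matches text stored in the other. The substring test is deliberately naive: it can match in the middle of a word or split a combining sequence, which is where the word-break part of the subject line comes in.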

Re: Plain-text search algorithms: normalization, decomposition, case mapping, word breaks

2003-06-27 Thread Philippe Verdy
On Friday, June 27, 2003 4:44 PM, Ben Dougall [EMAIL PROTECTED] wrote: "i'm a bit confused. i thought that this type of thing was already pretty well covered by the various unicode resources? (i guess there's a strong chance not, if you're asking this question)." I'm not discussing how