To implement a plain-text search algorithm in a language-neutral way that
would still work with all scripts, I am looking for advice on how this can be done
safely (notably for automated search engines), to allow searching for text matching
some basic encoding styles.
My first
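
For readers following the thread, here is a minimal sketch of the kind of folding the subject line names (normalization, decomposition, case mapping). It is only an illustration and not the method proposed in the original message. It assumes that stripping combining marks is acceptable for matching, which is lossy and wrong for some languages, and it does not attempt word breaking. The names search_key and contains are hypothetical.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold text into a key for loose, script-neutral matching (hypothetical helper)."""
    # Compatibility decomposition: precomposed letters become base + combining marks,
    # and compatibility characters (ligatures, width variants) become plain ones.
    decomposed = unicodedata.normalize("NFKD", text)
    # Assumption: nonspacing marks (category Mn) are ignored when matching.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Full case folding, then renormalize because folding can create new sequences.
    return unicodedata.normalize("NFKC", stripped.casefold())


def contains(haystack: str, needle: str) -> bool:
    """Substring test on the folded keys (hypothetical helper)."""
    return search_key(needle) in search_key(haystack)
```

With this folding, contains("Un CAFÉ noir", "café") is true. Language-specific steps such as root extraction would have to be layered on top.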
-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Philippe Verdy
Sent: Friday, June 27, 2003 1:46 PM
To: [EMAIL PROTECTED]
Subject: SPAM: Plain-text search algorithms: normalization,
decomposition, case mapping, word breaks
In order to implement a plain-text search algorithm
On Friday, June 27, 2003 3:36 PM, Jony Rosenne [EMAIL PROTECTED] wrote:
For Hebrew and Arabic, add a step: Find the root, remove prefixes,
suffixes and other grammatical artifacts and obtain the base form of
the word.
Removing common suffixes is a separate issue (this requires unification of
i'm a bit confused. i thought that this type of thing was already
pretty well covered by the various unicode resources? (i guess there's
a strong chance not, if you're asking this question).
this is the way i see it:
it's for you to decide which format you internally normalise to (i'm
not
On Friday, June 27, 2003 4:44 PM, Ben Dougall [EMAIL PROTECTED] wrote:
i'm a bit confused. i thought that this type of thing was already
pretty well covered by the various unicode resources? (i guess there's
a strong chance not, if you're asking this question).
I'm not discussing how