[DataparkSearch Forum] Re: Ispell & Synonims

DataparkSearchForum Thu, 12 Apr 2007 08:41:11 -0700

- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: Oleg
Subject: Re: Ispell & Synonims

Following up on my previous post:
At the moment I am pursuing two goals: 1. support for word forms in open 
source search engines; 2. creating a Romanian spellchecker for OpenOffice.

In correspondence with [EMAIL PROTECTED] I wrote that I had slightly modified 
the unmunch utility from hunspell so that, in the generated file containing 
all word forms, each word and its word forms are separated by a separator. I 
said that I need this in order to feed the word forms to a search engine. 
Kevin B. Hendricks replied as follows:

Kevin B. Hendricks wrote:
> Hi,
>
> FYI: the unmunch algorithm for any one word and affix file is quite fast so 
> that instead of pre-expanding the root/word list you could in fact simply 
> take pieces of code from myspell that takes a word and finds a root with 
> affix flags and then expand it for all affixes on the fly so to speak (at 
> least for English).
>
> Effectively, simply spellcheck each word in the search query (which can be 
> done on the fly while typing (just like in OOo) which will identifies the 
> entry in the hash table formed from the .dic file and then expand it on the 
> fly using .aff info stored in memory to create the fuzzy word list for each 
> word if you wanted.
>
> Another nice feature of using a spellchecker with affix compression in that 
> way is that you would catch typos and could offer suggestions to replace 
> mistyped words very very easily.
>
> In fact, you could just incorporate myspell as a library (it is BSD licensed) 
> (or any other spellchecker with a compatible license) into your search code 
> and get all of these features.

------
So, by taking the "just incorporate myspell as a library" route, one could 
add support for a) myspell dictionaries and b) spellchecking (I have not 
tested --with-aspell yet).
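
To make the idea in Kevin's message concrete, here is a rough sketch of 
on-the-fly expansion: take a word from the query, find its root in the 
dictionary, then generate every form allowed by the root's affix flags. This 
is not the myspell or hunspell API. The rule layout, the toy data and the 
names SuffixRule, find_root, expand and expand_query are all made up for 
illustration, and only suffixes are handled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    flag: str       # affix flag that enables this rule for a root
    strip: str      # characters removed from the end of the root
    add: str        # characters appended after stripping
    condition: str  # regex the end of the root must match


# Toy stand-ins for the in-memory .aff rules and .dic hash table (assumed data).
RULES = [
    SuffixRule("S", "", "s", "[^sxzhy]"),
    SuffixRule("S", "", "es", "[sxzh]"),
    SuffixRule("D", "", "d", "e"),
    SuffixRule("D", "", "ed", "[^ey]"),
    SuffixRule("G", "e", "ing", "e"),
    SuffixRule("G", "", "ing", "[^e]"),
]
DICTIONARY = {"search": "DGS", "parse": "DGS", "engine": "S"}


def applies(rule, root):
    """True if the root carries the rule's flag and satisfies its condition."""
    flags = DICTIONARY.get(root, "")
    return rule.flag in flags and re.search(rule.condition + "$", root) is not None


def expand(root):
    """Return the root followed by every form its affix flags allow."""
    forms = [root]
    for rule in RULES:
        if applies(rule, root):
            stem = root[:len(root) - len(rule.strip)]
            forms.append(stem + rule.add)
    return forms


def find_root(word):
    """Map a surface form back to a dictionary root, or None if unknown."""
    if word in DICTIONARY:
        return word
    for rule in RULES:
        if word.endswith(rule.add):
            candidate = word[:len(word) - len(rule.add)] + rule.strip
            if applies(rule, candidate):
                return candidate
    return None


def expand_query(query):
    """Build the fuzzy word list for each word of a search query."""
    result = {}
    for word in query.lower().split():
        root = find_root(word)
        result[word] = expand(root) if root else [word]
    return result


if __name__ == "__main__":
    print(expand_query("Searched engines"))
```

A word for which find_root returns None is exactly the case where a real 
spellchecker would offer suggestions; the sketch simply passes such a word 
through unchanged.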

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=05;topic_id=1122123666;page=3
