Hunspell has the REP statement for this, replacing multiple characters with others.
But with long words, as in compounding languages, that does not help
because multiple mistakes in one word are quite common.

My idea would be a multiple REP, with the data cached to save computing
time.
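
To make the idea concrete, here is a rough sketch in Python, not Hunspell or
LanguageTool code. The rule table, the toy dictionary and the function names
are all made up for illustration. It applies REP-style replacements one
occurrence at a time, repeats that up to a fixed number of rounds, and caches
the single-step expansion so repeated lookups are cheap.

```python
from functools import lru_cache

# Hypothetical REP-style table: (wrong, right) pairs, similar in spirit
# to the REP lines of a Hunspell .aff file.
REP_RULES = (("f", "ph"), ("k", "c"))

# Hypothetical toy dictionary; a real speller would query its word list.
DICTIONARY = frozenset({"photograph", "compounding"})


@lru_cache(maxsize=10_000)
def single_replacements(word):
    """Return every variant of `word` with exactly one rule applied once."""
    variants = set()
    for wrong, right in REP_RULES:
        start = word.find(wrong)
        while start != -1:
            variants.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    return frozenset(variants)


def suggest(word, max_edits=2):
    """Collect dictionary words reachable with up to `max_edits` replacements."""
    seen = {word}
    frontier = {word}
    hits = []
    for _ in range(max_edits):
        next_frontier = set()
        for candidate in frontier:
            for variant in single_replacements(candidate):
                if variant in seen:
                    continue
                seen.add(variant)
                next_frontier.add(variant)
                if variant in DICTIONARY:
                    hits.append(variant)
        frontier = next_frontier
    return sorted(hits)
```

With these made-up rules, suggest("fotograf") needs two replacements to reach
"photograph", which a single REP pass would miss. The number of candidates
grows quickly with max_edits, so the limit should stay small.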
Ruud

> On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:
>> 2013/4/21 Andriy Rysin <ary...@gmail.com <mailto:ary...@gmail.com>>
>>
>>     1) I would like to treat several apostrophes equally (apostrophes are
>>     part of the word in Ukrainian), e.g. in dictionary and rules I could use
>>     ' (0x27) but I would like to be able to parse text that has U+2019 (and
>>     potentially U+02BC) the same way, I guess I could do a simple replace in
>>     word tokenizer but I was wondering if there's a better way
>>
>> This is what is done in Catalan. So far I have found no problem.
>>
>> Jaume
> Thanks, will try that. Another one: what's the recommended way to store
> knowledge about alternative spellings for the word, e.g. color vs
> colour? It looks like it would make sense to code this relation in the
> dictionary so that we don't have to introduce regex for alternative
> spelling and repeat it multiple times in the rules. But I looked at the
> English module and it looks like such relation is not present in the
> dictionary but instead hardcoded in the rules.
>
> Thanks
> Andriy
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>



