Re: equivalent and optional characters in words

Marcin Miłkowski Mon, 22 Apr 2013 00:43:30 -0700

On 2013-04-21 16:54, R.J. Baars wrote:
> Hunspell has the REP statement for this, replacing multiple chars by others.
> But with long words, like in compounding languages, that does not help
> because multiple mistakes in one word are quite common.
>
> My idea would be a multiple REP, with the data cached to save computing
> time.


No, Hunspell does not store alternative spellings this way. It only
generates alternative suggestions. I know that Hunspell seems to do
everything (it's like a kitchen sink mounted on a bike, a hotchpotch of
really strange things), but it doesn't _store_ alternative spellings.
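For context: in a Hunspell .aff file each `REP from to` line names a substring replacement that Hunspell tries while building suggestions, and a candidate is only offered if it is a dictionary word; nothing is stored. A minimal Python sketch of that behavior, extended with the "multiple REP with caching" idea from the quoted message, might look like this. The table contents, function names, and toy dictionary are hypothetical, and this is not Hunspell's own code.

```python
from functools import lru_cache

# Hypothetical REP table, modeled on Hunspell .aff lines such as "REP f ph".
REP_TABLE = (
    ("f", "ph"),
    ("ph", "f"),
)


def single_replacements(word: str) -> set[str]:
    """Every string obtained by applying one REP pair at one position of `word`."""
    results = set()
    for src, dst in REP_TABLE:
        start = word.find(src)
        while start != -1:
            results.add(word[:start] + dst + word[start + len(src):])
            start = word.find(src, start + 1)
    return results


@lru_cache(maxsize=100_000)
def variants(word: str, max_reps: int) -> frozenset[str]:
    """All strings reachable from `word` with at most `max_reps` replacements.

    Results are memoized, so intermediate forms shared by several
    replacement paths (frequent in long compounds) are expanded only once.
    """
    reached = {word}
    if max_reps > 0:
        for candidate in single_replacements(word):
            reached |= variants(candidate, max_reps - 1)
    return frozenset(reached)


def suggest(word: str, dictionary: set[str], max_reps: int = 2) -> list[str]:
    """Like REP, only suggest: keep candidates that are dictionary words."""
    return sorted(v for v in variants(word, max_reps) if v in dictionary and v != word)


if __name__ == "__main__":
    toy_dictionary = {"photograph"}
    print(suggest("fotograf", toy_dictionary, max_reps=1))  # prints []
    print(suggest("fotograf", toy_dictionary, max_reps=2))  # prints ['photograph']
```

The `max_reps` bound is the important knob: the candidate set grows combinatorially with it, which is the cost the caching is meant to offset.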

I believe there are no open dictionaries of alternative spellings for
most languages, which is why the English rules use manually entered variants.
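One way to keep the variant relation in a single place, as asked in the quoted message, is a small variant list loaded once and turned into patterns on demand. A minimal sketch, assuming a hypothetical whitespace-separated format (one group of equivalent spellings per line) and hypothetical helper names; this is not how LanguageTool's English module is organized.

```python
import re

# Hypothetical variant list: each line groups equivalent spellings of one word.
VARIANT_DATA = """\
color colour
center centre
gray grey
"""


def load_variants(text: str) -> dict[str, frozenset[str]]:
    """Map every spelling to the set of all spellings in its group."""
    groups: dict[str, frozenset[str]] = {}
    for line in text.splitlines():
        forms = line.split()
        if forms:
            group = frozenset(forms)
            for form in forms:
                groups[form] = group
    return groups


def same_word(a: str, b: str, groups: dict[str, frozenset[str]]) -> bool:
    """True if `a` and `b` are identical or listed as variants of each other."""
    return a == b or b in groups.get(a, frozenset())


def variant_pattern(word: str, groups: dict[str, frozenset[str]]) -> re.Pattern:
    """One regex matching any spelling of `word`, generated from the data
    instead of being written out by hand in every rule."""
    forms = sorted(groups.get(word, frozenset({word})))
    return re.compile(r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE)


if __name__ == "__main__":
    groups = load_variants(VARIANT_DATA)
    print(same_word("colour", "color", groups))  # prints True
    print(variant_pattern("grey", groups).pattern)
```

The design choice here is that rules ask the data ("is this a variant of X?") rather than each rule repeating its own alternation.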

Best,
Marcin


> Ruud
>
>> On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:
>>> 2013/4/21 Andriy Rysin <ary...@gmail.com <mailto:ary...@gmail.com>>
>>>
>>>      1) I would like to treat several apostrophes equally (apostrophes are
>>>      part of the word in Ukrainian), e.g. in the dictionary and rules I
>>>      could use ' (0x27), but I would like to be able to parse text that
>>>      has U+2019 (and potentially U+02BC) the same way. I guess I could do
>>>      a simple replace in the word tokenizer, but I was wondering if
>>>      there's a better way.
>>>
>>> This is what is done in Catalan. So far I have found no problems.
>>>
>>> Jaume
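The "simple replace in the word tokenizer" approach can be sketched as follows. This is illustrative Python, not LanguageTool's Java tokenizer; the regex and function names are hypothetical. Folding only apostrophes that sit inside a word keeps a U+2019 used as a closing quotation mark from being turned into an apostrophe.

```python
import re

# Apostrophe-like characters folded to U+0027, the form the dictionary uses.
APOSTROPHE_MAP = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
})

# A word is a run of word characters, optionally joined by an apostrophe-like
# character, so a trailing U+2019 (closing quotation mark) is not part of it.
WORD_RE = re.compile(r"\w+(?:['\u2019\u02BC]\w+)*")


def normalize_apostrophes(token: str) -> str:
    return token.translate(APOSTROPHE_MAP)


def tokenize(text: str) -> list[str]:
    return [normalize_apostrophes(m.group()) for m in WORD_RE.finditer(text)]


if __name__ == "__main__":
    print(tokenize("Він купив м\u2019ясо."))  # prints ['Він', 'купив', "м'ясо"]
```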
>> Thanks, will try that. Another one: what's the recommended way to store
>> knowledge about alternative spellings of a word, e.g. color vs
>> colour? It looks like it would make sense to code this relation in the
>> dictionary so that we don't have to introduce a regex for each alternative
>> spelling and repeat it multiple times in the rules. But I looked at the
>> English module and it looks like such a relation is not present in the
>> dictionary but is instead hardcoded in the rules.
>>
>> Thanks
>> Andriy
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>
>
