On 2013-04-21 16:54, R.J. Baars wrote:
> Hunspell has the REP statement for this, replacing multiple chars by others.
> But with long words, like in compounding languages, that does not help
> because multiple mistakes in one word are quite common.
>
> My idea would be a multiple REP, with the data cached to spare computing
> time.

No, hunspell does not store alternative spellings this way. It only creates alternative suggestions. I know that hunspell seems to do anything (it's like a washing sink mounted on a bike - a hotchpotch of really strange things), but it doesn't _store_ alternative spellings.

I believe there are no open dictionaries of alternative spellings for most languages, hence the English rules use manually entered versions.

Best,
Marcin

> Ruud
>
>> On 04/21/2013 03:11 AM, Jaume Ortolà i Font wrote:
>>> 2013/4/21 Andriy Rysin <ary...@gmail.com>
>>>
>>> 1) I would like to treat several apostrophes equally (apostrophes are
>>> part of the word in Ukrainian), e.g. in dictionary and rules I could use
>>> ' (0x27) but I would like to be able to parse text that has U+2019 (and
>>> potentially U+02BC) the same way, I guess I could do a simple replace in
>>> word tokenizer but I was wondering if there's a better way
>>>
>>> This is what is done in Catalan. So far I have found no problem.
>>>
>>> Jaume
>> Thanks, will try that. Another one: what's the recommended way to store
>> knowledge about alternative spellings for the word, e.g. color vs
>> colour? It looks like it would make sense to code this relation in the
>> dictionary so that we don't have to introduce regex for alternative
>> spelling and repeat it multiple times in the rules. But I looked at the
>> English module and it looks like such relation is not present in the
>> dictionary but instead hardcoded in the rules.
>>
>> Thanks
>> Andriy

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel