Hi,

I have found a modified script I made when I was working on Galician hunspell (see attachment). I can't remember whether it was entirely finished or not, sorry about that.

Daniel, you are right about the "recursion" in the Galician affix file. You can find some documentation at http://linguamatica.com/index.php/linguamatica/article/view/13 but this paper is written in Galician [that's the catch! ;) ]

I tested the script with the Icelandic files; it ran for a long time and finished with an error. So I opened is_IS.aff and found some mistakes that should be fixed before the Icelandic files can be unmunched.

Inside is_IS.aff there are a lot of rules with a '0' in the fourth column, e.g.

    SFX 85 0 0/1,2

In the third column '0' means 'nothing', but in the fourth it is a literal character <0>. So for the dictionary entry Birta/97 and the affix rule

    SFX 97 N 1
    SFX 97 0 0/14

Hunspell returns

    $ hunspell -d is_IS
    Hunspell 1.3.3
    Birta
    *                [match]
    Birta0
    *                [match]
    Birtas
    & Birtas 5 0: Birta, Birtast, Birtar, Birtan, Bitrasta
                     [doesn't match, shows suggestions]

I can't speak Icelandic and it's hard for me to evaluate that behaviour, but I guess 'Birta0' is not the expected form from Birta/97 (and the regexp in rule 14 doesn't match a word ending in <0>). There are a lot of rules like this that should be rewritten in is_IS.aff.

Hope this helps

2014-10-27 9:50 GMT+01:00 Daniel Naber <daniel.na...@languagetool.org>:

> Hi,
>
> I tried to switch Icelandic and Galician to hunspell (as documented at
> http://wiki.languagetool.org/hunspell-support#toc3), but I ran into
> problems:
>
> For Icelandic, words like 'virkar' and 'texta' do not get recognized,
> simply because hunspell's unmunch doesn't create them. Does anybody have
> an idea why that might be? In other words, how can I get a complete list
> of Icelandic words from is_IS.aff and is_IS.dic?
>
> For Galician, unmunch returns entries like "construíu/102,103,104|".
> This seems to be caused by "recursive" definitions like "SFX 232 oñer
> ón/104 poñer", where a suffix is not simply replaced by another suffix,
> but by a suffix plus another tag. Can anybody confirm that? Is there a
> workaround?
>
> Any help is welcome.
>
> Regards
> Daniel
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
unmunch.sh
Description: Bourne shell script
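
To locate the rules described in the message (a '0' in the fourth column of an SFX/PFX rule), something like the following Python sketch might help. It is not the attached unmunch.sh and is not taken from it. The function name find_zero_affix_rules is made up for illustration, and the two "SFX 14" lines in the sample are hypothetical; only the "SFX 97" lines come from the message. The sketch assumes that the first SFX/PFX line seen for a given flag is its header line, and it only lists candidates: whether each hit is really a mistake still has to be judged by someone who knows the language.

```python
def find_zero_affix_rules(aff_lines):
    """Return (line_number, text) for SFX/PFX rules whose affix field is '0'.

    Assumption: the first SFX/PFX line for a given flag is the header
    (e.g. 'SFX 97 N 1'); every later line with that flag is a rule.
    """
    seen_headers = set()
    hits = []
    for lineno, raw in enumerate(aff_lines, start=1):
        fields = raw.split()
        # Skip comments, blank lines and every directive other than SFX/PFX.
        if len(fields) < 4 or fields[0] not in ("SFX", "PFX"):
            continue
        key = (fields[0], fields[1])
        if key not in seen_headers:
            seen_headers.add(key)  # header line, not a rule
            continue
        # Fourth column is 'affix' or 'affix/continuation-flags'.
        affix = fields[3].split("/", 1)[0]
        if affix == "0":
            hits.append((lineno, raw.strip()))
    return hits


# The SFX 97 lines are quoted from the message; the SFX 14 lines are
# hypothetical and only there to show a rule that is not reported.
sample = [
    "SFX 97 N 1",
    "SFX 97 0 0/14",
    "SFX 14 Y 1",
    "SFX 14 0 s .",
]

for lineno, text in find_zero_affix_rules(sample):
    print(lineno, text)
```

To run it on the real file, pass it the lines of is_IS.aff, opened with the encoding named in the file's SET line.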