Hi,
I have found a modified script I wrote when I was working on the Galician
hunspell dictionary (see attachment). I can't remember whether it was
entirely finished or not, sorry about that.
Daniel, you are right about the "recursion" in the Galician affix file. You
can find some documentation at
http://linguamatica.com/index.php/linguamatica/article/view/13 but this
paper is written in Galician [that's the catch! ;) ]
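
To illustrate what that "recursion" means for an unmuncher, here is a
minimal Python sketch. It is not the attached script and not how hunspell
itself is implemented. The rule for flag 232 is the one Daniel quoted; the
rule for flag 104 and the depth limit of 2 are made up for the example.

```python
import re

# flag -> list of (strip, add, continuation_flags, condition)
RULES = {
    "232": [("oñer", "ón", ["104"], "poñer")],  # rule quoted in this thread
    "104": [("", "se", [], ".")],               # hypothetical rule, illustration only
}

def expand(word, flags, rules, depth=2):
    """Return the forms generated from `word` by the suffix rules of `flags`,
    following continuation flags up to `depth` levels (an assumed cap)."""
    forms = set()
    if depth == 0:
        return forms
    for flag in flags:
        for strip, add, cont, cond in rules.get(flag, []):
            # The condition is matched against the end of the word.
            if not re.search(f"(?:{cond})$", word):
                continue
            if not word.endswith(strip):
                continue
            derived = word[: len(word) - len(strip)] + add
            forms.add(derived)
            # The derived form carries its own flags: apply them as well.
            forms |= expand(derived, cont, rules, depth - 1)
    return forms

print(sorted(expand("poñer", ["232"], RULES)))  # ['pón', 'pónse']
```

An unmuncher that stops after the first level would print the derived form
together with its leftover flags instead of expanding them.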
I tested the script with the Icelandic files; it ran for a long time and
finished with an error. So I opened is_IS.aff and found some mistakes that
should be fixed before the Icelandic files can be unmunched.
Inside is_IS.aff there are a lot of rules with a '0' in the fourth column,
e.g.
SFX 85 0 0/1,2

In the third column '0' means 'nothing', but in the fourth one it is a
literal character <0>.
So for the dictionary entry
Birta/97

and the affix rule
SFX 97 N 1
SFX 97 0 0/14

Hunspell returns
$ hunspell -d is_IS
Hunspell 1.3.3
Birta
*
[match]

Birta0
*
[match]

Birtas
& Birtas 5 0: Birta, Birtast, Birtar, Birtan, Bitrasta

[doesn't match, show suggestions]

I can't speak Icelandic, so it's hard for me to evaluate that behaviour,
but I guess 'Birta0' is not the expected form from Birta/97 (and the regexp
in rule 14 doesn't match a word ending with a <0>).

There are a lot of rules like this in is_IS.aff that should be rewritten.
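
A rough way to list the candidates is sketched below in Python. It is only
a sketch: the function name is made up, it treats every SFX/PFX line whose
fourth whitespace-separated field is '0' or starts with '0/' as suspicious,
and the last sample line is invented to show a rule that is not reported.

```python
def find_zero_affix_rules(aff_text):
    """Return (line_number, line) pairs for SFX/PFX lines whose fourth
    field (the affix) is '0', with or without '/flags' after it."""
    hits = []
    for number, line in enumerate(aff_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("SFX", "PFX"):
            continue
        # Drop any continuation flags after the slash, e.g. '0/14' -> '0'.
        # Note: a header line with a rule count of 0 would also be reported.
        affix = fields[3].split("/", 1)[0]
        if affix == "0":
            hits.append((number, line))
    return hits

sample = "\n".join([
    "SFX 97 N 1",       # header line from this mail
    "SFX 97 0 0/14",    # rule from this mail
    "SFX 85 0 0/1,2",   # rule from this mail
    "SFX 12 a ur .",    # hypothetical rule, should not be reported
])

for number, line in find_zero_affix_rules(sample):
    print(number, line)
```

For the real file, read is_IS.aff with the encoding named in its SET line
and pass the text to the function.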

Hope this helps


2014-10-27 9:50 GMT+01:00 Daniel Naber <daniel.na...@languagetool.org>:

> Hi,
>
> I tried to switch Icelandic and Galician to hunspell (as documented at
> http://wiki.languagetool.org/hunspell-support#toc3), but I ran into
> problems:
>
> For Icelandic, words like 'virkar' and 'texta' do not get recognized,
> simply because hunspell's unmunch doesn't create them. Does anybody have
> an idea why that might be? In other words, how can I get a complete list
> of Icelandic words from is_IS.aff and is_IS.dic?
>
> For Galician, unmunch returns entries like "construíu/102,103,104|".
> This seems to be caused by "recursive" definitions like "SFX 232 oñer
> ón/104 poñer", where a suffix is not simply replaced by another suffix,
> but by a suffix plus another tag. Can anybody confirm that? Is there a
> workaround?
>
> Any help is welcome.
>
> Regards
>   Daniel
>
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>

Attachment: unmunch.sh
Description: Bourne shell script
