I found out that the unmunch.sh script, which turns out to come from
Hunspell 1.2.8 (available in the folder for that version on
SourceForge), is a bit buggy. See
https://sourceforge.net/p/hunspell/bugs/147/

As I explain in that Hunspell bug report, I ended up writing a Python
script to unmunch Galician files. I’ve only implemented the features
specifically required to unmunch Galician files, but I’m open to completing
the implementation as required by other languages.
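
For readers unfamiliar with what unmunching involves, here is a minimal
sketch of the idea. It is not the script from the bug report. It assumes
single-character flags and plain PFX/SFX rules, and it ignores cross
products, continuation flags, compounding and every other Hunspell feature.
The names parse_aff and unmunch are made up for this illustration.

```python
import re


def parse_aff(lines):
    """Collect PFX/SFX rules per flag (simplified: single-character flags)."""
    rules = {}
    seen_headers = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("PFX", "SFX"):
            continue
        kind, flag = parts[0], parts[1]
        if (kind, flag) not in seen_headers:
            # First line of each class is the header: kind flag cross count.
            seen_headers.add((kind, flag))
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = parts[3].split("/")[0]  # drop continuation flags (not handled)
        if add == "0":
            add = ""
        cond = parts[4] if len(parts) > 4 else "."
        # Conditions are anchored at the end (suffix) or start (prefix).
        pattern = re.compile(cond + "$" if kind == "SFX" else "^" + cond)
        rules.setdefault(flag, []).append((kind, strip, add, pattern))
    return rules


def unmunch(dic_lines, rules):
    """Expand every 'stem/flags' entry into the set of forms it licenses."""
    words = set()
    for entry in dic_lines:
        fields = entry.split()
        if not fields or fields[0].isdigit():
            continue  # skip blank lines and the entry count on the first line
        stem, _, flags = fields[0].partition("/")
        words.add(stem)
        for flag in flags:
            for kind, strip, add, pattern in rules.get(flag, []):
                if not pattern.search(stem):
                    continue
                if kind == "SFX" and stem.endswith(strip):
                    words.add(stem[:len(stem) - len(strip)] + add)
                elif kind == "PFX" and stem.startswith(strip):
                    words.add(add + stem[len(strip):])
    return words


if __name__ == "__main__":
    aff = [
        "SFX A Y 2",
        "SFX A 0 s [^y]",
        "SFX A y ies y",
        "PFX B Y 1",
        "PFX B 0 re .",
    ]
    dic = ["3", "cat/A", "pony/A", "do/B"]
    print(sorted(unmunch(dic, parse_aff(aff))))
    # ['cat', 'cats', 'do', 'ponies', 'pony', 'redo']
```

Real affix files need considerably more than this (multi-character and
numeric flags, prefix/suffix cross products, continuation classes), which
is where the output size grows quickly.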

With this custom script, I get a 1.5 GiB file for Galician, whereas
unmunch.sh would generate a file smaller than 1 GiB.

2014-11-05 10:49 GMT+01:00 R.J. Baars <r.j.ba...@xs4all.nl>:

> Like I said, Tatoeba is much too small.
>
> There will never be a new unmunch that supports all new Hunspell
> functions, since compounding (or continuation, which is much the same)
> makes the list unlimited in size.
>
> Ruud
>
>
> > On 2014-11-04 13:29, R.J. Baars wrote:
> >
> >> I put a script generating Icelandic and the data here:
> >>
> >> www.taaltik.nl/daniel/ice.zip
> >
> > I'm not sure if this approach is viable, at least for Icelandic. Just
> > too many words are missing. For example, I just needed to check a single
> > paragraph to find these words that are accepted by Hunspell but are
> > still not in the list:
> >
> > forritið
> > efnahagslegt
> > stjórnmálalegt
> > menningarlegt
> > fátækustu
> >
> > Maybe we just need to use unmunch / wait for a better unmunch.
> >
> > Regards
> >   Daniel
> >
> >
> >
> ------------------------------------------------------------------------------
> > _______________________________________________
> > Languagetool-devel mailing list
> > Languagetool-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/languagetool-devel