Re: [Languagetool] Word form dictionary for German
Richard Eckart de Castilho wrote:
> Hi,
>
> I noticed today that the german.dict file in LanguageTool is a binary file, I suppose created with Morfologik. Are the original data and the conversion script available somewhere?
>
> Best,
> -- Richard

I don't know how the German dictionary is created, but I wrote scripts to create the French and Breton POS tag dictionaries. They are available in SVN. Perhaps they can be useful to you:

src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
src/main/resources/org/languagetool/resource/br/create-lexicon.pl

I'll add scripts to create the spelling dictionaries... once I figure out how to create them.

Regards

PS: I just added the script for the French dictionary to SVN, so you may need to check out the latest code from SVN.

-- Dominique

--
LogMeIn Central: Instant, anywhere, Remote PC access and management. Stay in control, update software, and manage PCs from one command center Diagnose problems and improve visibility into emerging IT issues Automate, monitor and manage. Do more in less time with Central http://p.sf.net/sfu/logmein12331_d2d
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
Re: [Languagetool] getting fsa operational
Daniel Naber wrote:
> On 22.10.2012, 21:11:30 Dominique Pellé wrote:
> > But then, how do I tell morfologik that my input word file is in ISO8859-15 encoding?
>
> I'm not sure, isn't the *.info file used for that?
>
> > I think it would be a lot easier if we created scripts to create the fsa dictionaries and POS tag dictionaries, and checked the scripts in to SVN.
>
> I agree. Could you provide some scripts? On the other hand, I'd like to avoid the redundancy of having both *.bat and *.sh - what about a very simple Java program instead?
>
> Regards
> Daniel

Late reply as I had to replace my dying laptop.

Personally, I prefer scripts (sh, Perl, Python...) for such small things, which run external commands and may need to download files etc. A script is much shorter than a Java program and easier to read.

I already checked in scripts for the French and Breton POS tag dictionaries. I'll add scripts for the FSA dictionaries as well soon.

-- Dominique
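The encoding question above is left open in this message. One way to sidestep it, sketched here in Python, is to re-encode the word list to UTF-8 before handing it to any tool. This is only an illustration: the function name `reencode` and its parameters are hypothetical, and nothing here says how morfologik itself is told about the input encoding.

```python
from pathlib import Path


def reencode(source, target, source_encoding="iso8859-15", target_encoding="utf-8"):
    """Copy a word list from `source` to `target`, changing only its encoding.

    Hypothetical helper, not part of LanguageTool or morfologik.
    Working on bytes avoids any newline translation by the platform.
    """
    raw = Path(source).read_bytes()
    text = raw.decode(source_encoding)          # fails loudly on invalid input
    Path(target).write_bytes(text.encode(target_encoding))
```

A call such as `reencode("words-latin9.txt", "words-utf8.txt")` (both file names are placeholders) would leave the original file untouched and write the converted copy next to it.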
Re: [Languagetool] Word form dictionary for German
Thank you all for your pointers. I have now found the data and the documentation of its tag set. It is also nice to see that the conversion to the FSA is pretty straightforward.

Regards,
-- Richard

On 03.11.2012 at 08:34, Dominique Pellé dominique.pe...@gmail.com wrote:
> Richard Eckart de Castilho wrote:
> > Hi,
> >
> > I noticed today that the german.dict file in LanguageTool is a binary file, I suppose created with Morfologik. Are the original data and the conversion script available somewhere?
> >
> > Best,
> > -- Richard
>
> I don't know how the German dictionary is created, but I wrote scripts to create the French and Breton POS tag dictionaries. They are available in SVN. Perhaps they can be useful to you:
>
> src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
> src/main/resources/org/languagetool/resource/br/create-lexicon.pl
>
> I'll add scripts to create the spelling dictionaries... once I figure out how to create them.
>
> Regards
>
> PS: I just added the script for the French dictionary to SVN, so you may need to check out the latest code from SVN.
>
> -- Dominique
Re: [Languagetool] Word form dictionary for German
On 02.11.2012, 09:57:25 Richard Eckart de Castilho wrote:
> I noticed today that the german.dict file in LanguageTool is a binary file, I suppose created with Morfologik. Are the original data and the conversion script available somewhere?

We extended the original data[1] a bit in LT, so you can consider the binary file to be the original data. We don't keep a text version because it would be so large.

Regards
Daniel

[1] http://www.danielnaber.de/morphologie/

-- http://www.danielnaber.de
Re: [Languagetool] Fighting false alarms
On 02.11.2012, 12:45:17 Jaume Ortolà i Font wrote:
> Links like [[Category:animals]], [[File:lion.jpg|The lion in the jungle]] and so on also cause false alarms frequently.

The problem with these is that the namespace names are translated (e.g. Category - Kategorie). One can get the translations from the Wikipedia XML dumps, though. If you want to put some work into that it would be nice, but maybe this problem will be solved by Sweble soon.

I'll send another mail about the interlanguage links later.

Regards
Daniel

-- http://www.danielnaber.de
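The translated-namespace problem described above can be sketched in Python as a per-language table of namespace names feeding a regex. Everything named here is an assumption for illustration: `NAMESPACE_NAMES`, `namespace_link_pattern` and `strip_namespace_links` are hypothetical, the table is hand-written rather than read from the Wikipedia XML dumps, and this is not the code used in LanguageTool.

```python
import re

# Illustrative table only. A real one would be extracted from the
# Wikipedia XML dumps, as suggested in the message above.
NAMESPACE_NAMES = {
    "de": ["Category", "Kategorie", "File", "Datei"],
}


def namespace_link_pattern(language):
    """Build a regex matching [[Namespace:...]] links for one language."""
    names = "|".join(re.escape(name) for name in NAMESPACE_NAMES[language])
    # Limitation: a caption containing a nested [[link]] is not matched,
    # because the link body may not contain square brackets.
    return re.compile(r"\[\[(?:" + names + r"):[^\[\]]*\]\]")


def strip_namespace_links(text, language):
    """Remove category and file links; ordinary article links are kept."""
    return namespace_link_pattern(language).sub("", text)
```

Keeping the names in a table rather than in the regex itself means adding a language only requires a new table entry.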
Re: [Languagetool] Fighting false alarms
On 02.11.2012, 01:51:47 Jaume Ortolà i Font wrote:
> I have detected a source of false alarms (for Catalan) in the Wikipedia interlanguage links [1].

I have now added a regex to WikipediaQuickCheck.java which removes most (not all) interlanguage links. Feel free to improve it.

Regards
Daniel

-- http://www.danielnaber.de
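For readers unfamiliar with interlanguage links, here is a minimal Python sketch of the kind of filtering described above. It is not the regex that was added to WikipediaQuickCheck.java; the pattern and the name `strip_interlanguage_links` are assumptions, and like the original it would only catch most, not all, such links.

```python
import re

# Hypothetical pattern: a lowercase language code such as "de" or "pt-br",
# a colon, then a page title, all inside double square brackets.
# An optional trailing newline is consumed so no empty line is left behind.
INTERLANGUAGE_LINK = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\[\]|]+\]\]\n?")


def strip_interlanguage_links(text):
    """Remove links like [[de:Löwe]] while leaving other links untouched."""
    return INTERLANGUAGE_LINK.sub("", text)
```

Because the pattern requires a short lowercase prefix, links such as [[Category:animals]] or plain article links are deliberately left alone.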
Re: [Languagetool] getting fsa operational
On 2012-10-22 21:11, Dominique Pellé wrote:
> Daniel Naber list2...@danielnaber.de wrote:
> > On 16.10.2012, 16:52:26 Ruud Baars wrote:
> > > By the way, I read the instructions in the wiki, but these are quite complex (for me).
> >
> > I think you can ignore everything not related to Java - both exporting data from a dictionary and creating a dictionary should be possible with Java only. Here's what I used last time:
> >
> >     java -jar morfologik tab2morph -i mydict.txt -o output.txt
> >     java -jar morfologik fsa_build -f cfsa2 -i output.txt -o mydict.dict
> >
> > Instead of morfologik you need to use the path to morfologik-tools-1.5.2-standalone.jar (not part of LanguageTool). mydict.txt is the dictionary in the same format as it gets exported (tab-delimited).
> >
> > Regards
> > Daniel
>
> Hi
>
> I read...
> http://languagetool.wikidot.com/hunspell-support
>
> But I'm still having problems creating the fsa dictionaries too. I remember reading that fsa does not work with utf8. Will this be fixed, by the way?

It is already fixed. It has worked with utf-8 since the latest update of morfologik-stemming, so there is no need to convert anything!

> I think it would be a lot easier if we created scripts to create the fsa dictionaries and POS tag dictionaries, and checked the scripts in to SVN. That would be easier to look at and less ambiguous than a wiki page describing how to create the fsa or POS tag dictionaries.

Oh well, I didn't use any scripts, as I experimented on the command line. I should write the scripts...

Best,
Marcin
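The two commands quoted in the message above could be wrapped in a small script along the following lines. This is a sketch under stated assumptions, not a script from the LanguageTool repository: the function names are hypothetical, the jar is assumed to sit in the current directory, and the subcommands and flags are copied verbatim from the quoted commands.

```python
import subprocess

# Assumed location of the standalone jar; it is not part of LanguageTool.
MORFOLOGIK_JAR = "morfologik-tools-1.5.2-standalone.jar"


def build_commands(source, target, intermediate="output.txt", jar=MORFOLOGIK_JAR):
    """Return the two quoted command lines as argument lists.

    `source` is the tab-delimited dictionary export and `target` the
    binary dictionary to create.
    """
    tab2morph = ["java", "-jar", jar, "tab2morph",
                 "-i", source, "-o", intermediate]
    fsa_build = ["java", "-jar", jar, "fsa_build", "-f", "cfsa2",
                 "-i", intermediate, "-o", target]
    return [tab2morph, fsa_build]


def build_dictionary(source, target, intermediate="output.txt", jar=MORFOLOGIK_JAR):
    """Run both steps in order, raising if either command fails."""
    for command in build_commands(source, target, intermediate, jar):
        subprocess.run(command, check=True)
```

Calling `build_dictionary("mydict.txt", "mydict.dict")` would run the same two steps as the quoted commands. Separating command construction from execution keeps the script easy to inspect without Java or the jar installed.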