Re: [Languagetool] Word form dictionary for German

2012-11-03 Thread Dominique Pellé
Richard Eckart de Castilho wrote:

 Hi,

 I noticed today that the german.dict file in LanguageTool is a binary file, I
 suppose created with Morfologik. Are the original data and the conversion
 script available somewhere?

 Best,

 -- Richard

I don't know how the German dictionary is created, but
I created scripts to create the French and Breton POS tag
dictionaries. They are available in SVN. Perhaps that can be
useful to you:

src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
src/main/resources/org/languagetool/resource/br/create-lexicon.pl

I'll add scripts to create the spelling dictionaries... once I figure
out how to create them.

Regards

PS: I just added the script for the French dictionary to SVN,
so you may need to check out the latest code from SVN.

-- Dominique

--
LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: [Languagetool] getting fsa operational

2012-11-03 Thread Dominique Pellé
Daniel Naber wrote:

 On 22.10.2012, 21:11:30 Dominique Pellé wrote:

 But then, how do I tell morfologik that my input word
 file is in ISO8859-15 encoding?

 I'm not sure, isn't the *.info file used for that?

 I think it would be a lot easier if we created scripts
 to create the fsa dictionaries and POS tag
 dictionaries, and checked the scripts in to SVN.

 I agree. Could you provide some scripts? On the other hand, I'd like to
 avoid the redundancy of having both *.bat and *.sh files. What about a very
 simple Java program instead?

 Regards
  Daniel


Late reply, as I had to replace my dying laptop.

Personally, I prefer scripts (sh, Perl, Python...) for
small tasks like this, which run external commands and
may need to download files etc. A script is much shorter than
a Java program and easier to read.

I already checked in scripts for the French and Breton
POS tag dictionaries. I'll add scripts for the FSA dictionaries
as well soon.

-- Dominique



Re: [Languagetool] Word form dictionary for German

2012-11-03 Thread Richard Eckart de Castilho
Thank you all for your pointers. I have now found the data and the
documentation of its tag set. It is also nice to see that the conversion to
the FSA is pretty straightforward.

Regards,

-- Richard

On 03.11.2012 at 08:34, Dominique Pellé dominique.pe...@gmail.com wrote:

 Richard Eckart de Castilho wrote:
 
 Hi,
 
 I noticed today that the german.dict file in LanguageTool is a binary file,
 I suppose created with Morfologik. Are the original data and the conversion
 script available somewhere?
 
 Best,
 
 -- Richard
 
 I don't know how the German dictionary is created, but
 I created scripts to create the French and Breton POS tag
 dictionaries. They are available in SVN. Perhaps that can be
 useful to you:
 
 src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
 src/main/resources/org/languagetool/resource/br/create-lexicon.pl
 
 I'll add scripts to create the spelling dictionaries... once I figure
 out how to create them.
 
 Regards
 
 PS: I just added the script for the French dictionary to SVN,
 so you may need to check out the latest code from SVN.
 
 -- Dominique




Re: [Languagetool] Word form dictionary for German

2012-11-03 Thread Daniel Naber
On 02.11.2012, 09:57:25 Richard Eckart de Castilho wrote:

 I noticed today that the german.dict file in LanguageTool is a binary
 file, I suppose created with Morfologik. Are the original data and the
 conversion script available somewhere?

We extended the original data[1] a bit in LT, so you can consider the
binary file the original data. We don't keep a text version because it is
so large.

Regards
 Daniel

[1] http://www.danielnaber.de/morphologie/

-- 
http://www.danielnaber.de




Re: [Languagetool] Fighting false alarms

2012-11-03 Thread Daniel Naber
On 02.11.2012, 12:45:17 Jaume Ortolà i Font wrote:

 Links like [[Category:animals]], [[File:lion.jpg|The lion in the jungle]]
 and so on also cause false alarms frequently.

The problem with these is that the namespace names are translated (e.g.
Category becomes Kategorie). One can get the translations from the Wikipedia
XML dumps, though. If you want to put some work into that, it would be
welcome, but maybe this problem will be solved by Sweble soon.
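
As an illustration only, here is a minimal Python sketch of stripping such namespace links once the localized names are known. The `NAMESPACES` table and the function name are hypothetical; only the Category/Kategorie pair comes from this thread, and "Datei" as the German name for "File" is an assumption. It is not the code used in LanguageTool.

```python
import re

# Hypothetical lookup table of localized namespace names per language code.
# In practice these names would be read from the Wikipedia XML dumps.
NAMESPACES = {
    "en": ["Category", "File"],
    "de": ["Kategorie", "Datei"],  # "Datei" is an assumption
}


def strip_namespace_links(text, lang):
    """Remove [[Namespace:...]] links for English and the given language."""
    names = NAMESPACES["en"] + NAMESPACES.get(lang, [])
    alternation = "|".join(re.escape(name) for name in names)
    # Matches [[Name:anything]] as long as the body has no nested brackets.
    pattern = re.compile(r"\[\[(?:" + alternation + r"):[^\[\]]*\]\]",
                         re.IGNORECASE)
    return pattern.sub("", text)
```

Links whose caption itself contains a nested [[...]] link are deliberately left alone by this sketch, so it would remove most but not all occurrences.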

I'll send another mail about the interlanguage links later.

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: [Languagetool] Fighting false alarms

2012-11-03 Thread Daniel Naber
On 02.11.2012, 01:51:47 Jaume Ortolà i Font wrote:

 I have detected a source of false alarms (for Catalan) in the Wikipedia
 interlanguage links [1].

I have now added a regex to WikipediaQuickCheck.java which removes most 
(not all) interlanguage links. Feel free to improve it.
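
For readers who want to experiment, a rough Python sketch of this kind of filter follows. It is not the regex committed to WikipediaQuickCheck.java; the pattern and the function name are assumptions, and it treats any `[[xx:...]]` link with a two- or three-letter lowercase prefix (optionally with hyphenated suffixes) as an interlanguage link.

```python
import re

# Assumed shape of an interlanguage link: [[de:Löwe]], [[pt-br:Leão]], ...
# Longer codes such as "simple" are not covered by this sketch.
INTERLANGUAGE_LINK = re.compile(
    r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\[\]|]+\]\]\n?"
)


def remove_interlanguage_links(text):
    """Drop interlanguage links, including the newline that follows each."""
    return INTERLANGUAGE_LINK.sub("", text)
```

Because the prefix must be lowercase and short, links such as [[Category:animals]] and ordinary article links are left untouched.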

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: [Languagetool] getting fsa operational

2012-11-03 Thread Marcin Miłkowski
On 2012-10-22 21:11, Dominique Pellé wrote:
 Daniel Naber list2...@danielnaber.de wrote:

 On 16.10.2012, 16:52:26 Ruud Baars wrote:

 By the way, I read the instruction in the wiki, but these are quite
 complex (for me).

 I think you can ignore everything not related to Java - both exporting data
 from a dictionary and creating a dictionary should be possible with Java
 only. Here's what I used last time:

 java -jar morfologik tab2morph -i mydict.txt -o output.txt
 java -jar morfologik fsa_build -f cfsa2 -i output.txt -o mydict.dict

 Instead of morfologik you need to use the path to
 morfologik-tools-1.5.2-standalone.jar (not part of LanguageTool). mydict.txt
 is the dictionary in the same tab-delimited format in which it gets exported.

 Regards
   Daniel

 Hi

 I read http://languagetool.wikidot.com/hunspell-support
 but I'm still having problems creating the fsa dictionaries,
 too. I remember reading that fsa does not work
 with UTF-8. Will this be fixed, by the way?

It already is fixed. It works with utf-8 since the latest update of 
morfologik-stemming, no need to convert anything!
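
If a word list is still stored in ISO-8859-15 and you would rather feed UTF-8 to the tools, re-encoding it first is one option. The following Python sketch shows the idea; the function names and file paths are hypothetical and nothing here is specific to morfologik.

```python
def reencode_latin9_to_utf8(raw):
    """Convert ISO-8859-15 encoded bytes to UTF-8 encoded bytes."""
    return raw.decode("iso-8859-15").encode("utf-8")


def convert_file(src_path, dst_path):
    """Read an ISO-8859-15 word list and write it back out as UTF-8."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(reencode_latin9_to_utf8(data))
```

Working on bytes rather than text keeps line endings exactly as they were in the input file.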


 I think it would be a lot easier if we created scripts
 to create the fsa dictionaries and POS tag
 dictionaries, and checked the scripts in to SVN.
 That would be easier to look at and less
 ambiguous than a wiki page describing how
 to create the fsa or POS tag dictionaries.

Oh well, I didn't use any scripts, as I experimented on the command 
line. I should write the scripts...
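
A small wrapper around the two commands Daniel quoted earlier in this thread might look like the Python sketch below. The subcommands and flags are copied from that message and have not been checked against other morfologik versions; the function names and the intermediate file argument are assumptions.

```python
import subprocess


def build_commands(jar, source_txt, intermediate_txt, dict_out):
    """Return the two command lines quoted in this thread, as argument lists."""
    base = ["java", "-jar", jar]
    return [
        # Step 1: convert the tab-delimited export to morfologik's input form.
        base + ["tab2morph", "-i", source_txt, "-o", intermediate_txt],
        # Step 2: build the binary dictionary in cfsa2 format.
        base + ["fsa_build", "-f", "cfsa2", "-i", intermediate_txt,
                "-o", dict_out],
    ]


def build_dictionary(jar, source_txt, intermediate_txt, dict_out):
    """Run both steps in order, stopping at the first failure."""
    for command in build_commands(jar, source_txt, intermediate_txt, dict_out):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes it possible to print or inspect the commands without having Java or the jar installed.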

Best,
Marcin
