Re: Modules for individual supported languages?

2013-10-04 Thread Marcin Miłkowski
On 2013-10-03 21:24, Jan Schreiber wrote:
 Somebody by the name of Łukasz Janik posted this to our Facebook wall:

   prosze kazdy jezyk jako osobno ["please, each language separately"]

 I don't speak a single word of Polish, but according to Google
 Translator, this is a feature request to release single-language
 versions of LT. (Google and I might be wrong here of course.) ;-)

You're right. Łukasz Janik has been pushing this for years now ;)


 I tend to agree with him. Given the fact that the vast majority of
 people probably don't actively use more than three languages, we're
 imposing a huge overhead on our users.

I'm not so sure that the overhead is so huge, given that the broadband 
user base is growing every year.


 We've discussed this before, but I'm not sure what the outcome was. I
 think the ideal solution would be if the users could configure the
 languages they want before downloading. If that is not possible, there
 should be a clean way to remove unwanted languages during or after
 installation.

 Maybe we could have a two-step download: In the first step, you download
 the main app, perhaps with English already on board. During install, you
 can choose whatever other languages you may need.

The easiest way for (Libre|Open)Office users would be to have separate 
downloads for every language, just like with spelling dictionaries. This 
is already feasible thanks to modularization: we can already build such 
packages via Maven POMs. Also, some languages could even offer spelling 
hooks to (L|O)Office to replace the deadly slow hunspell.

Same goes for the Firefox extension, and the standalone app.

The only remaining problem is that for bilingual rules, we really need 
some mechanisms to communicate between the modules, and to download 
modules on the fly. Office users don't use that, but for CheckMate 
(translation QA) that could be a problem.

Regards,
Marcin

--
October Webinars: Code for Performance
Free Intel webinars can help you accelerate application performance.
Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
the latest Intel processors and coprocessors. See abstracts and register 
http://pubads.g.doubleclick.net/gampad/clk?id=60134791iu=/4140/ostg.clktrk
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: Modules for individual supported languages?

2013-10-04 Thread Richard Eckart de Castilho
On 04.10.2013, at 10:03, Marcin Miłkowski list-addr...@wp.pl wrote:

 The only remaining problem is that for bilingual rules, we really need 
 some mechanisms to communicate between the modules, and to download 
 modules on the fly. Office users don't use that, but for CheckMate 
 (translation QA) that could be a problem.

Downloading on the fly could be solved by hooking into the resource
loader mechanism that has been suggested elsewhere. In fact, if that
were implemented, I was thinking of using it to enable exactly that.
In DKPro Core, we enabled many of the language analysis modules to
automatically download their models from a Maven repository, but for
LanguageTool, we currently still bundle the whole bunch because this
loader mechanism is lacking.

-- Richard
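
The download-on-demand idea above can be sketched as a loader that looks in
a local cache first and only fetches a resource when it is missing. This is
a minimal illustration, not LanguageTool's or DKPro Core's actual API: the
class name, the cache layout, and the injected fetch function are all
hypothetical, and a real loader would fetch from a repository rather than
from the stand-in used here.

```python
import tempfile
from pathlib import Path


class CachingResourceLoader:
    """Hypothetical loader: serve a resource from a local cache, fetching it once if absent."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch  # assumed callable: resource name -> bytes

    def load(self, name):
        path = self.cache_dir / name
        if not path.exists():
            # First request: fetch the resource and store it for later use.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(self.fetch(name))
        return path.read_bytes()


if __name__ == "__main__":
    def demo_fetch(name):
        # Stand-in for a real download from a repository.
        return ("contents of " + name).encode("utf-8")

    with tempfile.TemporaryDirectory() as cache:
        loader = CachingResourceLoader(cache, demo_fetch)
        print(loader.load("fr/example.dict"))  # fetched, then cached
        print(loader.load("fr/example.dict"))  # served from the cache
```

Passing the fetch function in keeps the caching logic independent of where
the language data actually lives.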


Re: Facebook stats

2013-10-04 Thread Kumara Bhikkhu
Might be just a coincidence. Anyway, recently, I've been planting 
bait on the QA and bug forums of OOo and LibO, plus other places. 
I hope to attract coders that way too.

Jan Schreiber wrote thus at 04:40 PM 04-10-13:
Hello everybody,

FYI, here are a few screenshots that show recent statistics about our
Facebook page. User activity there is increasing slowly but steadily. In
the last few days, two users have contacted us via our Facebook wall.

--Jan




Re: He tried not to laughs.

2013-10-04 Thread Kumara Bhikkhu
By verb I mean of course the wrong verb (i.e., non-base form).

Anyway, mails from you always end up as below: all joined up.

Marcin Miłkowski wrote thus at 04:04 PM 04-10-13:
 On 2013-10-04 05:00, Kumara Bhikkhu wrote:
  Disambiguator. I don't even know what that is. Never mind. Thanks.

  I'll add rules for not to (verb) and (verb) to (verb), and see how
  that goes.

 What do you want to achieve this way? These will not be error patterns...

 Best,
 Marcin

  kb

  Marcin Miłkowski wrote thus at 06:39 PM 03-10-13:
   On 2013-10-03 11:43, Kumara Bhikkhu wrote:
    Marcin Miłkowski wrote thus at 04:15 PM 03-10-13:
     Hi,

     On 2013-10-03 06:24, Kumara Bhikkhu wrote:
      Can the one who created this contact me personally? It's not
      triggering on He tried not to laugh_s_. I don't know how to
      correct it.

     I'll write this on the list -- laughs is also the plural of laugh,
     which is excluded by the exception below (NNS). Unfortunately,
     without this exception, a lot of false alarms are found.

    I thought so.

     Now, maybe we could have a second variant of the rule that takes
     not + to, and then the exception would not be required. This would
     have to be tested on a large corpus.

    Any way of indicating verbs but excepting those that are also nouns?

   No, it's not possible unless you have a perfect rule in the
   disambiguator for this.

   Best,
   Marcin


Re: building a synthesizer

2013-10-04 Thread Jaume Ortolà i Font
Daniel,

I found the same problem recently. I resorted to the attached perl script
for this step.

Regards,
Jaume Ortolà




2013/10/4 Daniel Naber list2...@danielnaber.de

 Hi,

 did anybody recently build a synthesizer? When I follow the instructions
 at http://wiki.languagetool.org/developing-a-tagger-dictionary#toc8 I
 get messages like this:

 Line number 1 has less than 3 tab-separated fields: |I  

 So what's the correct input format for the tab2morph step? When I use a
 format with three tab-separated columns the synth dict I get is very
 large (10MB).

 Regards
   Daniel

 --
 http://www.danielnaber.de






morph_data_ca.pl
Description: Binary data


Re: building a synthesizer

2013-10-04 Thread Dominique Pellé
Hi

Why aren't the scripts for all binary dictionaries in Git?
They are useful when dictionaries need to be upgraded.
And they help maintainers of other languages to figure out
how to build their dictionaries. A script is less ambiguous than
documentation.

I have not created a synthesizer dictionary yet, but
the POS and FSA for French and Breton
are created from scripts in Git:

languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/create-lexicon.pl
languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/hunspell/create-fsa-spell-dictionary.sh

In fact, I saw that one of the reasons for not packaging
LanguageTool in Debian is that we don't automate
the creation of binary dictionaries:

https://bugs.launchpad.net/ubuntu/+source/openoffice.org/+bug/114375

Regards
Dominique

Jaume Ortolà i Font wrote:

 Daniel,

 I found the same problem recently. I resorted to the attached perl script
 for this step.

 Regards,
 Jaume Ortolà

 2013/10/4 Daniel Naber list2...@danielnaber.de

 Hi,

 did anybody recently build a synthesizer? When I follow the instructions
 at http://wiki.languagetool.org/developing-a-tagger-dictionary#toc8 I
 get messages like this:

 Line number 1 has less than 3 tab-separated fields: |I  

 So what's the correct input format for the tab2morph step? When I use a
 format with three tab-separated columns the synth dict I get is very
 large (10MB).

 Regards
   Daniel

 --
 http://www.danielnaber.de



Re: building a synthesizer

2013-10-04 Thread Marcin Miłkowski
On 2013-10-04 18:28, Daniel Naber wrote:
 Hi,

 did anybody recently build a synthesizer? When I follow the instructions
 at http://wiki.languagetool.org/developing-a-tagger-dictionary#toc8 I
 get messages like this:

 Line number 1 has less than 3 tab-separated fields: |I

 So what's the correct input format for the tab2morph step? When I use a
 format with three tab-separated columns the synth dict I get is very
 large (10MB).

The message can, and should, be turned off with the -nw switch. It is 
just a warning from the tab2morph tool.

For example:

java -jar $(morfologik) tab2morph -nw -i synt.txt -o synt_in.txt 2>/dev/null

Best,
Marcin
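
To see which input lines would trigger the warning before running
tab2morph, a small check like the one below may help. It is only a sketch
based on what the message itself says, namely that lines are expected to
have at least three tab-separated fields. The function name and the sample
lines are hypothetical, and it does not reproduce tab2morph's own parsing.

```python
def short_lines(lines, min_fields=3):
    """Return (line_number, text) pairs for lines with fewer than min_fields tab-separated fields."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text:
            continue  # empty lines are skipped, not reported
        if len(text.split("\t")) < min_fields:
            problems.append((number, text))
    return problems


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read the dictionary file instead.
    sample = ["I\n", "walks\twalk\tVBZ\n", "walk\twalk\n"]
    for number, text in short_lines(sample):
        print(f"line {number}: fewer than 3 tab-separated fields: {text!r}")
```

For a real file, the lines can be passed in with
open("synt.txt", encoding="utf-8"), where the encoding is an assumption
about the input.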



Re: building a synthesizer

2013-10-04 Thread Daniel Naber
On 2013-10-04 19:28, Dominique Pellé wrote:

 Why aren't the scripts for all binary dictionaries in Git?

Why have scripts at all? Now with maven we can set up a small project 
that has morfologik-tools as a dependency and that does all import and 
export work by calling the morfologik methods from Java.

Anyway, Marcin's hint to use the -nw option helped and I now have the 
synthesizer I needed.

Regards
  Daniel

-- 
http://www.danielnaber.de




Re: xml editor

2013-10-04 Thread Kumara Bhikkhu
Daniel Naber wrote thus at 02:12 PM 03-10-13:
On 2013-09-27 10:30, Kumara Bhikkhu wrote:

  I realise that working on an XML with a normal text editor is no fun.
  What's a good (free) xml editor for our purpose?

This is only complementary to a local editor, but it helps to test the
rule against real data:
http://community.languagetool.org/ruleEditor/expert
I wish it had code completion but the underlying editor doesn't support
that yet (or not easily).

I know this one. I wish it could handle more than one rule at a time, though.
I was referring to an editor that can tell me if an end tag is missing 
from the XML (perhaps because I accidentally deleted it).

kb
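
Short of a full XML editor, a missing end tag can also be caught with a
quick well-formedness check. The sketch below uses Python's standard
xml.etree.ElementTree parser. The function name and the sample rule
snippets are made up for illustration, and the check only covers
well-formedness, not whether a file is valid against LanguageTool's rule
schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return (line, column, str(err))
    return None


if __name__ == "__main__":
    # Hypothetical snippets; the second one lacks the closing </token> tag.
    good = "<rules><rule id='A'><pattern><token>foo</token></pattern></rule></rules>"
    bad = "<rules><rule id='A'><pattern><token>foo</pattern></rule></rules>"
    for label, text in (("good", good), ("bad", bad)):
        problem = check_well_formed(text)
        if problem is None:
            print(label, "is well-formed")
        else:
            print(label, "is broken at line %d, column %d: %s" % problem)
```

The reported position is where the parser first noticed the problem, which
is usually at or shortly after the place where the tag went missing.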

