Re: alternative to language constructor

2015-12-29 Thread Richard Eckart de Castilho
If I wanted to set up a default mapping from two-letter language codes to
five-letter language-country codes, e.g. "de" -> "de-DE", so that spell checking
and friends work for users of DKPro Core, what would be the best way to go
about this?

Is there a way to query all available language codes?

Is there a way to query the capabilities available for a given language code?
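A minimal sketch of such a default mapping, assuming a small hand-maintained table keyed by the bare language code. `DefaultVariants` and `resolve` are hypothetical names, not LanguageTool API; the two entries are just the examples from this thread, and which variant counts as the default (e.g. "pt-PT" vs. "pt-BR") remains a policy decision.

```java
import java.util.Map;

/** Hypothetical helper: maps bare two-letter language codes to a default language-country variant. */
public class DefaultVariants {
    // Assumed defaults, taken from the examples in this thread ("pt" could just as well be "pt-BR").
    private static final Map<String, String> DEFAULTS = Map.of(
        "de", "de-DE",
        "pt", "pt-PT");

    /** Returns the default variant for a bare code; codes with a country part or without a default pass through unchanged. */
    public static String resolve(String code) {
        if (code.contains("-")) {
            return code; // already a language-country code such as "de-AT"
        }
        return DEFAULTS.getOrDefault(code, code);
    }

    public static void main(String[] args) {
        System.out.println(resolve("de"));    // de-DE
        System.out.println(resolve("de-AT")); // de-AT
        System.out.println(resolve("fr"));    // fr (no default configured)
    }
}
```

The resolved code could then be handed to `Languages.getLanguageForShortName(...)` instead of the bare two-letter code.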

Best,

-- Richard

> On 17.12.2015, at 18:14, Richard Eckart de Castilho 
> <richard.eck...@gmail.com> wrote:
> 
> On 17.12.2015, at 17:17, Daniel Naber <daniel.na...@languagetool.org> wrote:
>> 
>> On 2015-12-16 23:33, Richard Eckart de Castilho wrote:
>> 
>>> Personally, I like the method where I can pass a language code as a 
>>> String
>>> and get back the proper language:
>>> 
>>> Languages.getLanguageForShortName(langCode)
>>> 
>>> Wouldn't it be reasonable to discourage directly constructing languages
>>> in favor of using a central factory class like Languages?
>> 
>> Maybe... but it doesn't solve this very problem, as users then call 
>> getLanguageForShortName("de") and also wonder why they don't get spell 
>> checking. They need to call getLanguageForShortName("de-DE") for that.
> 
> You know what... now I understand why one guy trying to use the 
> LT spell checker in DKPro Core reported problems getting spelling
> corrections:
> 
> https://groups.google.com/d/msg/dkpro-core-user/BydqkdBjqMM/dkFw5PYUx1UJ
> 
> We use the two-letter codes everywhere!
> 
> How about doing a built-in default mapping for the two-letter codes to, e.g.
> "de" -> "de-DE", "pt" -> "pt-PT" (I can already see Brazilians arguing for
> "pt-BR" ;) ) etc?
> 
> Cheers,
> 
> -- Richard


--
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: alternative to language constructor

2015-12-17 Thread Richard Eckart de Castilho
On 17.12.2015, at 17:17, Daniel Naber <daniel.na...@languagetool.org> wrote:
> 
> On 2015-12-16 23:33, Richard Eckart de Castilho wrote:
> 
>> Personally, I like the method where I can pass a language code as a 
>> String
>> and get back the proper language:
>> 
>>  Languages.getLanguageForShortName(langCode)
>> 
>> Wouldn't it be reasonable to discourage directly constructing languages
>> in favor of using a central factory class like Languages?
> 
> Maybe... but it doesn't solve this very problem, as users then call 
> getLanguageForShortName("de") and also wonder why they don't get spell 
> checking. They need to call getLanguageForShortName("de-DE") for that.

You know what... now I understand why one guy trying to use the 
LT spell checker in DKPro Core reported problems getting spelling
corrections:

https://groups.google.com/d/msg/dkpro-core-user/BydqkdBjqMM/dkFw5PYUx1UJ

We use the two-letter codes everywhere!

How about doing a built-in default mapping for the two-letter codes to, e.g.
"de" -> "de-DE", "pt" -> "pt-PT" (I can already see Brazilians arguing for
"pt-BR" ;) ) etc?

Cheers,

-- Richard



Re: alternative to language constructor

2015-12-16 Thread Richard Eckart de Castilho
Factory method sounds good. The only thing not nice is that static methods
cannot be governed by an interface, which would formally standardize them
across all languages.

Personally, I like the method where I can pass a language code as a String
and get back the proper language:

  Languages.getLanguageForShortName(langCode)

Wouldn't it be reasonable to discourage directly constructing languages
in favor of using a central factory class like Languages?

Best,

-- Richard

> On 15.12.2015, at 19:31, Daniel Naber  wrote:
> 
> Hi,
> 
> users of the API regularly use "new German()" and then wonder why they 
> don't get spell checking. The solution is to use e.g. "new 
> GermanyGerman()" to specify the variant. That's confusing to almost 
> everyone including myself, so I'd like to change the API: the 
> constructor "German" will be deprecated and there will be a static 
> get(Variant) method. From the users' perspective:
> 
> old: new German() - new: German.get(German.Variant.NoSpellChecking)
> old: new GermanyGerman() - new: German.get(German.Variant.Germany)
> 
> Same for English and other languages where this is relevant. Let me know 
> if you see a problem with that or if you know a better solution.
> 
> Regards
>  Daniel
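A rough sketch of the proposed static factory, assuming a private constructor and one cached instance per variant. Only `German.get(German.Variant...)` and the two variant names come from the proposal above; the caching and the `hasSpellChecker()` / `getShortCode()` methods are hypothetical illustrations, not LanguageTool's actual classes.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of the proposed API: construction only via German.get(Variant). */
public final class German {
    public enum Variant { NoSpellChecking, Germany }

    // One shared instance per variant, created on first request (an assumed design choice).
    private static final Map<Variant, German> INSTANCES = new EnumMap<>(Variant.class);

    private final Variant variant;

    // Private constructor: "new German()" is no longer possible for API users.
    private German(Variant variant) {
        this.variant = variant;
    }

    public static synchronized German get(Variant variant) {
        return INSTANCES.computeIfAbsent(variant, German::new);
    }

    /** In this sketch, only a concrete regional variant comes with a spell checker. */
    public boolean hasSpellChecker() {
        return variant != Variant.NoSpellChecking;
    }

    public String getShortCode() {
        return variant == Variant.Germany ? "de-DE" : "de";
    }
}
```

As Richard points out above, such a static `get` cannot be declared in an interface, so every language class would have to follow this convention on its own.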




Re: job: marketing LanguageTool

2015-12-10 Thread Richard Eckart de Castilho
Hi,

in my experience, it is terribly hard to get new contributors among 
developers, and I believe that open source people who focus on marketing are an 
even rarer species.

Anyway, the "job advertisement" doesn't seem to take into account the 
perspective of those you wish to recruit. I think the basic question 
everybody asks is "what is in it for me?", and the message here is "work, no 
money".

Consider why somebody would want to contribute here.

- Can they learn and if so what?
- Can they help and if so whom - and does that make the project cool?
- Do they get visibility and what for?
- Is there a cool community and why is it cool?
- anything else?

Consider if what you are asking for is attractive.

Right now, it looks like you are basically looking for somebody who is already 
highly motivated and ready to jump in head first by creating a long-term (!) 
concept and implementing it. That sounds like a hell of a lot of work and no fun, 
actually! And all of that in just 2 hours of hard work per week.

Consider demanding less and giving people a better idea of what could 
actually be done.

To be honest, I don't think that any of the above comments will actually help 
in the short term. But I hope you find them useful anyway. Advertising "jobs" 
is also a form of marketing, and advertising volunteer jobs is, well, 
particularly hard!

If you have the time, you might find this open source book interesting: 
http://open-advice.org

Cheers,

-- Richard

> On 23.11.2015, at 06:27, Daniel Naber  wrote:
> 
> (sending this again, this time also to the forum, twitter etc.)
> 
> Hi,
> 
> LanguageTool needs someone who takes care of its marketing. Developers 
> are usually not good at marketing, and they are busy with programming 
> anyway. What would you do? Your task would simply be:
> 
> Make LanguageTool and its add-ons more popular.
> 
> How you do that is up to you. You could blog, improve our website, or 
> create a long-term concept for marketing and then help make it a 
> reality. Like the developers, you won't get paid with money but with 
> fame and a languagetool.org email address. You either have experience in 
> marketing or are eager to learn. You should be able to regularly spend 
> at least 2 hours per week on it. Who would like to take this job? Please 
> reply here or to me personally.
> 
> Regards
>  Daniel




Re: marketing LT in US/UK

2015-09-10 Thread Richard Eckart de Castilho
Hi Daniel,

does the map show visitors or users (e.g. people interacting sensibly with the 
forms on the website)?

A substantial part of requests from some of the countries, e.g. Russia, might 
be due to bots.

-- Richard

On 09.09.2015, at 23:55, Daniel Naber  wrote:

> Hi,
> 
> the attached image shows where languagetool.org visitors are coming from: the 
> darker the blue, the more visitors. As you can see, relatively few users are 
> from US/UK, even though English is one of the languages that we have supported 
> for a long time and whose support should be quite good.
> 
> Please post your ideas on how we could make LT better known in English 
> speaking countries, any idea is welcome.
> 
> Regards
> Daniel




Re: thanking contributors

2015-07-04 Thread Richard Eckart de Castilho
Cool ;) 

Thanks,

-- Richard

On 04.07.2015, at 13:33, Daniel Naber daniel.na...@languagetool.org wrote:

 On 2015-07-04 12:40, Richard Eckart de Castilho wrote:
 
 Hi Richard,
 
 Not sure if you consider it worth mentioning because it's not a code
 contribution...
 
 sure your contribution is worth mentioning, I've just added you to the 
 page.
 
 Regards
  Daniel




Using result from AnalyzedToken.getLemma()

2015-05-12 Thread Richard Eckart de Castilho
Hi all,

we're using parts of LanguageTool to implement a simple lemmatizer.
Basically, we use lang.getTagger().tag(tokenText) to get readings
and then extract the lemma information from there.

For some wordforms, the lemma appears to contain some structuring, e.g.
besitzt becomes [be]sitzen (the brackets are actually in the string
returned by getLemma).

Are there definite rules for this structure encoding in LanguageTool?
Is there some helper method to strip it from the lemma and get only
the raw lemma?

Cheers,

-- Richard



Re: Using result from AnalyzedToken.getLemma()

2015-05-12 Thread Richard Eckart de Castilho
Ok, I guess that means we're on our own regarding how to clean these lemmas.
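A minimal cleanup sketch, assuming the brackets only mark internal structure (e.g. a prefix) and their content belongs to the lemma, so that "[be]sitzen" becomes "besitzen". Since the Morphy markup is undocumented, this is a guess; `LemmaCleaner` is a hypothetical helper, not part of LanguageTool.

```java
/** Hypothetical helper: strips Morphy-style bracket markup such as "[be]sitzen" from a lemma. */
public final class LemmaCleaner {
    private LemmaCleaner() {}

    // Assumption: the bracketed part belongs to the lemma, so only the bracket
    // characters themselves are removed and their content is kept.
    public static String clean(String lemma) {
        return lemma.replace("[", "").replace("]", "");
    }
}
```

If the bracketed part turned out to be something that should be dropped instead, `lemma.replaceAll("\\[[^\\]]*\\]", "")` would remove it together with its content.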

Thanks for the quick answer!

-- Richard

On 12.05.2015, at 10:58, Daniel Naber daniel.na...@languagetool.org wrote:

 On 2015-05-12 10:27, Richard Eckart de Castilho wrote:
 
 For some wordforms, the lemma appears to contain some structuring, e.g.
 besitzt becomes [be]sitzen (the brackets are actually in the string
 returned by getLemma).
 
 This is the original data as exported from Morphy:
 
 http://www.wolfganglezius.de/doku.php?id=cl:morphy
 
 While the tags are documented, I couldn't find any documentation for 
 these cases.
 
 Regards
  Daniel




Re: Using daily builds in Maven

2014-10-03 Thread Richard Eckart de Castilho
You have a continuous integration server, right? Couldn't you configure
that to deploy snapshot builds to the Sonatype OSS Snapshot repository?

Cheers,

-- Richard

On 03.10.2014, at 13:26, Daniel Naber daniel.na...@languagetool.org wrote:

 On 2014-10-03 12:32, Robin Dunn wrote:
 
 Currently it seems only the major released versions of LT are
 available from the Maven public repository e.g. the latest 2.7, but
 the daily builds are not in the public repository?
 
 Yes, Maven Central is only for releases and we haven't set up our own 
 repo to host daily builds. If this is for a local machine, the easiest 
 thing is to build LT yourself: update the source from git and run mvn 
 install -DskipTests. This will install all artifacts into your local 
 repository.
 
 Regards
  Daniel




Re: LO 4.2 as 64bit for Mac

2014-02-04 Thread Richard Eckart de Castilho
Just upgraded to LO 4.2 final 64bit. I didn't have to reinstall LT - looks like
the plugin was stored in my user profile. Anyway, the plugin continues to work
with the LO 4.2 final 64bit version.

I should mention that I'm running OpenJDK 1.8 early access right now. It would be
useful if somebody could double-check with Oracle JDK 1.7.

Cheers,

-- Richard

On 04.02.2014, at 11:25, Richard Eckart de Castilho richard.eck...@gmail.com 
wrote:

 I tried LT with the LO 4.2 beta 64bit for Mac. It looks like it works.
 
 In the end I didn't use it for my work, because during grammar
 checking the problematic spans are not marked in the document.
 The sentence is only shown in the correction dialog.
 
 So, yes - it should solve the OS X problem. I didn't try with LO 4.2
 final though.
 
 Cheers,
 
 -- Richard
 
 On 03.02.2014, at 19:55, Daniel Naber daniel.na...@languagetool.org wrote:
 
 Hi,
 
 the newly released LibreOffice 4.2 now has a 64bit version for Mac. Can 
 anybody confirm that this solves the 64/32 bit problem that LT/LO had on 
 Mac? By confirm I mean that you should actually have tried it...
 
 http://www.libreoffice.org/download
 
 Regards
 Daniel




Re: How to use the lemmatizer

2014-01-29 Thread Richard Eckart de Castilho
Hi,

thanks for the feedback. I'm really only interested in convenient access to 
the dictionaries, so that I can use them for lemmatization. For this 
particular task, I'm not using any other functionality from LanguageTool, 
including grammatical rules.

So here is what I do:

- run a probabilistic POS tagger
- feed the tokens of my text to the languagetool tagger to get all dictionary 
entries
- find a match between the POS tag created by the probabilistic tagger and the 
returned dictionary entries
- if there is a match, use the respective lemma

Matching is the most annoying part, because the tagset used by the 
probabilistic tagger may not be the same as the one used in the LanguageTool 
dictionary. So now I try three matching approaches:

- checking if the POS tag from the tagger and the one from the dictionary are 
exactly the same
- checking if the POS tag from the tagger is the same as the first element of 
the dictionary tag (splitting by ':')
- using mapping tables to map both the tag from the POS tagger and the tag 
from the dictionary to a coarse-grained scheme of word classes, and checking 
whether they match there
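The three matching steps above could be sketched roughly as follows. `TagMatcher`, `Reading`, and the two mapping tables are hypothetical names; the sketch assumes the dictionary-side mapping table is keyed by the first ':'-separated element of the dictionary tag, and the tags in the test values are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Picks a lemma by matching a tagger POS tag against dictionary readings (hypothetical sketch). */
public final class TagMatcher {
    /** One dictionary reading: a lemma and its fine-grained dictionary tag. */
    public static final class Reading {
        final String lemma;
        final String tag;

        public Reading(String lemma, String tag) {
            this.lemma = lemma;
            this.tag = tag;
        }
    }

    private final Map<String, String> taggerToCoarse; // tagger tag -> coarse word class
    private final Map<String, String> dictToCoarse;   // first element of dictionary tag -> coarse word class

    public TagMatcher(Map<String, String> taggerToCoarse, Map<String, String> dictToCoarse) {
        this.taggerToCoarse = taggerToCoarse;
        this.dictToCoarse = dictToCoarse;
    }

    private static String firstElement(String dictTag) {
        return dictTag.split(":", 2)[0];
    }

    public Optional<String> findLemma(String taggerTag, List<Reading> readings) {
        // 1) Tagger tag and dictionary tag are exactly the same
        for (Reading r : readings) {
            if (r.tag.equals(taggerTag)) {
                return Optional.of(r.lemma);
            }
        }
        // 2) Tagger tag equals the first ':'-separated element of the dictionary tag
        for (Reading r : readings) {
            if (firstElement(r.tag).equals(taggerTag)) {
                return Optional.of(r.lemma);
            }
        }
        // 3) Both tags map to the same coarse-grained word class
        String coarse = taggerToCoarse.get(taggerTag);
        if (coarse != null) {
            for (Reading r : readings) {
                if (coarse.equals(dictToCoarse.get(firstElement(r.tag)))) {
                    return Optional.of(r.lemma);
                }
            }
        }
        return Optional.empty(); // no match: no lemma from the dictionary
    }
}
```

Keeping the steps ordered from strictest to loosest means a coarse-class match is only used when no more specific match exists.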

Seems to work quite ok.

Cheers,

-- Richard

On 27.01.2014, at 22:38, Marcin Miłkowski list-addr...@wp.pl wrote:

 Hello,
 
 W dniu 2014-01-27 15:44, Richard Eckart de Castilho pisze:
 Hello everybody,
 
 I may be totally wrong, but I believe the lemmatizers in LanguageTool are 
 implemented based on dictionaries. I suppose a dictionary entry would be 
 made up of a form, a lemma, and a pos tag.
 
 Assuming this is correct, is there a simple way to do a lookup in such a 
 dictionary?
 
 Also, is there a way to find out which tagsets are used by these 
 dictionaries (or maybe there is even some standard in LanguageTool, e.g. 
 verbs are always V and nouns are always N or something like that)?
 
 I would like a method that accepts an inflected form and a pos tag and that 
 returns a single lemma.
 
 
 Currently, I am doing this, but it seems a bit awkward.
 
 List<AnalyzedTokenReadings> rawTaggedTokens = 
 lang.getTagger().tag(tokenText);
 AnalyzedSentence as = new AnalyzedSentence(
   rawTaggedTokens.toArray(new 
 AnalyzedTokenReadings[rawTaggedTokens.size()]));
 as = lang.getDisambiguator().disambiguate(as);
 String best = getMostFrequentLemma(as.getTokens()[i]);
 
 In particular, I would like to use a different POS tagger. I have various 
 statistical POS taggers at my disposal that produce a single POS per token - 
 and that is what I want. The LanguageTool POS tagger produces multiple 
 unranked POS tags per token.
 
 Beware that statistical POS taggers will necessarily obfuscate 
 non-grammatical material, as they try to guess the correct tags. This 
 makes them quite useless for writing rules. We've been there, tried 
 that. I haven't yet found a decent English POS tagger, for example, that 
 would be useful.
 
 Note however that if you have frequency info, you can add it to your 
 tagger dictionary. And we indeed can do so using typing frequency lists, 
 so you'd be able to assign the most frequent lemma if you need to, I guess. 
 The procedure is described here:
 
 http://wiki.languagetool.org/hunspell-support
 
 See under including frequency data.
 
 Regards,
 Marcin




How to use the lemmatizer

2014-01-27 Thread Richard Eckart de Castilho
Hello everybody,

I may be totally wrong, but I believe the lemmatizers in LanguageTool are 
implemented based on dictionaries. I suppose a dictionary entry would be made 
up of a form, a lemma, and a pos tag.

Assuming this is correct, is there a simple way to do a lookup in such a 
dictionary? 

Also, is there a way to find out which tagsets are used by these dictionaries 
(or maybe there is even some standard in LanguageTool, e.g. verbs are always V 
and nouns are always N or something like that)?

I would like a method that accepts an inflected form and a pos tag and that 
returns a single lemma.
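Assuming the dictionary really is a list of (form, lemma, tag) entries, such a method could look roughly like this. It uses a plain in-memory index rather than LanguageTool's own dictionary classes, so `FormLemmaDictionary`, `add`, and `lookup` are hypothetical names; loading the entries from the actual dictionary files is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory (form, lemma, tag) dictionary with a form+tag -> lemma lookup. */
public final class FormLemmaDictionary {
    // form -> (tag -> lemma); if a (form, tag) pair has several lemmas, the first one added wins
    private final Map<String, Map<String, String>> index = new HashMap<>();

    public void add(String form, String lemma, String tag) {
        index.computeIfAbsent(form, f -> new HashMap<>()).putIfAbsent(tag, lemma);
    }

    /** Returns the single lemma for an inflected form and POS tag, if the dictionary has one. */
    public Optional<String> lookup(String form, String tag) {
        Map<String, String> byTag = index.get(form);
        return byTag == null ? Optional.empty() : Optional.ofNullable(byTag.get(tag));
    }
}
```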


Currently, I am doing this, but it seems a bit awkward.

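// tokenText: the List<String> of tokens of one sentence; i: index of the token of interest
List<AnalyzedTokenReadings> rawTaggedTokens = lang.getTagger().tag(tokenText);
AnalyzedSentence as = new AnalyzedSentence(
  rawTaggedTokens.toArray(new AnalyzedTokenReadings[rawTaggedTokens.size()]));
// narrow down the readings using the language's disambiguation rules
as = lang.getDisambiguator().disambiguate(as);
// getMostFrequentLemma() is my own helper, not part of LanguageTool
String best = getMostFrequentLemma(as.getTokens()[i]);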

In particular, I would like to use a different POS tagger. I have various 
statistical POS taggers at my disposal that produce a single POS per token - 
and that is what I want. The LanguageTool POS tagger produces multiple unranked 
POS tags per token.

Cheers,

-- Richard


CoGrOO vs LanguageTool?

2014-01-20 Thread Richard Eckart de Castilho
Hi,

I recently stumbled across CoGrOO [1]. It appears to offer functionality similar 
to LanguageTool's, but only for Brazilian Portuguese and based mostly on 
statistical models rather than rules.

Did anybody here ever try that or does anybody have an opinion on the project?
Or did anybody possibly compare it to LanguageTool?

Cheers,

-- Richard

[1] http://cogroo.sourceforge.net


Re: OpenNLP POS tags

2013-11-12 Thread Richard Eckart de Castilho
I think it would rather trigger false positives (good text mistakenly
identified as being bad). If there are too many of these, then the rule
needs to be revised (e.g. by saying "in general, XXX is not good except
when Y and Z are true"). Of course, tools like LT are not perfect. E.g. the
grammar correction of my mail client (not LT) complains about many words
in this mail (mostly because it doesn't like me doing explicit line
breaks :P )

-- Richard

On 12.11.2013, at 22:38, Nina Nina ninacoder2...@gmail.com wrote:

 Right. 
 Also, on the other hand it may trigger False Negative errors, I think.
 
 
 On Tue, Nov 12, 2013 at 4:34 PM, Richard Eckart de Castilho 
 richard.eck...@gmail.com wrote:
 I believe the idea in LT is not that you work with all possible POS tags.
 
 Let's say you want to write a rule like "do not have consecutive verbs"
 (not a V followed by a V). If you write "I like can of coke", then this
 rule would fire because "can" could be a V. It could also be an N, but in
 your rule, you do not have to worry about this.
 
 Right?
 
 -- Richard
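The consecutive-verb example above can be sketched with every token carrying the set of all its possible tags, without choosing one. The simplified tags ("V", "N", "P", "PRP") and the class name are assumptions for illustration; this is not how LanguageTool rules are actually written.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical "no V followed by a V" rule over unranked, ambiguous readings. */
public final class ConsecutiveVerbRule {
    /** Returns the index of the first token whose readings and the next token's readings both include "V", or -1. */
    public static int firstMatch(List<Set<String>> readingsPerToken) {
        for (int i = 0; i + 1 < readingsPerToken.size(); i++) {
            if (readingsPerToken.get(i).contains("V") && readingsPerToken.get(i + 1).contains("V")) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "I like can of coke": "like" (V or P) and "can" (V or N) both have a verb reading
        List<Set<String>> sentence = List.of(
            Set.of("PRP"), Set.of("V", "P"), Set.of("V", "N"), Set.of("P"), Set.of("N"));
        System.out.println(firstMatch(sentence)); // 1, i.e. the rule fires at "like can"
    }
}
```

The rule only asks whether a verb reading exists; the fact that "can" could also be an N never enters the check.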




Re: Modules for individual supported languages?

2013-10-04 Thread Richard Eckart de Castilho
On 04.10.2013, at 10:03, Marcin Miłkowski list-addr...@wp.pl wrote:

 The only remaining problem is that for bilingual rules, we really need 
 some mechanisms to communicate between the modules, and to download 
 modules on the fly. Office users don't use that, but for CheckMate 
 (translation QA) that could be a problem.

Downloading on the fly could be solved by hooking into the resource
loader mechanism that has been suggested elsewhere. In fact, if that
were implemented, I was thinking of using it to enable exactly
that. In DKPro Core, we enabled many of the language analysis modules
to automatically download their models from a Maven repository, but for
LanguageTool, we currently still bundle the whole bunch because this
loader mechanism is lacking.
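A rough sketch of what such a pluggable loader could look like, assuming resources are addressed by path and loaders are tried in order (e.g. the bundled classpath first, then a downloader). All names here are hypothetical; the mechanism suggested elsewhere is not specified in this thread, and a Maven-repository downloader is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical pluggable resource loading. */
public final class ResourceLoading {
    /** Resolves a resource path to a stream, or returns empty if this loader cannot provide it. */
    public interface ResourceLoader {
        Optional<InputStream> open(String path);
    }

    /** Loads resources bundled on the classpath (the current bundle-everything approach). */
    public static final class ClasspathLoader implements ResourceLoader {
        @Override
        public Optional<InputStream> open(String path) {
            return Optional.ofNullable(ResourceLoading.class.getResourceAsStream(path));
        }
    }

    /** Tries each loader in order, e.g. classpath first, then a (hypothetical) downloader. */
    public static final class ChainedLoader implements ResourceLoader {
        private final List<ResourceLoader> loaders;

        public ChainedLoader(List<ResourceLoader> loaders) {
            this.loaders = List.copyOf(loaders);
        }

        @Override
        public Optional<InputStream> open(String path) {
            for (ResourceLoader loader : loaders) {
                Optional<InputStream> in = loader.open(path);
                if (in.isPresent()) {
                    return in;
                }
            }
            return Optional.empty();
        }
    }

    /** In-memory loader, e.g. for tests: maps paths to UTF-8 text content. */
    public static ResourceLoader inMemory(Map<String, String> files) {
        return path -> Optional.ofNullable(files.get(path))
            .<InputStream>map(s -> new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```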

-- Richard


Re: Chunker interface added

2013-08-24 Thread Richard Eckart de Castilho
Are you going to build a chunker from scratch or rely on existing
technology, e.g. the OpenNLP Chunker [1]?

Cheers,

-- Richard

[1] 
http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.chunker

Am 24.08.2013 um 18:26 schrieb Daniel Naber list2...@danielnaber.de:

 On 2013-08-24 16:28, R.J. Baars wrote:
 
 This is very promising. I would like to know more about this.
 
 Nothing has been decided yet - it will take some time before I have a 
 working version for English, then we'll see how this can be applied to 
 other languages.
 
 Regards
  Daniel




Re: Chunker interface added

2013-08-24 Thread Richard Eckart de Castilho
Am 24.08.2013 um 21:03 schrieb Daniel Naber list2...@danielnaber.de:

 On 2013-08-24 20:28, Richard Eckart de Castilho wrote:
 
 Are you going to build a chunker from scratch or rely on existing
 technology, e.g. the OpenNLP Chunker [1]?
 
 I'll use the one from OpenNLP for now. It's kind of a black box for us, 
 so I'm not sure yet how to handle those cases where OpenNLP gets it 
 wrong. Any ideas about that?

I'm not familiar with its details, but given that it can be trained, it
would probably be a good solution to start building a corpus of those sentences
it gets wrong and retrain every once in a while with the original corpus plus
the manually corrected samples. 

-- Richard


Re: some statistics for July

2013-07-31 Thread Richard Eckart de Castilho
Hi Daniel

you can also see some statistics from Maven Central by logging in 
to the Sonatype OSS Nexus and checking out the statistics for your
group ID there. Mind that, due to intensive caching of Maven artifacts,
these numbers should be taken with a grain of salt.

Cheers,

-- Richard

Am 31.07.2013 um 11:08 schrieb Daniel Naber list2...@danielnaber.de:

 Hi,
 
 here are some statistics I collected for July:
 
 * The LanguageTool add-on for LibreOffice/OpenOffice has been 
 downloaded 11,000 times.
 
 * The LanguageTool stand-alone version has been downloaded 2,800 times.
 
 * Our public HTTP API received about 114,000 requests. This includes 
 other developers using the API as well as users at languagetool.org who 
 use the online forms.
 
 * The Firefox add-on (LanguageToolFx) now has more than 2,600 users, 
 but downloads have really taken off in the last days: 
 https://addons.mozilla.org/en-US/firefox/addon/languagetoolfx/statistics/downloads/?last=90.
  
 I think this started with better linking from Catalan pages, but you 
 can see in the graphs that other languages profit, too.
 
 * We had about 30,000 visits to languagetool.org - this is twice as 
 many as one year ago.
 
 Regards
  Daniel




Re: releasing beta release

2013-03-03 Thread Richard Eckart de Castilho
Hi,

unless there is a strong reason for you to deploy this beta to Maven Central,
I'd suggest stopping the process at the point where you would promote the staging
repository. Promoting is the last step; it moves the release from the Sonatype
Nexus to Maven Central. All automatic validation steps are done before that, when
the repository is closed.

To test the staged release, you can add the staging repository to the POM of
some test project.

Most artifacts I have seen that use a beta qualifier put a hyphen after "beta",
e.g. 2.1-beta-1 rather than 2.1-beta1.

Cheers,

-- Richard

Am 03.03.2013 um 18:47 schrieb Daniel Naber list2...@danielnaber.de:

 Hi,
 
 FYI, I'll now try to release 2.1-beta1. Not so much because we need a beta 
 release, but to become familiar with the Maven-based release process. We'll 
 see what happens... This may not even end up on languagetool.org, but only 
 on Maven Central (the server that keeps all Maven artifacts).
 
 Regards
 Daniel



Re: releasing beta release

2013-03-03 Thread Richard Eckart de Castilho
Am 03.03.2013 um 20:24 schrieb Daniel Naber list2...@danielnaber.de:

 On 03.03.2013, 18:52:48 Richard Eckart de Castilho wrote:
 
 unless there is a strong reason for you to deploy this beta to Maven
 Central, I'd suggest stopping the process at the point where you would
 promote the staging repository.
 
 Okay, I will do that. 
 
 How do you make a release without being asked for the password again and 
 again? I already tried everything I could find on the web, including 
 -Dgpg.passphrase=... -Darguments=-Dgpg.passphrase=... and setting a profile 
 in my settings.xml.
 
 As we have 30 modules I don't want to re-enter the password that often.

Very good question. I didn't investigate in detail but I noticed that
I have some projects that keep on asking for passwords and others that
do not. I think it's a matter of the version of the signing plugin being
used. 

You could try running

mvn versions:display-plugin-updates

and add the latest version of the signing plugin to the pluginManagement
section in your aggregator POM. You might also want to take the opportunity
to (upgrade and) fix the versions of all other plugins involved in your build.

Otherwise, setting up and running a gpg-agent might help, but I never tried 
that.

Cheers,

-- Richard


Re: releasing beta release

2013-03-03 Thread Richard Eckart de Castilho
On 03.03.2013, at 23:06, Daniel Naber <list2...@danielnaber.de> wrote:

> On 03.03.2013, 20:51:30 Richard Eckart de Castilho wrote:
>
>> Otherwise, setting up and running a gpg-agent might help, but I never
>> tried that.
>
> That's what worked in the end.
>
> The version is now at
> https://oss.sonatype.org/content/repositories/orglanguagetool-520/org/languagetool/,
> everybody feel free to test it. Actually I deleted the standalone module,
> because publishing that as a Maven artifact doesn't seem to make sense,
> does it? So this is mostly relevant to developers.
>
> Oh, I just see that languagetool-parent still references standalone (as a
> module, not as a dependency). Is that a problem?

I cannot imagine why it would be. Once the artifacts have been deployed, the
module structure is no longer relevant.

Cheers,

-- Richard


Re: switching to Maven - done!

2013-02-08 Thread Richard Eckart de Castilho
On 09.02.2013, at 01:50, Dominique Pellé <dominique.pe...@gmail.com> wrote:

> Richard Eckart de Castilho wrote:
>>
>> Without having had a look at the build, I would expect at least two things to
>> contribute to the slowness:
>> 1) Maven (like ant) is a Java application and it takes a moment to fire up
>> the JVM; make is a native application.
>> 2) The package goal always runs the full packaging (building of ZIPs and
>> JARs from the compiled sources). So even if the compile is up-to-date,
>> doing the packaging takes a moment. If further plugins, e.g. JavaDoc,
>> have been activated during normal builds, they may further slow down the
>> build.
>
> Understood regarding the overhead of running the JVM (multiple times?)
> during the build.

As far as I know, the JVM is started once for the main build and may be started 
again for running tests, so that tests are well isolated in their own JVM. 
Depending on the configuration of the surefire plugin, the JVM may be started 
more than once (see forkMode parameter).

> But why would Maven mvn package always re-create the ZIPs and JARs
> when nothing has changed? There might be a good reason, but at first sight
> it seems like a waste of time. Shouldn't a build system try to do the least
> amount of work and rebuild only the targets for which at least one of their
> dependencies has changed, based on a DAG (Directed Acyclic Graph)
> of target/dependencies?

For Java classes, the tool knows the dependencies between files and can avoid
recompiling files. As far as I know, for artifacts, Maven does not know or
maintain a record of what files go into the artifacts. It may also be that 
zip/jar archives are not necessarily the best file format for incremental 
updates and need to be rewritten from scratch every time they are changed.
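To make the DAG idea from the question concrete, here is a minimal make-style up-to-date check: an output such as a JAR counts as stale if it is missing or older than any of its inputs. It is only a sketch in plain java.nio.file code, not how Maven works internally, and it assumes the list of input files is known, which is exactly the record Maven does not keep for artifacts. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

/** Hypothetical helper illustrating a make-style up-to-date check. */
class UpToDateCheck {

    /**
     * Returns true if the output has to be (re)built: it does not exist yet,
     * or at least one of its inputs was modified after it.
     */
    static boolean needsRebuild(Path output, List<Path> inputs) throws IOException {
        if (!Files.exists(output)) {
            return true;
        }
        FileTime outputTime = Files.getLastModifiedTime(output);
        for (Path input : inputs) {
            if (Files.getLastModifiedTime(input).compareTo(outputTime) > 0) {
                return true; // this input is newer than the archive
            }
        }
        return false; // no input is newer than the archive
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("uptodate");
        Path classFile = Files.writeString(dir.resolve("A.class"), "compiled");
        Path jar = dir.resolve("module.jar");

        System.out.println(needsRebuild(jar, List.of(classFile))); // true: no JAR yet

        // Simulate packaging: create the JAR with a timestamp after the class file's.
        Files.writeString(jar, "archive");
        long classTime = Files.getLastModifiedTime(classFile).toMillis();
        Files.setLastModifiedTime(jar, FileTime.fromMillis(classTime + 1000));

        System.out.println(needsRebuild(jar, List.of(classFile))); // false: JAR is newer
    }
}
```

Timestamps are the same heuristic make relies on; a more robust variant would compare content hashes, but that still needs a stored record of each archive's inputs.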

> Running mvn compile when nothing has changed is faster than
> mvn package but not really fast either. mvn compile takes 6.5 sec
> on my laptop when nothing has changed (nothing to compile), which is
> presumably much more than what the JVM needs to initialize when
> launching mvn.

I'm probably also spoilt by Eclipse's incremental compile and by Jenkins. In
both cases, I don't run into these problems. In cases where I do run Maven on
the command line, I probably got used to the build times. Anyway, firing up
the JVM likely takes much less than 6.5 s, but at that point the application
is not yet initialized. I am pretty sure that Maven performs some heavy
self-configuration during startup, checking that its modules are available,
wiring them all together internally and so on. Maven itself is built in a
highly modular way. I would expect this initialization to take up a major
part of the startup time.
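A rough way to check that bare JVM startup is much shorter than 6.5 s is to print, first thing in main(), how long the JVM has been running, using the standard java.lang.management API. This is a measurement sketch, not something from the thread; the class name is hypothetical and the numbers will vary with machine and JVM.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical probe: roughly how long did the JVM take to reach main()? */
class StartupProbe {

    /** Milliseconds since this JVM was started. */
    static long jvmUptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        // Read first thing in main(), so the value covers JVM startup plus loading
        // this class (and the management classes), but no application initialization.
        System.out.println("JVM uptime at main(): " + jvmUptimeMillis() + " ms");
    }
}
```

Comparing that figure with the time of a no-op mvn compile gives an idea of how much goes into Maven's own initialization rather than into the JVM.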

> I was also searching for parallel builds in the hope of speeding up
> (something like make -j4 with GNU make). I found this...
>
> http://stackoverflow.com/questions/581465/maven-how-to-do-parallel-builds
>
> ... but it does not work.

I was never successful with that, but I had tried it on a far more complex
build. You might want to update to version 2.6 of the resources plugin and
see if that supports parallel builds.

Cheers,

-- Richard


Re: switching to Maven - done!

2013-01-23 Thread Richard Eckart de Castilho
Great news!

> 2. Make sure maven is installed. Type mvn -version on the command line.
> If Maven returns its version and the version is >= 3.0.2, everything is
> okay. If not, follow the instructions on
> http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html
> (ignoring the Creating a Project part). If you use Linux, Maven should be
> provided by your distribution.

You can add a requirement for a minimum Maven version to the pom:

  <prerequisites>
    <maven>3.0.2</maven>
  </prerequisites>

http://maven.apache.org/pom.html#Prerequisites
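The ">= 3.0.2" check from the quoted instructions has to compare version components numerically, since a plain string comparison would put "3.0.10" before "3.0.2". A minimal sketch, assuming purely numeric dotted versions; the class name is hypothetical, and Maven's own version ordering additionally handles qualifiers such as -SNAPSHOT, which this does not:

```java
/** Hypothetical helper comparing purely numeric dotted versions such as "3.0.2". */
class VersionCheck {

    /** Negative, zero or positive, like Comparator.compare. */
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "3.0" equals "3.0.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if actual satisfies the ">= minimum" requirement. */
    static boolean isAtLeast(String actual, String minimum) {
        return compare(actual, minimum) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isAtLeast("3.0.4", "3.0.2"));     // true
        System.out.println(isAtLeast("2.2.1", "3.0.2"));     // false
        System.out.println(isAtLeast("3.0.10", "3.0.2"));    // true
        System.out.println("3.0.10".compareTo("3.0.2") < 0); // true: string order gets it wrong
    }
}
```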

> 3. Check out the new code:
>
> svn co https://languagetool.svn.sourceforge.net/svnroot/languagetool/trunk/languagetool

You may find it more convenient to reverse this and put trunk, tags, and
branches under languagetool:

https://languagetool.svn.sourceforge.net/svnroot/languagetool/languagetool/trunk

The Maven Release Plugin can probably detect the right location for release
tags even with your current layout and without further configuration, but I
am not sure.

> Your IDE can probably import them all at once if you import the top
> pom.xml

With Eclipse and m2e, note that the submodules will appear twice(!) in the
Project Explorer: once as projects and again as folders within the project
that corresponds to the top-level Maven folder. This causes some odd effects,
like search finding files twice. You should get used to it after a while.

A recommendation for adding new modules in Eclipse:

1) open top-level POM
2) create module
3) close the project that was created for the module
4) delete the project from the project explorer (not from disk)
5) go to the top-level project and add the folder of the new module to SVN
6) revert .settings, .project, target, .classpath
7) add .settings, .project, target, .classpath to svn:ignore
8) commit the top-level project (the pom.xml and the module folder)
9) use Import > Existing Maven Projects on the top-level project to add the
module back to the explorer

We found this was the best method to get Eclipse to recognize that the new
module is in SVN, and it causes the least confusion in Eclipse's workspace
metadata.

> Instead of ant test, now run mvn clean test before you commit a change.
> If you go to the sub-projects and type mvn clean test there, only that
> project will be tested. This is useful for the language modules, as only
> that language and its rules will be tested, which is quite fast. Be careful
> though, as the language module depends on the core module and you might get
> strange errors if the core module has changed and you haven't built **and
> installed** it yet. To install a module, use mvn clean install. This will
> compile and test the module(s) and install the result (the *.jar files) in
> your local Maven repository (~/.m2/repository under Linux). When in doubt,
> run mvn clean test in the top directory.

If you use Eclipse and run the test goal within Eclipse, checking "Resolve
Workspace artifacts" in the run configuration should allow the tests to pick
up changes in the core module without running an install first.
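As the quoted instructions say, mvn clean install copies the built JARs into the local Maven repository, which is where a language module finds the core module it depends on. In the default layout, the path is derived from the artifact coordinates: the groupId with dots turned into directories, then the artifactId, then the version. A minimal sketch of that mapping; the class name and the coordinates in the example are hypothetical placeholders, and it assumes no custom localRepository is configured in settings.xml:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: where "mvn install" puts a JAR in the default local repository layout. */
class LocalRepoPath {

    static Path jarPath(Path repoRoot, String groupId, String artifactId, String version) {
        Path dir = repoRoot;
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part); // e.g. org.languagetool -> org/languagetool
        }
        return dir.resolve(artifactId)
                .resolve(version)
                .resolve(artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Default location, assuming no custom localRepository in settings.xml.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        // Placeholder coordinates, for illustration only.
        System.out.println(jarPath(repo, "org.languagetool", "languagetool-core", "2.0-SNAPSHOT"));
    }
}
```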

A good next step would be to add the Sonatype OSS pom as a parent pom to the 
top-level languagetool pom and then try to create a release candidate.

Cheers,

-- Richard




Re: [Languagetool] Word form dictionary for German

2012-11-03 Thread Richard Eckart de Castilho
Thank you all for your pointers. I have now found the data and the
documentation of its tag set. It is also nice to see that the conversion to
the FSA is pretty straightforward.

Regards,

-- Richard

On 03.11.2012, at 08:34, Dominique Pellé <dominique.pe...@gmail.com> wrote:

> Richard Eckart de Castilho wrote:
>
>> Hi,
>>
>> I noticed today that the german.dict file in LanguageTool is a binary file,
>> I suppose created with Morfologik. Are the original data and the conversion
>> script available somewhere?
>>
>> Best,
>>
>> -- Richard
>
> I don't know how the German dictionary is created, but
> I wrote scripts to create the French and Breton POS tag
> dictionaries. They are available in SVN. Perhaps they can be
> useful to you:
>
> src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
> src/main/resources/org/languagetool/resource/br/create-lexicon.pl
>
> I'll add scripts to create the spelling dictionaries... once I figure
> out how to create them.
>
> Regards
>
> PS: I just added the script for the French dictionary in SVN
> so you may need to check out the latest code from SVN.
>
> -- Dominique




[Languagetool] LanguageTool 1.9 is on Maven Central

2012-10-10 Thread Richard Eckart de Castilho
Hello folks,

LanguageTool 1.9 is now available from Maven Central:

http://search.maven.org/#search%7Cga%7C1%7Clanguagetool

The artifact does not contain the dev or OpenOffice parts, only the core and 
the language resources. This is fine for embedding LT 1.9 in Java applications.

In a few days, I will also release DKPro Core 1.4.0 with LT 1.9 to Maven
Central (finally!). In case you wonder, DKPro Core is a collection of
components for natural language processing built for the Apache UIMA
framework. It uses LT in particular for grammar checking.

Thank you for your support!

Best,

-- Richard


Re: [Languagetool] LanguageTool 1.9 staged to Sonatype repository

2012-10-06 Thread Richard Eckart de Castilho
I'd like to make you aware of a slightly confusing behavior.

I did a small check navigating into the LT 1.9 sources from within Eclipse
with default workspace settings on OS X.

The class WordTokenizer has a pattern with odd characters:

+ ,.;()[]{}!?:/|\\\'«»„”“`´‘’‛′…¿¡\t\n\r, 
true);

These are actually UTF-8 characters, but the default encoding setting in
Eclipse on OS X is MacRoman.

In the actual LT 1.9 project in my workspace, this is not a problem, because 
Eclipse picks up the encoding settings from the LT 1.9 Maven POM. But Eclipse 
obviously uses the default workspace encoding to display code from 
automatically downloaded source jars.

To fix this, just set the default encoding of your workspace to UTF-8 and all 
is fine.
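The effect is easy to reproduce outside Eclipse: take the UTF-8 bytes of some of the WordTokenizer delimiters and decode them with a different single-byte charset. A minimal sketch; it uses ISO-8859-1 as a stand-in for MacRoman, because ISO-8859-1 is guaranteed to exist in every Java runtime while MacRoman (x-MacRoman) is an optional charset. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows what happens when UTF-8 text is decoded with a different charset. */
class EncodingMismatch {

    // Some delimiters from the WordTokenizer pattern, written as Unicode escapes
    // so that this file itself compiles correctly whatever its source encoding:
    // guillemets, German and English quotation marks, ellipsis, inverted ? and !.
    static final String DELIMITERS =
            "\u00AB\u00BB\u201E\u201C\u2018\u2019\u2026\u00BF\u00A1";

    /** Encodes text as UTF-8 (how it is stored on disk) and decodes it with readAs. */
    static String readUtf8As(String text, Charset readAs) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, readAs);
    }

    public static void main(String[] args) {
        String right = readUtf8As(DELIMITERS, StandardCharsets.UTF_8);
        String wrong = readUtf8As(DELIMITERS, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(DELIMITERS)); // true
        System.out.println(wrong.equals(DELIMITERS)); // false
        // Each of these characters takes 2-3 bytes in UTF-8, and a single-byte
        // charset turns every byte into a separate character.
        System.out.println(DELIMITERS.length() + " vs " + wrong.length()); // 9 vs 23
    }
}
```

Decoding with UTF-8, which is what setting the workspace encoding to UTF-8 amounts to, restores the original characters.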

Best,

-- Richard



On 04.10.2012, at 19:16, Daniel Naber <list2...@danielnaber.de> wrote:

> On 04.10.2012, 17:56:09 Richard Eckart de Castilho wrote:
>
>> https://oss.sonatype.org/content/repositories/central_bundles-216
>>
>> If you care, please check if you see any issues with the artifacts.
>
> Thanks! I created a small local project, added the repository and LT
> dependency to the pom and then created a small program that uses the LT
> API. Everything worked fine.
>
> Regards
> Daniel




Re: [Languagetool] Planning move to Maven

2012-10-03 Thread Richard Eckart de Castilho
Hi,

a good path to Maven Central is via the Sonatype OSS repository. This requires
an LT maintainer to obtain a Jira account from Sonatype and open an issue
requesting a Maven Central sync for the project. I'll already prepare the POM
for submission via Sonatype OSS.

The process is documented here:

https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide

This applies to the LT 2.0 release, in particular if it is released using the
Maven Release Plugin. LT 1.9 will be a third-party deploy via Sonatype and
will take a slightly different route. I'm almost done with getting the
dependencies out.

Cheers,

-- Richard

On 16.09.2012, at 17:48, Daniel Naber <list2...@danielnaber.de> wrote:

> On 16.09.2012, 21:13:38 Nathan Wells wrote:
>
>> I think it will be great to have smaller downloads for each language.
>> Will this also translate to separate files for LibreOffice/OpenOffice
>> extensions to keep them small as well?
>
> Not in the first step I guess, but once we're more modular that should
> become easier to implement.
>
> Regards
> Daniel




[Languagetool] LT 1.9 deploy to Maven Central almost ready

2012-10-03 Thread Richard Eckart de Castilho
Hi there,

I have just done a release of ictclas4j 1.0.1 (with Daniel's patch) and decided
that lucene-gosen-ipadic-1.2.1 is good enough to go to Central. Even though we
have some test case failures with lucene-gosen, they do not seem to be critical.

Both dependencies are now pending synchronization to Maven Central. Once they 
are through, LT 1.9 can be uploaded.

It is possible to add a list of developers to the Maven POM. Can you point me 
to a list of active developers on the project or just reply with a list of 
names to this mail, so I can add them? If this is not desired, I'll try to get 
the upload done without the developers in it.

-- Richard


Re: [Languagetool] Creating a Java rule?

2012-10-03 Thread Richard Eckart de Castilho
Hi,

do you distinguish between Brazilian Portuguese (português do Brasil) and
European Portuguese (português de Portugal) in LT? In my experience, vossa has
practically been phased out of spoken Brazilian Portuguese and is only used
rarely, e.g. in religious ceremonies. People tend to give me a funny smile
when I use vossa instead of de vocês.

Cheers,

-- Richard

On 03.10.2012, at 00:04, Marco A.G.Pinto <marcoagpi...@mail.telepac.pt> wrote:

> Daniel suggested that I post this to the mailing list.
>
>
> -------- Original Message --------
> Subject:  Creating a Java rule?
> Date: Tue, 02 Oct 2012 12:48:58 +0100
> From: Marco A.G.Pinto <marcoagpi...@mail.telepac.pt>
> Reply-To: marcoagpi...@mail.telepac.pt
> To:   Daniel Naber (LanguageTool) <na...@danielnaber.de>, Juan Martorell
> (LanguageTool) <juan.martor...@gmail.com>, Marcin Milkowski (LanguageTool)
> <marcin.milkow...@gmail.com>
>
> Hello!
>
> Last night I was on Facebook reading a Brazilian post and I noticed a common
> mistake in the grammar.
>
> They had written:
> (...) a opinião de vocês (...)
>
> The correct form would be:
> (...) a vossa opinião (...)
>
> In simple terms, the rule would detect:
> 1) a
> 2) ANYWORD (in this case opinião)
> 3) de vocês (to be replaced with vossa)
>
> How can I code this into LanguageTool and make the example show the ANYWORD
> according to the word used in the sentence?
>
> Thanks!
>
> Kind regards,
> Marco A.G.Pinto
> ---
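Marco's three steps map naturally onto a pattern with a wildcard token in the middle whose match is reused in the suggestion. As a plain-Java illustration of that logic only (many of LanguageTool's rules are written in its XML rule format, which is not shown here), a regular expression can capture ANYWORD. The class and method names are hypothetical; the sketch handles only the feminine singular article "a" from the example, and, as Richard notes, a rule like this would only make sense for European Portuguese.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the suggestion described in Marco's message. */
class VossaSuggestion {

    // Group 1: the article "a"; group 2: any single word (ANYWORD);
    // then the fixed tokens "de vocês" (the accented letter is a Unicode escape).
    private static final Pattern PATTERN = Pattern.compile(
            "\\b([Aa]) (\\p{L}+) de voc\u00EAs\\b",
            Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns the suggested replacement for the first match, if there is one. */
    static Optional<String> suggest(String sentence) {
        Matcher m = PATTERN.matcher(sentence);
        if (!m.find()) {
            return Optional.empty();
        }
        // Reuse the captured word, so the suggestion shows the word from the sentence.
        return Optional.of(m.group(1) + " vossa " + m.group(2));
    }

    public static void main(String[] args) {
        String sentence = "Gostei de ler a opini\u00E3o de voc\u00EAs.";
        System.out.println(suggest(sentence).orElse("(no match)"));
    }
}
```

For the example sentence, the suggestion is "a vossa opinião"; a sentence without "de vocês" yields no suggestion.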
 


[Languagetool] lucene-gosen-ipadic 1.2.1 for LanguageTool 1.9

2012-10-02 Thread Richard Eckart de Castilho
Hi there,

lucene-gosen-ipadic is a new dependency of LT 1.9. I have prepared a Maven POM 
for it already and I would be ready to upload it. I am a bit reluctant though, 
because some test cases of lucene-gosen are failing.

Could somebody please check out 

http://lucene-gosen.googlecode.com/svn/tags/rel-1.2.1

and try to build it and run the tests?

I had to update the URL for the ipadic in dictionary/ipadic.properties

dic.home=http://chasen.naist.jp/stable/ipadic/ 

The tests that fail for me are:

TestJapaneseTokenizer.testDecomposition3: term 3 expected:マシュー[] but 
was:マシュー[・ホプキンス]
TestJapaneseTokenizer.testTwoSentences: term 3 expected:マシュー[] but 
was:マシュー[・ホプキンス]
BasicDecompositionTest.testDecomposition3: expected 7 but was 5
BasicDecompositionTest.testDifferentDictionary02: Not expected 
exception. java.lang.AssertionError

I have already uploaded the lucene-gosen-ipadic artifacts that I built to the
UKP Maven Repository. If there is a problem with them, I can still remove or
update them there, but that is no longer possible once they are on Maven
Central.


https://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/webapp/search/artifact?q=lucene-gosen-ipadic

Best,

-- Richard




[Languagetool] Failing test case in LT 1.9: GermanTaggerTest.testTagger

2012-10-02 Thread Richard Eckart de Castilho
Hi there,

while preparing the POM for LT 1.9, I found that I get this test case failure 
in Eclipse and with Maven on the command line:

Failed tests: testTagger(org.languagetool.tagging.de.GermanTaggerTest): 
null expected:…be[Lieblingsbuchstab[e/SUB:NOM]:SIN:MAS] but 
was:...be[Lieblingsbuchstab[/SUB:DAT]:SIN:MAS]

Do I have a setup problem here or is this a test known to fail?

Best,

-- Richard




Re: [Languagetool] are our libraries up-to-date?

2012-10-02 Thread Richard Eckart de Castilho
I've locally set up a POM for LT 1.9 which pulls in all morfologik artifacts
in version 1.5.4 from Maven Central. It seems to work well.

Best,

-- Richard

On 01.10.2012, at 10:57, Marcin Miłkowski <list-addr...@wp.pl> wrote:

> On 2012-10-01 00:09, Daniel Naber wrote:
>> On 30.09.2012, 11:31:54 Marcin Miłkowski wrote:
>>
>>> I'm not sure about the morfologik* libraries. There might be an unreleased
>>> version in our code (I fixed a bug with UTF-8 but 1.5.4 was not released
>>> yet).
>>
>> Could we release the Maven version of LT with 1.5.3 then (with a different
>> version then, not 1.9)? Or could you just release morfologik 1.5.4 on
>> Maven?
>
> OK, the release will be made today. It should get to Maven Central in
> 6-8 hrs (they have been delayed recently).
>
> Best,
> Marcin




Re: [Languagetool] lucene-gosen-ipadic 1.2.1 for LanguageTool 1.9

2012-10-02 Thread Richard Eckart de Castilho
I tried it in Eclipse 4.2 (with whatever Ant version comes with it) as well as
on the command line using Ant 1.8.2 on OS X with Apple JDK 1.6. Apart from the
wrong URL and the failing test cases, there seemed to be no issues.

I have a connection.csv in

./dictionary/ipadic/connection.csv
./dictionary/naist-chasen/connection.csv

I did run the build.xml in the dictionary folder to get the dictionaries,
though: once with a plain ant, which fetches the ipadic, and once with:

ant -Ddictype=naist-chasen

By the way, lucene-gosen 1.2.1 is old; there is a newer 2.0.2 version.

-- Richard

On 02.10.2012, at 19:56, Daniel Naber <list2...@danielnaber.de> wrote:

> On 02.10.2012, 19:22:58 Richard Eckart de Castilho wrote:
>
> Hi Richard,
>
>> and try to build it and run the tests?
>
> how exactly do you build it? ant complains about a missing connection.csv
> here.
>
>> I have already uploaded the lucene-gosen-ipadic that I built to the UKP
>> Maven Repository. If they have a problem, I can still remove them from
>> there or update them, but that is not possible when they are on Maven
>> Central.
>
> Your build works with LT, at least the tests don't fail. I'm not sure if
> there's someone on this list who can give a more informed review.
>
> Regards
> Daniel




Re: [Languagetool] Failing test case in LT 1.9: GermanTaggerTest.testTagger

2012-10-02 Thread Richard Eckart de Castilho
Nope, I wasn't. Now I am, and the tests are all green.

Cheers,

-- Richard

On 02.10.2012, at 20:01, Daniel Naber <list2...@danielnaber.de> wrote:

> On 02.10.2012, 19:32:28 Richard Eckart de Castilho wrote:
>
>> Failed tests: testTagger(org.languagetool.tagging.de.GermanTaggerTest):
>> null expected:…be[Lieblingsbuchstab[e/SUB:NOM]:SIN:MAS] but
>> was:...be[Lieblingsbuchstab[/SUB:DAT]:SIN:MAS]
>>
>> Do I have a setup problem here or is this a test known to fail?
>
> Are you using jwordsplitter 3.4, which was just released a few days ago?
>
> Regards
> Daniel




Re: [Languagetool] path changes in SVN

2012-09-02 Thread Richard Eckart de Castilho
Hello Daniel,

that won't really make too much of a difference. I've set up a POM which takes 
into account the current project structure.

For better Maven compatibility, I would split the project into three code
modules:

- core
- dev
- openoffice

and possibly a number of language resource modules.

Maybe you want to start by having a look at the pom I crafted.

-- Richard

On 02.09.2012, at 18:00, Daniel Naber wrote:

> Hi,
>
> to become a bit more Maven compatible, I'm going to move directories in SVN
> today:
>
> src/test will become src/test/java
> src/java will become src/main/java
> src/dev will become src/main/dev
>
> src/rules will become src/main/resources/rules
> src/resource will become src/main/resources/resource
>
> I will try to do that at 19:00 CET. You might want to commit any local
> changes before that or create a patch, as I'm not sure if merging in your
> changes will work when the paths change.
>
> The long-term goal is to build LT with mvn and to also host it in Maven
> Central. That will of course mean a different build process, and I don't
> know yet how difficult that will be to implement. Anyway, I'll try to do that
> step-by-step, keeping everything working all the time.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
>
 


Re: [Languagetool] path changes in SVN

2012-09-02 Thread Richard Eckart de Castilho
I have now also added a POM for the restructured trunk to the bug. Please
note that Eclipse can get quite confused if you use the m2e plugin to switch
the project to a Maven project, because there is already Eclipse metadata
present in the source code repository. On the command line, everything should
be mostly fine: some tests fail in 1.8 and some more fail in trunk.

There is a spelling error in the package name of morfologik-speller in the
org.carrot artifact (it's morflogik instead of morfologik), so I actually had
to change the imports in LanguageTool to match it before the code would
compile.

Cheers,

-- Richard

On 02.09.2012, at 19:52, Richard Eckart de Castilho wrote:

> Hello Daniel,
>
> that won't really make too much of a difference. I've set up a POM which
> takes into account the current project structure.
>
> I know it doesn't help with your short-term goal, but when we build LT with
> mvn, I think we should use the standard directory layout.
>
> That's a good idea. However, the standard layout only includes src/main/java
> and src/test/java. I think dev would best go into its own module. Maybe you
> are already planning that. At the risk of repeating things you have already
> thought about, I'll just mention what else I noticed when I built the POM:
>
> - the i18n properties files are in the regular source folder. In the
> standard Maven layout, they should go to src/main/resources.
> - I think that rules and resource would best be kept somewhere under an
> org/languagetool package to avoid any potential conflict with other
> artifacts.
>
> Maybe you want to start by having a look at the pom I crafted.
>
> Could you send the URL again? I think the one in your artifact repo looked
> almost empty, i.e. there were no dependencies... maybe I just looked at the
> wrong place.
>
> I am not sure where you were looking. I attached a POM for 1.8 to this issue
>
> https://sourceforge.net/tracker/index.php?func=detail&aid=3564184&group_id=110216&atid=655717
>
> I'll also add one for trunk now to the same issue.
>
> Probably I'll go on with trying to get cjftransform and ictclas4j to Maven
> Central next.
>
> Cheers,
>
> -- Richard




Re: [Languagetool] path changes in SVN

2012-09-02 Thread Richard Eckart de Castilho
On 02.09.2012, at 21:37, Daniel Naber wrote:

> On 02.09.2012, 18:00:41 Daniel Naber wrote:
>
>> to become a bit more Maven compatible, I'm going to move directories in
>> SVN today:
>
> It would be nice if an Eclipse user could update the .project and .classpath
> files accordingly.

If you plan to move to Maven, I'd recommend not keeping Eclipse metadata in the
repository. It would only be useful for Eclipse users who do not use Maven,
and it would confuse things for Eclipse users who are actually using Maven.

-- Richard

