Re: [Languagetool] info request

2012-10-02 Thread Mauro Condarelli
On Sunday 30 September 2012 20:30:17, Marcin Miłkowski wrote:
 I would also like to know if it's possible to add further information to
 the words (e.g., pointers to synonyms and antonyms).
 This is not part-of-speech information, so it does not belong to the
 tagger. But you can add a separate dictionary, and you could even have
 Wordnet encoded as a finite-state machine for a very quick use
 (basically, you'd need to prepare a perfect hash fsa file for words in
 Italian, which is easy; and plan how to encode the Wordnet relationships
 in a graph whose nodes are hash numbers). But this requires some more
 coding.
 Agreed.
 Is there any information beyond what's in the developing-a-tagger-dictionary
 wiki page?
 That is a recipe to build the compressed dicts, but it's not obvious how
 to reuse fsa to build something different.
 I will need to study it better.

 Ah, nobody used the dicts for such complex purposes (yet). That's why
 there's no detailed info about it. I'd need to think more to give more
 detailed specs. But overall, fsa dicts can be used for lots of purposes
 with very high performance.
I was unable to find general documentation on the morfologik-fsa package; I 
only found the Javadoc API description (which does not give an overall 
picture) and a bunch of pages in Polish (which I can't read; I tried 
Google Translate, but got nowhere).
Perhaps someone can point me in the right direction...

TiA
Mauro

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: [Languagetool] info request

2012-10-02 Thread Marcin Miłkowski
On 2012-10-02 12:56, Mauro Condarelli wrote:
 On Sunday 30 September 2012 20:30:17, Marcin Miłkowski wrote:
 I would also like to know if it's possible to add further information to
 the words (e.g., pointers to synonyms and antonyms).
 This is not part-of-speech information, so it does not belong to the
 tagger. But you can add a separate dictionary, and you could even have
 Wordnet encoded as a finite-state machine for a very quick use
 (basically, you'd need to prepare a perfect hash fsa file for words in
 Italian, which is easy; and plan how to encode the Wordnet relationships
 in a graph whose nodes are hash numbers). But this requires some more
 coding.
 Agreed.
 Is there any information beyond what's in the developing-a-tagger-dictionary
 wiki page?
 That is a recipe to build the compressed dicts, but it's not obvious how
 to reuse fsa to build something different.
 I will need to study it better.

 Ah, nobody used the dicts for such complex purposes (yet). That's why
 there's no detailed info about it. I'd need to think more to give more
 detailed specs. But overall, fsa dicts can be used for lots of purposes
 with very high performance.
 I was unable to find general documentation on the morfologik-fsa package; I
 only found the Javadoc API description (which does not give an overall
 picture) and a bunch of pages in Polish (which I can't read; I tried
 Google Translate, but got nowhere).
 Perhaps someone can point me in the right direction...

morfologik-fsa is based on Jan Daciuk's package; the papers describing 
it are here:

http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa.html

There's some documentation on the usage of FSA dictionaries there, but don't 
expect anything other than scholarly papers. A good textbook, such as 
Martin & Jurafsky, is probably a better starting point:

http://tocs.ulb.tu-darmstadt.de/203636384.pdf

(Chapters 2 and 3 introduce FSAs.)

Overall, an FSA has only one kind of relationship between its nodes, which 
makes it slightly hard to encode all semantic relationships. Maybe it would be 
easier to just use a generic database of some sort.
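
The idea discussed in this thread (number the words with a perfect hash, then store WordNet-style relations as a graph over those numbers) could look roughly like the sketch below. It is only an illustration in Python, not morfologik-fsa code: a sorted word list with binary search stands in for the perfect-hash automaton, and the names `WordIndex` and `RelationGraph` as well as the sample Italian words are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class WordIndex:
    """Stand-in for a perfect-hash FSA: maps each word to a unique number
    in 0..n-1 and back. A sorted list is used here only for brevity."""

    def __init__(self, words):
        self._words = sorted(set(words))

    def number_of(self, word):
        i = bisect_left(self._words, word)
        if i < len(self._words) and self._words[i] == word:
            return i
        return -1  # word is not in the dictionary

    def word_at(self, number):
        return self._words[number]


class RelationGraph:
    """Labelled edges between word numbers, one adjacency table per relation."""

    def __init__(self, index):
        self._index = index
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, relation, source, target, symmetric=True):
        a = self._index.number_of(source)
        b = self._index.number_of(target)
        if a < 0 or b < 0:
            raise KeyError("both words must be in the index")
        self._edges[relation][a].add(b)
        if symmetric:
            self._edges[relation][b].add(a)

    def related(self, relation, word):
        n = self._index.number_of(word)
        if n < 0:
            return []
        targets = self._edges.get(relation, {}).get(n, ())
        return sorted(self._index.word_at(m) for m in targets)


index = WordIndex(["alto", "basso", "elevato", "grande", "piccolo"])
graph = RelationGraph(index)
graph.add("synonym", "alto", "elevato")
graph.add("antonym", "alto", "basso")
print(graph.related("synonym", "alto"))   # ['elevato']
print(graph.related("antonym", "basso"))  # ['alto']
```

Keeping one table per relation type is one way around the single-relationship limitation mentioned above: the automaton only has to answer "which number is this word?", and the semantic links live outside it.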

Regards,
Marcin



[Languagetool] lucene-gosen-ipadic 1.2.1 for LanguageTool 1.9

2012-10-02 Thread Richard Eckart de Castilho
Hi there,

lucene-gosen-ipadic is a new dependency of LT 1.9. I have prepared a Maven POM 
for it already and I would be ready to upload it. I am a bit reluctant though, 
because some test cases of lucene-gosen are failing.

Could somebody please check out 

http://lucene-gosen.googlecode.com/svn/tags/rel-1.2.1

and try to build it and run the tests?

I had to update the URL for the ipadic in dictionary/ipadic.properties

dic.home=http://chasen.naist.jp/stable/ipadic/ 

The tests that fail for me are:

TestJapaneseTokenizer.testDecomposition3: term 3 expected:マシュー[] but was:マシュー[・ホプキンス]
TestJapaneseTokenizer.testTwoSentences: term 3 expected:マシュー[] but was:マシュー[・ホプキンス]
BasicDecompositionTest.testDecomposition3: expected 7 but was 5
BasicDecompositionTest.testDifferentDictionary02: Not expected exception. java.lang.AssertionError

I have already uploaded the lucene-gosen-ipadic artifacts that I built to the 
UKP Maven Repository. If they have a problem, I can still remove or update 
them there, which is no longer possible once they are on Maven Central.


https://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/webapp/search/artifact?q=lucene-gosen-ipadic

Best,

-- Richard




[Languagetool] Failing test case in LT 1.9: GermanTaggerTest.testTagger

2012-10-02 Thread Richard Eckart de Castilho
Hi there,

while preparing the POM for LT 1.9, I found that I get this test case failure 
in Eclipse and with Maven on the command line:

Failed tests: testTagger(org.languagetool.tagging.de.GermanTaggerTest): 
null expected:…be[Lieblingsbuchstab[e/SUB:NOM]:SIN:MAS] but 
was:...be[Lieblingsbuchstab[/SUB:DAT]:SIN:MAS]

Do I have a setup problem here or is this a test known to fail?

Best,

-- Richard




Re: [Languagetool] are our libraries up-to-date?

2012-10-02 Thread Richard Eckart de Castilho
I've locally set up a POM for LT 1.9 which pulls in all morfologik libraries 
at version 1.5.4 from Maven Central. It seems to work well.

Best,

-- Richard

On 01.10.2012 at 10:57, Marcin Miłkowski list-addr...@wp.pl wrote:

 On 2012-10-01 00:09, Daniel Naber wrote:
 On 30.09.2012, 11:31:54 Marcin Miłkowski wrote:
 
 I'm not sure about morfologik* libraries. There might be an unreleased
 version in our code (I fixed a bug with UTF-8 but 1.5.4 was not released
 yet).
 
 Could we release the Maven version of LT with 1.5.3 then (with a different
 version then, not 1.9)? Or could you just release morfologik 1.5.4 on
 Maven?
 
 OK, the release will be made today. It should get to Maven Central in 
 6-8 hrs (they are delayed recently).
 
 Best,
 Marcin




Re: [Languagetool] Failing test case in LT 1.9: GermanTaggerTest.testTagger

2012-10-02 Thread Daniel Naber
On 02.10.2012, 19:32:28 Richard Eckart de Castilho wrote:

   Failed tests: testTagger(org.languagetool.tagging.de.GermanTaggerTest):
 null expected:…be[Lieblingsbuchstab[e/SUB:NOM]:SIN:MAS] but
 was:...be[Lieblingsbuchstab[/SUB:DAT]:SIN:MAS]
 
 Do I have a setup problem here or is this a test known to fail?

Are you using jwordsplitter 3.4, which was just released a few days ago?

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: [Languagetool] lucene-gosen-ipadic 1.2.1 for LanguageTool 1.9

2012-10-02 Thread Richard Eckart de Castilho
I tried in Eclipse 4.2 (with whatever Ant version comes with it) as well as on 
the command line using Ant 1.8.2 on OS X with Apple JDK 1.6. Except for the 
wrong URL and the failing test cases, there seemed to be no issues.

I have a connection.csv in

./dictionary/ipadic/connection.csv
./dictionary/naist-chasen/connection.csv

I did run the build.xml in the dictionary folder to get the dictionaries, 
though: once with a plain ant, which fetches the ipadic, and once with:

ant -Ddictype=naist-chasen

Btw. lucene-gosen 1.2.1 is old, there is a new 2.0.2 version.

-- Richard

On 02.10.2012 at 19:56, Daniel Naber list2...@danielnaber.de wrote:

 On 02.10.2012, 19:22:58 Richard Eckart de Castilho wrote:
 
 Hi Richard,
 
 and try to build it and run the tests?
 
 how exactly do you build it? ant complains about a missing connection.csv 
 here.
 
 I have already uploaded the lucene-gosen-ipadic that I built to the UKP
 Maven Repository. If they have a problem, I can still remove them from
 there or update them, but that is not possible when they are on Maven
 Central.
 
 Your build works with LT, at least the tests don't fail. I'm not sure if 
 there's someone on this list who can give a more informed review.
 
 Regards
 Daniel




Re: [Languagetool] Failing test case in LT 1.9: GermanTaggerTest.testTagger

2012-10-02 Thread Richard Eckart de Castilho
Nope, I wasn't. Now I do and the tests are all green.

Cheers,

-- Richard

On 02.10.2012 at 20:01, Daniel Naber list2...@danielnaber.de wrote:

 On 02.10.2012, 19:32:28 Richard Eckart de Castilho wrote:
 
  Failed tests: testTagger(org.languagetool.tagging.de.GermanTaggerTest):
 null expected:…be[Lieblingsbuchstab[e/SUB:NOM]:SIN:MAS] but
 was:...be[Lieblingsbuchstab[/SUB:DAT]:SIN:MAS]
 
 Do I have a setup problem here or is this a test known to fail?
 
 Are you using jwordsplitter 3.4, which was just released a few days ago?
 
 Regards
 Daniel




Re: [Languagetool] LT 1.9 Maven dependency version differences

2012-10-02 Thread Daniel Naber
On 02.10.2012, 20:23:16 Richard Eckart de Castilho wrote:

 Everything compiles and all the tests are good. 
 
 Is it ok to stay with these deviations?

I think so - we have quite a lot of tests, and if they are okay, everything 
should be fine. Lucene is used for the dev package only, i.e. the part of 
LT that belongs in its own package anyway. It won't be used at runtime 
by common users.

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: [Languagetool] lucene-gosen-ipadic 1.2.1 for LanguageTool 1.9

2012-10-02 Thread Daniel Naber
On 02.10.2012, 19:22:58 Richard Eckart de Castilho wrote:

 TestJapaneseTokenizer.testDecomposition3: term 3 expected:マシュー[] but was:マシュー[・ホプキンス]
 TestJapaneseTokenizer.testTwoSentences: term 3 expected:マシュー[] but was:マシュー[・ホプキンス]
 BasicDecompositionTest.testDecomposition3: expected 7 but was 5
 BasicDecompositionTest.testDifferentDictionary02: Not expected exception. java.lang.AssertionError

Okay, now it compiles here and I get almost the same failures as you. 
BasicDecompositionTest.testDifferentDictionary02 does not seem to fail, the 
others fail like they do for you.

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: [Languagetool] Fwd: Creating a Java rule?

2012-10-02 Thread Dominique Pellé
Hi Marco

To match any token is simple: just use <token/>.  I don't speak
Portuguese, but if I understood correctly from your example,
then the following rule should do what you want:

<rule id="A_XXX_DE_VOCES" name="a XXX de vocês - a vossa XXX">
  <pattern>
    <token>a</token>
    <token/>
    <token>de</token>
    <token>vocês</token>
  </pattern>
  <message>Did you mean <suggestion>\1 vossa \2</suggestion>?</message>
  <example type="incorrect">A opinião de vocês</example>
  <example type="correct">a vossa opinião</example>
</rule>

There is no need for a Java rule here, contrary to what the subject of the email suggests.
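
To illustrate what the four-token pattern above matches, here is a small Python sketch that mimics it on whitespace-separated tokens. It is not how LanguageTool evaluates rules; the function name `suggest_vossa`, the whitespace tokenization, and the case-insensitive comparison are assumptions made only for this illustration.

```python
def suggest_vossa(sentence):
    """Return a corrected sentence for each 'a <any token> de vocês' match.

    Mirrors the four-token pattern: "a", any token, "de", "vocês".
    Tokens are split on whitespace only and compared case-insensitively.
    """
    tokens = sentence.split()
    suggestions = []
    for i in range(len(tokens) - 3):
        first, anyword, third, fourth = tokens[i:i + 4]
        if (first.lower() == "a" and third.lower() == "de"
                and fourth.lower() == "vocês"):
            # \1 is the matched "a" (casing kept as typed), \2 is the free token.
            replacement = [first, "vossa", anyword]
            suggestions.append(" ".join(tokens[:i] + replacement + tokens[i + 4:]))
    return suggestions


print(suggest_vossa("A opinião de vocês"))  # ['A vossa opinião']
print(suggest_vossa("a vossa opinião"))     # []
```

The empty second token in the pattern plays the role of `anyword` here: it places no constraint on the word, and the suggestion simply carries it over.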

Regards
-- Dominique

On Wed, Oct 3, 2012 at 12:04 AM, Marco A.G.Pinto 
marcoagpi...@mail.telepac.pt wrote:

  Daniel suggested that I post this to the mailing list.


  Original Message
  Subject: Creating a Java rule?
  Date: Tue, 02 Oct 2012 12:48:58 +0100
  From: Marco A.G.Pinto marcoagpi...@mail.telepac.pt
  Reply-To: marcoagpi...@mail.telepac.pt
  To: Daniel Naber (LanguageTool) na...@danielnaber.de,
      Juan Martorell (LanguageTool) juan.martor...@gmail.com,
      Marcin Milkowski (LanguageTool) marcin.milkow...@gmail.com

 Hello!

 Last night I was on Facebook reading a Brazilian post and I noticed a
 common mistake in the grammar.

 They had written:
  (...) *a* opinião *de vocês* (...) 

 The correct form would be:
  (...) *a vossa* opinião (...)

 In simple terms, the rule would detect:
 1) a
 2) ANYWORD (in this case opinião)
 3) de vocês (to be replaced with vossa)

 How can I code this into LanguageTool and make the example show the ANYWORD
 according to the word used in the sentence?

 Thanks!

 Kind regards,
Marco A.G.Pinto



Re: [Languagetool] Fwd: Creating a Java rule?

2012-10-02 Thread Marco A.G.Pinto

  
  
Thanks, I will try it tomorrow :)
  
  On 02-10-2012 23:27, Dominique Pellé wrote:

  Hi Marco

  To match any token is simple: just use <token/>.  I don't speak
  Portuguese, but if I understood correctly from your example,
  then the following rule should do what you want:

  <rule id="A_XXX_DE_VOCES" name="a XXX de vocês - a vossa XXX">
    <pattern>
      <token>a</token>
      <token/>
      <token>de</token>
      <token>vocês</token>
    </pattern>
    <message>Did you mean <suggestion>\1 vossa \2</suggestion>?</message>
    <example type="incorrect">A opinião de vocês</example>
    <example type="correct">a vossa opinião</example>
  </rule>

  No need for Java rules for this as the subject of the email
  suggested.
  
  Regards
  -- Dominique
  

