Re: PT (PRE and POS)

2014-04-02 Thread Daniel Naber
On 2014-04-02 23:21, Marco A.G.Pinto wrote:

>  Where can I find the "github tracker"?

https://github.com/languagetool-org/languagetool/issues?state=open


--
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: XML element and attribute statistics

2014-04-02 Thread Dave Pawson
A common XInclude processor is xmllint, part of Daniel Veillard's libxml2.
http://xmlsoft.org/xmllint.html
$ xmllint -o outputFile --xinclude inputFile
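
To see what an XInclude processor does without installing xmllint, here is a minimal sketch using Python's standard library (xml.etree.ElementInclude). It is not LT's Java code. The file name common.xml, the element names and the in-memory FILES table are assumptions made up for illustration. The custom loader only shows one possible way to decouple "where the included file lives" from the include itself.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for "wherever the files live" (filesystem, jar, ...).
FILES = {
    "common.xml": '<unification feature="gender"/>',
}

# Hypothetical main file that pulls in common.xml via XInclude.
MAIN = (
    '<rules xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="common.xml"/>'
    '<rule id="EXAMPLE_RULE"/>'
    '</rules>'
)

def load_from_table(href, parse, encoding=None):
    """Resolve an include target from FILES instead of the filesystem."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text

def expand_includes(xml_text):
    """Parse xml_text and expand its xi:include elements in place."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load_from_table)
    return root

if __name__ == "__main__":
    root = expand_includes(MAIN)
    print([child.tag for child in root])
```

After expansion the xi:include element is gone and the content of common.xml sits in its place as a child of the root.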

HTH

On 2 April 2014 18:29, Andriy Rysin wrote:
> When I was splitting grammar.xml file I actually spent almost a day
> trying to use xml include features to include component grammar files,
> I must say I was not able to make it work properly in all scenarios:
> filesystem/jar, for tests/released version. It could be I just didn't
> use the right approach, so if somebody can point me to how it can be
> done to keep LT working seamlessly in all scenarios I would really
> appreciate it.
>
> If we can't do that can we consider loading all files together
> similarly to how it's done in production code?
>
> Thanks
> Andriy
>
> 2014-04-02 11:13 GMT-04:00 Daniel Naber:
>> On 2014-04-02 16:42, Andriy Rysin wrote:
>>
>>> Provided those rules work in 2.5, do you think we just didn't include
>>> grammar.xml before testing grammar-style.xml in our tests?
>>
>> The test checks one file after the other, so any definition in
>> grammar.xml won't be visible in the other grammar*.xml files. Maybe
>> those definitions could be moved to their own file and then be included
>> via some XML feature?
>>
>> Regards
>>   Daniel



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: XML element and attribute statistics

2014-04-02 Thread Andriy Rysin
On 04/02/2014 04:44 PM, Daniel Naber wrote:
> On 2014-04-02 19:29, Andriy Rysin wrote:
>
>> When I was splitting grammar.xml file I actually spent almost a day
>> trying to use xml include features to include component grammar files,
>> I must say I was not able to make it work properly in all scenarios:
> I guess you tried this one?
> http://wiki.languagetool.org/tips-and-tricks#toc2
> If that doesn't work, there's no other approach I know of.
Yes, that's what I tried. I could not make the URL work for both
filesystem and jar. I even saw some differences in how LT code and
xmllint include files (the simple include that worked for xmllint didn't
work in LT), so I abandoned that path.
>
>> If we can't do that can we consider loading all files together
>> similarly to how it's done in production code?
> Mhh, I can't see us doing anything special in production code. All files 
> are handled separately. Are you really 100% sure that these rules 
> actually worked? Or did they maybe work by chance, e.g. because the 
>  wasn't actually needed for the examples you tried?
Yes, I can confirm that one of the rules (rulegroup id "SAMYI") works
correctly in 2.5 and takes unification into account.

It looks like PatternRuleTest.validatePatternFile() checks the XML files
one at a time (load one, validate it, move on to the next), while
JLanguageTool.activateDefaultPatternRules() loads them all into memory,
which (if I understand correctly) keeps the first file, grammar.xml
(which contains the common parts), already loaded and parsed while the
rest of them are loaded and parsed.
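
To illustrate the difference, here is a toy sketch in Python. It is not LT's validator or loader, and the element and attribute names (feature, rule, uses) are invented for the example rather than taken from LT's real schema. It only shows why a reference such as 'gender' can look undefined when each file is checked alone, yet resolve when definitions from all files are pooled.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature rule files; names are illustrative only.
GRAMMAR = '<rules><feature id="gender"/><rule id="R1" uses="gender"/></rules>'
STYLE = '<rules><rule id="R2" uses="gender"/></rules>'
FILES = {"grammar.xml": GRAMMAR, "grammar-style.xml": STYLE}

def collect(xml_text):
    """Return (ids defined, ids referenced) for one file."""
    root = ET.fromstring(xml_text)
    defined = {e.get("id") for e in root.iter("feature")}
    referenced = {e.get("uses") for e in root.iter("rule") if e.get("uses")}
    return defined, referenced

def dangling_per_file(files):
    """Check each file in isolation, one file at a time."""
    result = {}
    for name, text in files.items():
        defined, referenced = collect(text)
        result[name] = referenced - defined
    return result

def dangling_shared(files):
    """Check references against definitions gathered from all files."""
    all_defined, all_referenced = set(), set()
    for text in files.values():
        defined, referenced = collect(text)
        all_defined |= defined
        all_referenced |= referenced
    return all_referenced - all_defined

if __name__ == "__main__":
    print(dangling_per_file(FILES))
    print(dangling_shared(FILES))
```

In this toy setup only the per-file check reports a dangling reference for grammar-style.xml, which is the same shape as the cvc-id.1 error discussed in this thread.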

I guess we have two ways to go from here: adjust the tests to load the
files and keep them in memory (I am not sure how easy that is; it depends
on how flexible our XMLValidator is), or change our getRuleFileNames() API
to require those files to be independent (which may not be very efficient
if every rule file has to load and parse the same common parts, like
unification etc.).

Regards,
Andriy




Re: PT (PRE and POS)

2014-04-02 Thread Marco A.G.Pinto

  
  
Hi Daniel,

Where can I find the "github tracker"?

Thanks!

Kind regards,
Marco A.G.Pinto

On 02/04/2014 22:10, Daniel Naber wrote:
> On 2014-04-02 16:11, Marco A.G.Pinto wrote:
>
> Hi Marco,
>
>> I guess I can start working on the Portuguese, pre-agreement and
>> post-agreement.
>
> could you please create an issue for this in the github tracker,
> describing the requirements? This way everything is kept in one place
> and we don't have to search through the mailing list archives.
>
>> I am sharing the dictionary files taken from Minho University, on my
>> Dropbox:
>
> We will also need the original source URL so we can document where we
> got it from.
>
> Regards
>   Daniel



Re: PT (PRE and POS)

2014-04-02 Thread Daniel Naber
On 2014-04-02 16:11, Marco A.G.Pinto wrote:

Hi Marco,

>  I guess I can start working on the Portuguese, pre-agreement and
> post-agreement.

could you please create an issue for this in the github tracker, 
describing the requirements? This way everything is kept in one place 
and we don't have to search through the mailing list archives.

>  I am sharing the dictionary files taken from Minho University, on my
> Dropbox:

We will also need the original source URL so we can document where we 
got it from.

Regards
  Daniel




Maven vs. Gradle

2014-04-02 Thread Daniel Naber
Hi,

Gradle is a build system, similar to Maven. I noticed that there's a 
"gradle init" command which automatically turns your Maven project into 
a Gradle project. As our Maven tests are quite slow, I've given it a try 
to see if Gradle is faster. The conversion didn't work 100% but issues 
could be worked around (see below). Here are the numbers for running the 
tests with Gradle:

gradle clean -> 0:08 (i.e. 8 seconds)
gradle test -> 5:47
gradle test -> 0:14 (well, nothing has changed)
now change the German grammar.xml
gradle test -> 2:30
now change a Java file in languagetool-wikipedia
gradle test -> 0:45

You can see here that Gradle actually considers the dependencies, i.e. a
change in a module will run all of that module's tests and all the tests
of the modules that depend on it.
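
The selection logic described above can be sketched roughly as follows, in Python. This is not how Gradle is implemented; it is just a small model of "changed module plus everything that depends on it". Apart from languagetool-wikipedia, the module names and the dependency graph are assumptions for illustration only.

```python
from collections import deque

# Hypothetical module graph: module -> modules it depends on.
DEPENDS_ON = {
    "languagetool-core": [],
    "language-de": ["languagetool-core"],
    "languagetool-standalone": ["languagetool-core", "language-de"],
    "languagetool-wikipedia": ["languagetool-core", "language-de"],
}

def modules_to_test(changed, depends_on):
    """Return the changed module plus all modules depending on it,
    directly or transitively."""
    # Invert the graph: module -> modules that depend on it.
    dependents = {module: set() for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)
    # Breadth-first walk over the inverted graph.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

if __name__ == "__main__":
    print(sorted(modules_to_test("language-de", DEPENDS_ON)))
    print(sorted(modules_to_test("languagetool-wikipedia", DEPENDS_ON)))
```

In this model a change in a leaf module such as languagetool-wikipedia selects only that module, while a change lower in the graph selects more, which matches the shorter and longer test runs in the timings above.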

As a comparison, "mvn clean test" takes about 5 minutes on my computer. 
Conclusion? It's probably not worth switching to Gradle, as the full 
test build is even a bit slower than with Maven and one rarely needs to 
run a full test. If anybody here has an issue with Maven and the tests 
being slow, please see http://wiki.languagetool.org/maven-tips to make 
sure you use all the tricks that keep test times down.

If someone actually wants to try, here are the things you need to do 
after "gradle init"'s incomplete conversion:
http://stackoverflow.com/questions/5144325/gradle-test-dependency
http://stackoverflow.com/questions/7459755/how-can-i-make-gradle-include-ftl-files-in-war-file

Regards
  Daniel




Re: XML element and attribute statistics

2014-04-02 Thread Daniel Naber
On 2014-04-02 19:29, Andriy Rysin wrote:

> When I was splitting grammar.xml file I actually spent almost a day
> trying to use xml include features to include component grammar files,
> I must say I was not able to make it work properly in all scenarios:

I guess you tried this one?
http://wiki.languagetool.org/tips-and-tricks#toc2
If that doesn't work, there's no other approach I know of.

> If we can't do that can we consider loading all files together
> similarly to how it's done in production code?

Mhh, I can't see us doing anything special in production code. All files 
are handled separately. Are you really 100% sure that these rules 
actually worked? Or did they maybe work by chance, e.g. because the 
 wasn't actually needed for the examples you tried?

Regards
  Daniel




Re: XML element and attribute statistics

2014-04-02 Thread Andriy Rysin
When I was splitting grammar.xml file I actually spent almost a day
trying to use xml include features to include component grammar files,
I must say I was not able to make it work properly in all scenarios:
filesystem/jar, for tests/released version. It could be I just didn't
use the right approach, so if somebody can point me to how it can be
done to keep LT working seamlessly in all scenarios I would really
appreciate it.

If we can't do that can we consider loading all files together
similarly to how it's done in production code?

Thanks
Andriy

2014-04-02 11:13 GMT-04:00 Daniel Naber:
> On 2014-04-02 16:42, Andriy Rysin wrote:
>
>> Provided those rules work in 2.5, do you think we just didn't include
>> grammar.xml before testing grammar-style.xml in our tests?
>
> The test checks one file after the other, so any definition in
> grammar.xml won't be visible in the other grammar*.xml files. Maybe
> those definitions could be moved to their own file and then be included
> via some XML feature?
>
> Regards
>   Daniel


Re: XML element and attribute statistics

2014-04-02 Thread Dave Pawson
On 2 April 2014 16:13, Daniel Naber wrote:
> On 2014-04-02 16:42, Andriy Rysin wrote:
>
>> Provided those rules work in 2.5, do you think we just didn't include
>> grammar.xml before testing grammar-style.xml in our tests?
>
> The test checks one file after the other, so any definition in
> grammar.xml won't be visible in the other grammar*.xml files. Maybe
> those definitions could be moved to their own file and then be included
> via some XML feature?

Preferably XInclude rather than entities, if your preferred parser supports it?

regards DaveP


>
> Regards
>   Daniel



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: XML element and attribute statistics

2014-04-02 Thread Daniel Naber
On 2014-04-02 16:42, Andriy Rysin wrote:

> Provided those rules work in 2.5, do you think we just didn't include
> grammar.xml before testing grammar-style.xml in our tests?

The test checks one file after the other, so any definition in 
grammar.xml won't be visible in the other grammar*.xml files. Maybe
those definitions could be moved to their own file and then be included 
via some XML feature?

Regards
  Daniel




Re: XML element and attribute statistics

2014-04-02 Thread Andriy Rysin
Thanks Daniel!

I can't figure out what's wrong with those tests you commented out,
though. The error is this:
cvc-id.1: There is no ID/IDREF binding for IDREF 'gender'. Problem
found at line 484, column 9.
but 'gender' is properly defined in grammar.xml:


Provided those rules work in 2.5, do you think we just didn't include
grammar.xml before testing grammar-style.xml in our tests?

grammar.xml should be returned first in the list of filenames from
Ukrainian.getRuleFileNames().

Thanks
Andriy

2014-04-01 16:11 GMT-04:00 Daniel Naber:
> On 2014-04-01 05:00, Andriy Rysin wrote:
>
>> Oops, my bad, but the interesting thing is that the tests pass on this
>> file and the rule actually works in the final release...
>
> This is fixed now I think. I commented out the Ukrainian rules that
> would have made the tests fail now.
>
> Regards
>   Daniel


Re: prototype of new rule editor

2014-04-02 Thread Daniel Naber
On 2014-03-17 17:51, Daniel Naber wrote:

> there's now a prototype of a new rule editor available at
> http://community.languagetool.org/ruleEditor2/. Main features are:

I have released another update. Major new features:

- "Parse existing XML" link to get an existing XML rule into the editor.
  This doesn't support everything, but at least it should tell you which
  element is not supported in those cases.

- Attributes of tokens and exceptions can now be set, even if the editor
  doesn't know about them ('skip' is an example)

- Small user interface improvements

Regards
  Daniel




PT (PRE and POS)

2014-04-02 Thread Marco A.G.Pinto

  
  
Hello!

I guess I can start working on the Portuguese, pre-agreement and
post-agreement.

It should appear in the combo box as:
Portuguese -> PT-PRE
Portuguese -> PT-POS

I am sharing the dictionary files taken from Minho University, on my
Dropbox:
PT-PRE:
https://dl.dropboxusercontent.com/u/30674540/oo4x-pt-PT-preao-14.4.1.1.oxt.zip
PT-POS:
https://dl.dropboxusercontent.com/u/30674540/oo4x-pt-PT-posao-14.1.1.1.oxt.zip
They are both dated from yesterday.

Could someone also create the .txt for the compound words
post-agreement?

I looked in the supermarket and they do have a post-agreement
dictionary, but it is "2013" and I can wait a couple of months or so
for the "2014" to be released... meanwhile I can use the Priberam
site to get some compound words post-agreement. Microsoft Office
2010 uses Priberam.

The grammar.xml works for both, so no need to create another file.

Thanks!

Kind regards,
Marco A.G.Pinto


