Re: English dictionary status?

2015-02-27 Thread Marco A.G.Pinto

Hello!

I have been releasing monthly updates and sharing all the missing words 
I add to GB in Kevin's GitHub.


What happens is that I am adding too many words in short periods of time 
and I swamp Kevin's GitHub with many hundreds of missing words.


From the words I share, Kevin only uses some.

Since I have an Oxford Gold Account (which I bought just for the task of 
improving GB - yes, I am silly, you may think), I have access to the 
whole Oxford examples.


What now happens is that, even though I have 40.txt, 50.txt and 60.txt 
from Kevin, I only add the words from there after checking them myself 
one at a time.


I have found that some are American and others are written differently 
in Oxford's dictionary. That is why in this month's update (tonight I 
will release 1-MAR-2015, since it has been ready for a day or two) I have 
735 new words, almost none of them from Kevin's txts (this happened 
because of the Gold Account, which gave me access to a huge source of words).


This is what I wrote on Mozilla's ML a few days ago (please notice that 
tonight's version has 147 665 words - V2.22):

There are three versions on Mozilla's extensions site:
*1) Mark Tyndall's*
136 404 words

*2) Lucas's (updated)*
136 404 words - just the FF, TB and SM versions have been changed

*3) Marco A.G.Pinto's (forked) - 2.21*
146 930 words - 10 000 new words
https://addons.mozilla.org/en-US/firefox/addon/british-english-dictionary-2
This is also the official version that ships with Apache OpenOffice:
http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice
Every month, around 600 new words are added.
I didn't change the licence, so it is the same as Mark's and Lucas's.

I have been trying to get my GB to become the default in Mozilla and 
LibreOffice (built-in), but I don't know which people I should annoy... 
maybe Kevin and László?


Also, before packing the new GB I added:
5105) Fri (abbreviation: Friday)
5106) Jun (abbreviation: June)
5107) Jul (abbreviation: July)
5108) Sep (abbreviation: September)

Yes, I typed all weekdays and months by hand to make sure they were all 
there, and these four were missing.
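
A check like this could also be scripted. The following is only a minimal sketch, assuming the word list is already available as plain words (for example after expanding the dictionary); the function name and the sample list are made up for illustration and are not part of any existing tool:

```python
# Three-letter English abbreviations for weekdays and months.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def missing_abbreviations(words):
    """Return the weekday/month abbreviations that are absent from `words`."""
    known = set(words)
    return [abbr for abbr in WEEKDAYS + MONTHS if abbr not in known]


# Hypothetical word list that lacks the four entries mentioned above.
sample = [a for a in WEEKDAYS + MONTHS if a not in {"Fri", "Jun", "Jul", "Sep"}]
print(missing_abbreviations(sample))  # ['Fri', 'Jun', 'Jul', 'Sep']
```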


As I mentioned before, a year from now I want people to be writing 
Master's/PhD theses using my GB and LanguageTool.


Thanks for your time!

Kind regards from your friend,
   Marco A.G.Pinto


On 27/02/2015 13:06, Daniel Naber wrote:

Hi,

this question goes mostly to Marco, but it may be interesting for
others, too: I see you're regularly updating your English dictionary,
which is great. Is there progress in getting your dictionary and that of
Kevin Atkinson closer together? Is there a metric for that, like the
size of the diff of both dictionaries (after unmunch and sort)?

Regards
   Daniel





--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: splitting grammar.xml

2015-02-27 Thread Daniel Naber
On 2015-02-27 09:06, Milos Sramek wrote:

 We want to split the grammar.xml file into several smaller ones. There
 seem to be two ways to do that:
 - modify the
 languagetool-language-modules/sk/src/main/java/org/languagetool/language/Slovak.java
 file (the approach used for the 'uk' language)
 - include other XML files in grammar.xml using
 <!DOCTYPE rules [<!ENTITY UserRules SYSTEM "file:user-rules.xml">]>
 Should we prefer either of them?

I'd suggest using the same approach as 'uk' does. The other approach 
caused some problems, though I don't remember exactly which.

 Also: 'git status' does not show any untracked files, even though tons
 of files were created or downloaded in the build process. What is the
 trick?

Maybe these files are listed in .gitignore?

Regards
  Daniel




English dictionary status?

2015-02-27 Thread Daniel Naber
Hi,

this question goes mostly to Marco, but it may be interesting for 
others, too: I see you're regularly updating your English dictionary, 
which is great. Is there progress in getting your dictionary and that of 
Kevin Atkinson closer together? Is there a metric for that, like the 
size of the diff of both dictionaries (after unmunch and sort)?
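
One possible shape for such a metric, as a rough sketch only: it assumes both dictionaries have already been expanded to plain word lists with one word per line (for instance with Hunspell's unmunch). The function names, the returned keys, and the sample words are invented for illustration and are not an existing script from either project.

```python
def load_words(path):
    """Read one word per line, ignoring blank lines (hypothetical input format)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def diff_metric(words_a, words_b):
    """Summarise how far apart two word lists are."""
    a, b = set(words_a), set(words_b)
    only_a = a - b          # words only the first dictionary has
    only_b = b - a          # words only the second dictionary has
    union = a | b
    return {
        "only_a": len(only_a),
        "only_b": len(only_b),
        "common": len(a & b),
        # Share of all words that appear in just one list:
        # 0.0 means identical lists, 1.0 means no overlap at all.
        "distance": (len(only_a) + len(only_b)) / len(union) if union else 0.0,
    }


# Tiny made-up example; real input would come from load_words().
print(diff_metric(["colour", "grey", "lorry"],
                  ["color", "grey", "lorry", "truck"]))
```

Tracking the "distance" value from month to month would show whether the two dictionaries are converging, independently of their absolute sizes.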

Regards
  Daniel




Re: English dictionary status?

2015-02-27 Thread Daniel Naber
On 2015-02-27 15:46, Marco A.G.Pinto wrote:

Hi Marco,

thanks for the update.

  There are three versions on Mozilla's extensions site:
  1) MARK TYNDALL'S
  136 404 words
 
  2) LUCAS'S (UPDATED)
  136 404 words - just the FF, TB and SM versions have been changed

Are these still maintained? Couldn't these guys be contacted to see if 
they want to help to create one common dictionary?

  I have been trying my GB to become the default in Mozilla and
 LibreOffice (built-in) but I don't know the people I should annoy...
 maybe Kevin and László?

Well, I still think that if the dictionaries get closer together this 
problem disappears automatically. If there's only one dictionary, 
there's no discussion about which one to include by default.

Regards
  Daniel




Re: Disabling disambiguator rules

2015-02-27 Thread Marcin Miłkowski
On 2015-02-26 at 21:10, Andriy Rysin wrote:
 Would it make sense to allow disabling disambiguator rules the same
 way we disable checking rules?

They are cascaded, so disabling them is like disabling random pieces of 
Java code. It might work, but it's very risky due to the complexity of 
their interrelationships.

 I.e. I have a disambiguator rule that will remove tokens with :rare tag
 if they overlap with ones without :rare. This produces good results
 for modern texts but does not work as well for books which use older
 or non-standard language features. So when I am running regressions on
 the book texts I could just add another rule id to the -d argument
 to make LT leave :rare tokens in.

Instead of removing those tags, you might simply add a new markup or 
ignore :rare in your checking rules.
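
To make the :rare filtering under discussion concrete, here is a small sketch of the logic only. It is plain Python, not LanguageTool's disambiguator XML or Java API; the data layout (a list of (lemma, tag) readings per token), the function name, the `keep_rare` switch, and the sample tags are all assumptions made for illustration.

```python
def drop_rare_readings(readings, keep_rare=False):
    """Filter the readings of a single token.

    readings  -- list of (lemma, tag) pairs; tags are made-up examples
    keep_rare -- if True, leave everything untouched (e.g. for older texts)
    """
    if keep_rare:
        return list(readings)
    common = [r for r in readings if ":rare" not in r[1]]
    # Only drop :rare readings when a non-rare alternative exists;
    # otherwise the token would be left without any reading.
    return common if common else list(readings)


token = [("lemma1", "noun:rare"), ("lemma2", "verb")]
print(drop_rare_readings(token))                  # [('lemma2', 'verb')]
print(drop_rare_readings(token, keep_rare=True))  # both readings kept
print(drop_rare_readings([("lemma1", "noun:rare")]))  # sole reading kept
```

Ignoring :rare in the checking rules, as suggested above, corresponds to always keeping the readings and leaving the decision to each rule.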

Best,
Marcin


 Thanks
 Andriy
