Re: English dictionary status?
Hello! I have been releasing monthly updates and sharing all the missing words I add to GB on Kevin's GitHub. The problem is that I add so many words in short periods that I swamp Kevin's GitHub with many hundreds of missing words, and Kevin only uses some of them. Since I have an Oxford Gold Account (which I bought just to improve GB; yes, you may think I am silly), I have access to all of Oxford's examples. So now, even though I have 40.txt, 50.txt and 60.txt from Kevin, I only add words from them after checking each one myself. I have found that some are American and others are spelled differently in Oxford's dictionary. That is why this month's update (I will release 1-MAR-2015 tonight; it has been ready for a day or two) has 735 new words, almost none of them from Kevin's .txt files: the Gold Account gave me access to a huge source of words. This is what I wrote on Mozilla's ML a few days ago (please note that tonight's version, V2.22, has 147 665 words):
There are three versions on Mozilla's extensions site:
1) Mark Tyndall's: 136 404 words
2) Lucas's (updated): 136 404 words; only the FF, TB and SM versions have been changed
3) Marco A.G.Pinto's (forked), 2.21: 146 930 words, 10 000 new words
https://addons.mozilla.org/en-US/firefox/addon/british-english-dictionary-2
This is also the official version that ships with Apache OpenOffice: http://extensions.openoffice.org/en/project/english-dictionaries-apache-openoffice
Every month, around 600 new words are added. I didn't change the licence, so it is the same as Mark's and Lucas's.
I have been trying to get my GB to become the default in Mozilla and LibreOffice (built-in), but I don't know the people I should annoy... maybe Kevin and László?
Also, before packing the new GB I added:
5105) Fri (abbreviation: Friday)
5106) Jun (abbreviation: June)
5107) Jul (abbreviation: July)
5108) Sep (abbreviation: September)
Yes, I typed all the weekdays and months by hand to make sure they were all there, and these four were missing. As I mentioned before, a year from now I want people to be able to write Master's/PhD theses using my GB and LanguageTool. Thanks for your time! Kind regards from your friend, Marco A.G.Pinto
On 27/02/2015 13:06, Daniel Naber wrote: Hi, this question is mostly for Marco, but it may be interesting for others, too: I see you're regularly updating your English dictionary, which is great. Is there any progress in bringing your dictionary and Kevin Atkinson's closer together? Is there a metric for that, like the size of the diff between both dictionaries (after unmunch and sort)? Regards Daniel -- Dive into the World of Parallel Programming The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/ ___ Languagetool-devel mailing list Languagetool-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/languagetool-devel
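Marco's manual check of weekday and month abbreviations could also be scripted. The sketch below is a hypothetical helper, not part of any existing tool. It assumes the dictionary has already been expanded into a plain UTF-8 list with one word per line (e.g. unmunched, so no Hunspell affix flags remain), and the abbreviation lists themselves are an assumption that would need extending for variants such as "Tues" or "Sept".

```python
# Hypothetical helper: report which weekday/month abbreviations are missing
# from a word list. The expected forms below are assumptions, not a standard.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def load_words(path):
    """Read a plain word list (one word per line, UTF-8), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def missing_words(words, expected):
    """Return the entries of `expected` not present in `words`, keeping their order."""
    present = set(words)
    return [w for w in expected if w not in present]


if __name__ == "__main__":
    # Toy list standing in for the real dictionary; Fri, Jun, Jul and Sep are left out.
    sample = ["Mon", "Tue", "Wed", "Thu", "Sat", "Sun",
              "Jan", "Feb", "Mar", "Apr", "May", "Aug", "Oct", "Nov", "Dec"]
    print(missing_words(sample, WEEKDAYS + MONTHS))  # ['Fri', 'Jun', 'Jul', 'Sep']
```

Anything it reports is only a candidate; whether a form belongs in British English still needs a human check, as Marco does against Oxford.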
Re: splitting grammar.xml
On 2015-02-27 09:06, Milos Sramek wrote:
We want to split the grammar.xml file into several smaller ones. There seem to be two ways to do that:
- modify the languagetool-language-modules/sk/src/main/java/org/languagetool/language/Slovak.java file (the approach used for the 'uk' language)
- include other XML files in grammar.xml using <!DOCTYPE rules [<!ENTITY UserRules SYSTEM "file:user-rules.xml">]>
Should we prefer either of them?
I'd suggest using the same approach as UK does. The other approach caused some problems, though I don't remember exactly which.
Also: 'git status' does not show any untracked files, even though tons of files were created or downloaded in the build process. What's the trick?
Maybe these files are listed in .gitignore?
Regards
Daniel
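As a possible starting point for the split itself, a minimal sketch like the following could cut an existing grammar.xml into one document per category. It assumes the usual layout of a <rules> root holding <category> elements; the function name is hypothetical. ElementTree does not preserve comments or the DOCTYPE declaration, so any entity declarations would have to be re-added by hand. Registering the resulting files with LanguageTool is a separate step, via one of the two approaches Milos lists.

```python
import xml.etree.ElementTree as ET


def split_by_category(grammar_xml):
    """Split a <rules> document into one <rules> document per <category>.

    Returns a list of (category name, serialized XML) pairs in document order.
    Assumes the layout <rules lang="..."><category name="...">...</category></rules>.
    Comments and the DOCTYPE are not preserved by ElementTree.
    """
    root = ET.fromstring(grammar_xml)  # accepts str or bytes
    parts = []
    for category in root.findall("category"):
        # Copy the root's attributes (e.g. lang) onto each new <rules> element.
        new_root = ET.Element(root.tag, dict(root.attrib))
        new_root.append(category)
        xml_text = ET.tostring(new_root, encoding="unicode")
        parts.append((category.get("name", ""), xml_text))
    return parts
```

Usage could look like `split_by_category(open("grammar.xml", "rb").read())`, then writing each returned string to its own file with an XML declaration prepended; the output file names are up to the team.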
English dictionary status?
Hi, this question is mostly for Marco, but it may be interesting for others, too: I see you're regularly updating your English dictionary, which is great. Is there any progress in bringing your dictionary and Kevin Atkinson's closer together? Is there a metric for that, like the size of the diff between both dictionaries (after unmunch and sort)? Regards Daniel
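Daniel's proposed metric is straightforward to compute once both dictionaries have been expanded into plain word lists (for instance with Hunspell's unmunch utility). A minimal sketch, assuming one word per line in UTF-8; the function names and the file names in the usage comment are illustrative only. Using sets makes the explicit sort unnecessary for counting; the sorted output just keeps the listings readable.

```python
import sys


def load_words(path):
    """Read a word list with one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def dictionary_diff(words_a, words_b):
    """Compare two word collections; 'diff_size' is the size of the symmetric difference."""
    a, b = set(words_a), set(words_b)
    only_a, only_b = sorted(a - b), sorted(b - a)
    return {
        "only_a": only_a,
        "only_b": only_b,
        "common": len(a & b),
        "diff_size": len(only_a) + len(only_b),
    }


if __name__ == "__main__":
    # Usage (hypothetical file names): python dict_diff.py pinto_gb.txt atkinson_gb.txt
    if len(sys.argv) == 3:
        result = dictionary_diff(load_words(sys.argv[1]), load_words(sys.argv[2]))
        print(f"only in first:  {len(result['only_a'])}")
        print(f"only in second: {len(result['only_b'])}")
        print(f"common:         {result['common']}")
        print(f"diff size:      {result['diff_size']}")
```

Tracking diff_size across successive monthly releases would show whether the two dictionaries are actually converging.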
Re: English dictionary status?
On 2015-02-27 15:46, Marco A.G.Pinto wrote:
Hi Marco, thanks for the update.
There are three versions on Mozilla's extensions site:
1) Mark Tyndall's: 136 404 words
2) Lucas's (updated): 136 404 words; only the FF, TB and SM versions have been changed
Are these still maintained? Couldn't these guys be contacted to see if they want to help create one common dictionary?
I have been trying to get my GB to become the default in Mozilla and LibreOffice (built-in), but I don't know the people I should annoy... maybe Kevin and László?
Well, I still think that if the dictionaries get closer together, this problem disappears automatically. If there's only one dictionary, there's no discussion about which one to include by default.
Regards
Daniel
Re: Disabling disambiguator rules
On 2015-02-26 at 21:10, Andriy Rysin wrote:
Would it make sense to allow disabling disambiguator rules the same way we disable checking rules?
They are cascaded, so disabling them is like disabling random pieces of Java code. It might work, but it's very risky due to the complexity of their interrelationships.
E.g. I have a disambiguator rule that will remove tokens with the :rare tag if they overlap with ones without :rare. This produces good results for modern texts but does not work as well for books which use older or non-standard language features. So when I am running regressions on the book texts, I could just add another rule ID to the -d argument to make LT leave :rare tokens in.
Instead of removing those tags, you might simply add a new markup or ignore :rare in your checking rules.
Best,
Marcin
Thanks
Andriy
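The :rare filtering Andriy describes, and Marcin's alternative of ignoring :rare in the checking rules, can both be modelled in a few lines of plain Python. This is only an illustrative sketch of the logic, not LanguageTool's disambiguation rule format or API: it assumes each token's readings are (lemma, POS tag) pairs with colon-separated tags, and the hypothetical keep_rare switch stands in for what disabling the rule with -d would achieve.

```python
def is_rare(tag):
    """True if 'rare' is one of the colon-separated components of the tag."""
    return "rare" in tag.split(":")


def drop_rare_readings(readings, keep_rare=False):
    """Remove readings tagged rare when at least one non-rare reading exists.

    `readings` is a list of (lemma, postag) pairs for a single token.
    If every reading is rare, or keep_rare is True, the list is returned unchanged.
    """
    if keep_rare:
        return list(readings)
    common = [r for r in readings if not is_rare(r[1])]
    return common if common else list(readings)


def without_rare(tag):
    """Marcin's alternative: strip the 'rare' component so checks treat both alike."""
    return ":".join(p for p in tag.split(":") if p != "rare")
```

With the readings `[("a", "noun:m:v_naz"), ("b", "noun:f:v_rod:rare")]`, the filter keeps only the first pair, while `keep_rare=True` keeps both; `without_rare("noun:f:v_rod:rare")` gives `"noun:f:v_rod"`.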