Re: [Languagetool] inflected words from two tokens

2012-05-22 Thread Jaume Ortolà i Font
2012/5/22 Marcin Miłkowski list-addr...@wp.pl However, it's not possible, as far as I remember, to refer to another token's POS tag and inflect some other token based on it (which would involve recursive inclusion of match inside match). It seems pretty much straightforward to implement but I

Re: [Languagetool] new Java Rule: Accentuation Check for Catalan.

2012-05-31 Thread Jaume Ortolà i Font
2012/5/31 Daniel Naber list2...@danielnaber.de On Thursday, 31 May 2012, Jaume Ortolà i Font wrote: Thanks, looks good. Maybe the match() method can become a bit shorter by extracting some code to private methods? I don't see any easy way to do it. The whole Java rule is equivalent

[Languagetool] problems with unification

2012-06-04 Thread Jaume Ortolà i Font
Hi, I have been testing some rules for three-token sequences using unification. Here I describe my case and what I have found. I want to match a three-token sequence: determiner + possessive + noun (or adjective). There are two features in unification: gender (masc./fem.) and number
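To make the agreement condition concrete: unification keeps only the feature values that every token in the sequence can share. The Java sketch below models each token as a set of gender+number readings and intersects them. It is a simplified illustration under that assumption, not LanguageTool's own unifier; the class name and the "MS"/"FS" encoding are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of unification over gender/number readings. */
final class AgreementSketch {

    /**
     * Each token is the set of readings it may have, encoded as gender + number,
     * e.g. "MS" (masc. sing.) or "FS" (fem. sing.). Assumes a non-empty list.
     * Returns the readings shared by all tokens; an empty set means no agreement.
     */
    static Set<String> unify(List<Set<String>> tokens) {
        Set<String> common = new HashSet<>(tokens.get(0));
        for (Set<String> readings : tokens.subList(1, tokens.size())) {
            common.retainAll(readings); // keep only readings every token allows
        }
        return common;
    }

    static boolean agrees(List<Set<String>> tokens) {
        return !unify(tokens).isEmpty();
    }
}
```

For example, Catalan "el seu casa" yields no common reading (el and seu are masculine singular, casa is feminine singular), whereas "la seva casa" does.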

[Languagetool] addition to the Hunspell rule

2012-06-11 Thread Jaume Ortolà i Font
There is the possibility that some words that are included in the tagger dictionary (or are tagged in the disambiguation file) are marked as errors by Hunspell, because they are missing in the Hunspell dictionary. In order to avoid it we could add a condition in the Hunspell Java rule: mark as an
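The condition proposed above can be sketched as a filter: report a word only if the Hunspell check rejects it and the word has no POS tag from the tagger dictionary or the disambiguator. In this sketch the speller is a plain Predicate and the tags are a Map; both stand in for the real LanguageTool classes, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: suppress speller errors for words the tagger knows. */
final class TaggedWordFilter {

    private final Predicate<String> spellerAccepts;  // stands in for the Hunspell check
    private final Map<String, Set<String>> posTags;  // stands in for tagger/disambiguator output

    TaggedWordFilter(Predicate<String> spellerAccepts, Map<String, Set<String>> posTags) {
        this.spellerAccepts = spellerAccepts;
        this.posTags = posTags;
    }

    /** A word is reported only if the speller rejects it AND it carries no POS tag. */
    boolean isError(String word) {
        if (spellerAccepts.test(word)) {
            return false;
        }
        Set<String> tags = posTags.get(word);
        return tags == null || tags.isEmpty();
    }
}
```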

Re: [Languagetool] Hunspell tests for affix files

2012-06-11 Thread Jaume Ortolà i Font
. Regards, Jaume Ortolà 2012/6/9 Marcin Miłkowski list-addr...@wp.pl On 2012-06-09 20:26, Jaume Ortolà i Font wrote: 2012/6/9 Marcin Miłkowski list-addr...@wp.pl On 2012-06-09 19:14, Jaume Ortolà i Font wrote: 2012/6/9 Marcin Miłkowski list

Re: [Languagetool] Rule configuration and simplifying interface

2012-06-14 Thread Jaume Ortolà i Font
Hi Marcin, This is much needed. We need a simpler and more intuitive config dialog. A question: these global settings can be presented in two ways, as checkboxes or as mutually exclusive options. Have you thought about this? Do you have any preference? I think that the two possibilities should

Re: [Languagetool] LT 1.9: feature freeze reminder

2012-09-21 Thread Jaume Ortolà i Font
Daniel, Could you rerun the tests with the Wikipedia corpus? http://community.languagetool.org/corpusMatch/list?lang=en Tell us how many articles are checked. Regards Jaume Ortolà 2012/9/21 Daniel Naber list2...@danielnaber.de Hi, this is a reminder that we're now in feature freeze[1].

Re: [Languagetool] LT 1.9: feature freeze reminder

2012-09-22 Thread Jaume Ortolà i Font
2012/9/22 Daniel Naber list2...@danielnaber.de On 21.09.2012, 12:57:09 Jaume Ortolà i Font wrote: Could you rerun the tests with the Wikipedia corpus? http://community.languagetool.org/corpusMatch/list?lang=en As you might have noticed there are performance problems with the site... I'm

Re: [Languagetool] Fighting false alarms

2012-10-22 Thread Jaume Ortolà i Font
Hi, In the case of Catalan, there are several causes of the high number of positives (in order of importance, I think): - Some rules are used to check regional variants of Catalan. I have disabled them for the Wikipedia corpus check. - There are too many low quality Wikipedia articles. So there

[Languagetool] Fighting true positives

2012-10-31 Thread Jaume Ortolà i Font
2012/10/30 Daniel Naber list2...@danielnaber.de On 22.10.2012, 12:36:08 Jaume Ortolà i Font wrote: - There are too many low quality Wikipedia articles. So there are a lot of true positives. In that case, maybe you could make the Catalan Wikipedia community aware of Wikicheck? http

Re: [Languagetool] Fighting false alarms

2012-11-01 Thread Jaume Ortolà i Font
I have detected a source of false alarms (for Catalan) in the Wikipedia interlanguage links [1]. Some of the language codes (be, es, et, hi, li, lo, se, te...) happen to be Catalan words that trigger several grammar rules. In general, words inside any kind of link should be ignored. Regards,

Re: [Languagetool] Fighting false alarms

2012-11-02 Thread Jaume Ortolà i Font
, then there would be no false alarms. Jaume Ortolà 2012/11/2 Jaume Ortolà i Font jaumeort...@gmail.com I have detected a source of false alarms (for Catalan) in the Wikipedia interlanguage links [1]. Some of the language codes (be, es, et, hi, li, lo, se, te...) happen to be Catalan words that trigger

Re: [Languagetool] Firefox extension

2012-11-09 Thread Jaume Ortolà i Font
Hi, Great job! It works fine. I have tested the extension in two kinds of real-life situations: composing a message in Gmail and using the new WYSIWYG Wikipedia editor.[1] What I miss now for this extension to be really useful is a more direct connection between the error messages and the points

[Languagetool] regional/country variants and configuration dialog

2012-11-12 Thread Jaume Ortolà i Font
Hi, In Catalan there are three main regional variants, which can be handled with a few simple grammar rules. I would very much like to put these rules in separate country grammar files (like in /rules/en/en-GB/grammar.xml). Unfortunately, the coding scheme we are using in LT for country variants

Re: [Languagetool] regional/country variants and configuration dialog

2012-11-12 Thread Jaume Ortolà i Font
2012/11/12 R.J. Baars r.j.ba...@xs4all.nl: Would it be possible to use this new standard and have the old one derived from it using a translation table? Codes like de-DE or en-GB are indeed valid in the BCP-47 standard. No change is needed here. On the other hand, we could use language codes

Re: Italian Language enhancements

2012-12-28 Thread Jaume Ortolà i Font
2012/12/28 Mauro Condarelli mc5...@mclink.it My disambiguation rule needs updating, if someone can suggest how. Mario gli chiese l'ora. 121 rules activated for language Italian S Mario[Mario/NPR] gli[gli/PRO-PERS-CLI-3-M-S,il/ART-M:p] chiese[chiesa/NOUN-F:p]

Re: new disambiguation action implemented (patch included)

2013-01-01 Thread Jaume Ortolà i Font
This bug is fixed now. See revision 8761, changes in DisambiguationPatternRule.java. startPos was lost in the function replaceTokens() and now it is kept. I will document in the wiki the new filterall action and the use of replace and add in multiple tokens. Jaume 2013/1/1 Marcin Miłkowski

unification

2013-01-02 Thread Jaume Ortolà i Font
Hi, I have found a problem with unification. I'm using this pattern: <rule id="DAAN_" name="det + adj + adj + nom"> <pattern> <unify> <feature id="nombre"/> <feature id="genere"/> <marker> <token postag="D[^R].*" postag_regexp="yes"/>

changes in unification (problems in French tests)

2013-01-02 Thread Jaume Ortolà i Font
think, Dominique? Regards, Jaume Ortolà 2013/1/2 Jaume Ortolà i Font jaumeort...@gmail.com I found a solution. I'm trying to change the code properly. Regards, Jaume 2013/1/2 Jaume Ortolà i Font jaumeort...@gmail.com Hi, I have found a problem with unification. I'm using

isWhitespaceBefore

2013-01-13 Thread Jaume Ortolà i Font
Hi, I would like to add the isWhitespaceBefore information to the historical annotations of the disambiguator, so any problem can be easily spotted and fixed. When isWhitespaceBefore=false then an asterisk will be shown after the postag: word[lemma/POS*]. Is this OK for everybody? Some JUnit
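A small sketch of the proposed annotation format, assuming the reading is printed as word[lemma/POS] and that an asterisk is appended to the POS tag when isWhitespaceBefore is false. The class and method names are hypothetical; this is not the disambiguator's actual debug code.

```java
/** Hypothetical formatter for the proposed disambiguator annotation format. */
final class AnnotationFormat {

    /**
     * Formats a reading as word[lemma/POS], appending '*' after the POS tag
     * when the token is NOT preceded by whitespace, as proposed above.
     */
    static String format(String word, String lemma, String posTag, boolean whitespaceBefore) {
        StringBuilder sb = new StringBuilder();
        sb.append(word).append('[').append(lemma).append('/').append(posTag);
        if (!whitespaceBefore) {
            sb.append('*');
        }
        return sb.append(']').toString();
    }
}
```

For instance, a comma glued to the previous word, format(",", ",", "_PUNCT", false), gives ",[,/_PUNCT*]".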

Re: discarding outlying sentences

2013-01-15 Thread Jaume Ortolà i Font
Italian as a primary language and indicate that the undetected paragraphs fall back to English. If I know that I will be using lots of quotations from other languages I can leave the ignore option on and not check them at all. Ciao Paolo On Jan 14, 2013, at 10:05 AM, Jaume Ortolà i

Re: getting LT online

2013-01-17 Thread Jaume Ortolà i Font
Hi, Softcatalà, an organization that promotes software in Catalan, especially linguistic tools (translator, spellchecker, etc.), is willing to use LanguageTool on its website. Its online spellchecker received 500,000 visits last December (a bad month). So perhaps LanguageTool should be

Re: adding suggestions to a pattern rule

2013-01-17 Thread Jaume Ortolà i Font
2013/1/16 Dominique Pellé dominique.pe...@gmail.com Do we really need to put <suggestion> inside <suggestions>? It would be less noisy like this: <message>yada yada yada</message> <suggestion>xxx</suggestion> <suggestion>yyy</suggestion> <url>...</url> <example type="incorrect">...</example> <example

Re: Dynamic dictionary handling

2013-01-19 Thread Jaume Ortolà i Font
2013/1/19 Mauro Condarelli mc5...@mclink.it I (slightly) modified MorfologikSpellerRule to accept words having POS tags without further action. This is a welcome change. Sometimes there are words that are not present in the tagger dictionary but get a POS tag in the disambiguation or in

Re: switching to Maven - done!

2013-01-25 Thread Jaume Ortolà i Font
2013/1/24 Daniel Naber list2...@danielnaber.de You can use this for now (I just made an update, the class was still missing): java -cp languagetool-standalone-2.1-SNAPSHOT.jar org.languagetool.commandline.Main We can either add script files or configure Maven to create another JAR for the

Re: switching to Maven - done!

2013-01-25 Thread Jaume Ortolà i Font
This can be useful for Eclipse users. I installed these plugins:
- m2e - Maven Integration for Eclipse
- Subclipse (or another SVN plugin)
- Maven SCM handler for Subclipse
Then, in the SVN repository, you can check out as a Maven project. The result is a duplicated structure like the one explained by

Re: switching to Maven - done!

2013-01-28 Thread Jaume Ortolà i Font
2013/1/28 Mauro Condarelli mc5...@mclink.it Sorry to disturb, people. I've been using Eclipse previously. Now I followed instructions for the maven repack. Everything went ok, but I can't start the commandline: mcon@vmrunner :/srv/Store/Language/languagetool/languagetool-standalone/target$

Re: switching to Maven - done!

2013-01-28 Thread Jaume Ortolà i Font
On 28/01/2013 09:51, Jaume Ortolà i Font wrote: 2013/1/28 Mauro Condarelli mc5...@mclink.it Sorry to disturb, people. I've been using Eclipse previously. Now I followed instructions for the maven repack. Everything went ok, but I can't start the commandline: mcon@vmrunner :/srv/Store

Re: Uncompounding words

2013-02-06 Thread Jaume Ortolà i Font
In Catalan new words are created by compounding and derivation. It would suffice to have a list of common prefixes and suffixes, to know the class of words to which each affix can be attached (i.e. noun, adjective, verb, another affix), and a few rules of orthographic change in the concatenation

Re: using LT from a web form

2013-02-23 Thread Jaume Ortolà i Font
Hi Daniel, Three browsers, three different responses. 1) In Firefox, everything is OK. 2) In MS IE9, I get the results in comunity.languagetool.org 3) In Chrome, I get this error message, and no results: Could not send request to https://languagetool.org:8081/checkDocument Error: GENERAL

Re: LanguageTool release 2.1 in progress

2013-04-01 Thread Jaume Ortolà i Font
This is probably wrong, isn't it? The following changes have been done in version trunk of LanguageTool (xml-based rules only): ca 0 new, 0 improved, 0 removed Regards, Jaume Ortolà Salutacions, Jaume Ortolà www.riuraueditors.cat 2013/4/1 Daniel Naber list2...@danielnaber.de Hi, the

enabling and disabling rules in the LT web server

2013-04-03 Thread Jaume Ortolà i Font
Hi, We are preparing an instance of the LT http server to be used at the Softcatalà webpage. For regional and stylistic variants in Catalan, it is indispensable for us to enable and disable rules (at the same time) from the web interface. The problem is that when you use the enabled parameter

Re: enabling and disabling rules in the LT web server

2013-04-04 Thread Jaume Ortolà i Font
rules in the http server, that isn't possible now. Regards, Jaume Ortolà [1] http://languagetool.org/http-server/ 2013/4/3 Jaume Ortolà i Font jaumeort...@gmail.com Hi, We are preparing an instance of the LT http server to be used at the Softcatalà webpage. For regional and stylistic

Mixed case words

2013-04-07 Thread Jaume Ortolà i Font
Hi, I have made some changes in MorfologikSpeller and in BaseTagger so words written in mixed case are considered spelling errors and are not tagged. Mixed case words are considered valid only if they appear exactly that way in the speller dictionary (for the spelling rule) or in the tagger
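The policy described above can be sketched as follows. Here "mixed case" is assumed to mean a word that is neither all lowercase, all uppercase, nor capitalized (only the first letter uppercase); such a word is accepted only if the dictionary contains it verbatim. The names are hypothetical, and the non-mixed-case branch is a simplified stand-in for the normal speller lookup.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the mixed-case policy described above. */
final class MixedCase {

    /** True if the word is neither all lowercase, all uppercase, nor capitalized. */
    static boolean isMixedCase(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        String upper = word.toUpperCase(Locale.ROOT);
        if (word.isEmpty() || word.equals(lower) || word.equals(upper)) {
            return false;
        }
        String capitalized = word.substring(0, 1).toUpperCase(Locale.ROOT)
                + word.substring(1).toLowerCase(Locale.ROOT);
        return !word.equals(capitalized);
    }

    /**
     * Mixed-case words are accepted only if they appear verbatim in the dictionary.
     * For other words, a lowercase lookup stands in for the normal speller logic (assumption).
     */
    static boolean isAccepted(String word, Set<String> dictionary) {
        if (isMixedCase(word)) {
            return dictionary.contains(word);
        }
        return dictionary.contains(word) || dictionary.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

With this definition, "iPhone" is accepted only if the dictionary contains "iPhone" exactly, while "IPhone" is rejected.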

Re: Mixed case words

2013-04-08 Thread Jaume Ortolà i Font
2013/4/8 R.J. Baars r.j.ba...@xs4all.nl About case: In Dutch, DVD is an undesirable way to write dvd; this is the only thing in Dutch that seems to need a different treatment. And what about titles? Can they be written in all uppercase letters? This feature (allow to write in all uppercase

Re: checking sanity of the tagger dictionary

2013-04-18 Thread Jaume Ortolà i Font
2013/4/18 Andriy Rysin ary...@gmail.com So now I'll be working on writing rules and beefing up the tags in the dictionary, so I have a question regarding that (and I apologize up front if the answer is already present somewhere). I'll be changing grammar.xml a lot and I would like to

Re: Improving suggestions in speller rules

2013-04-18 Thread Jaume Ortolà i Font
the 6th suggestion). I suppose that other languages need a similar approach. Regards, Jaume Ortolà 2013/4/7 Marcin Miłkowski list-addr...@wp.pl On 2013-04-07 11:07, Jaume Ortolà i Font wrote: Hi, I have made an improvement in the Morfologik speller rule. If few suggestions are found

Re: Improving suggestions in speller rules

2013-04-18 Thread Jaume Ortolà i Font
2013/4/18 Daniel Naber list2...@danielnaber.de the right approach is to add this into the algorithm that traverses the dictionary tree. For German, I needed a solution fast and ended up with a hack in GermanSpellerRule. It's easy to understand, but if you could check the morfologik algorithm

Re: Improving suggestions in speller rules

2013-04-20 Thread Jaume Ortolà i Font
-sensitive character comparison. Best, Jaume Ortolà Salutacions, Jaume Ortolà www.riuraueditors.cat 2013/4/18 Marcin Miłkowski list-addr...@wp.pl On 2013-04-18 16:28, Daniel Naber wrote: On 18.04.2013, 14:41:21 Jaume Ortolà i Font wrote: Hi Jaume, For achieving this, I think that some

Re: Improving suggestions in speller rules

2013-04-23 Thread Jaume Ortolà i Font
2013/4/23 Marcin Miłkowski list-addr...@wp.pl I'm using the tagger dictionary as a speller dictionary, because now it's better than the Hunspell-generated word list and that way there is only one dictionary to be maintained. The files in the hunspell directory were pending removal. I

Re: Improving suggestions in speller rules

2013-04-23 Thread Jaume Ortolà i Font
This is the modified Speller.java. The idea is more or less the same as the one found in Jan Daciuk's code. More testing in different languages is needed, because there are many details to consider and perhaps it's buggy. When a possible multiple character substitution is found, a new branch is

Re: Improving suggestions in speller rules

2013-04-23 Thread Jaume Ortolà i Font
2013/4/23 Marcin Miłkowski list-addr...@wp.pl If that's the case, then it's a bug in traversing the dictionary. Yes, you were right. OK, then it's a bug. I need to use isBeforeSeparator() more often. Probably in line 313 of Speller.java, instead of: if (!fsa.isArcTerminal(arc)) { we

Re: Improving suggestions in speller rules

2013-04-23 Thread Jaume Ortolà i Font
Marcin, I am attaching the Speller.java file again with some minor changes. This problem is solved now: There is a problem to be solved. The L -> L·L substitution adds a distance of 0, but the L·L -> L substitution adds 1. It should always be 0. Best, Jaume Speller.java Description: Binary data

Re: equivalent and optional characters in words

2013-04-24 Thread Jaume Ortolà i Font
You can disambiguate first, so the feminine noun tag is removed. See the explanation here: http://wiki.languagetool.org/developing-a-disambiguator#toc5 Regards, Jaume Ortolà 2013/4/24 Andriy Rysin ary...@gmail.com Thanks Marcin I stole some unifications from Polish grammar.xml, adjusted

Re: Improving suggestions in speller rules

2013-04-24 Thread Jaume Ortolà i Font
2013/4/24 Marcin Miłkowski list-addr...@wp.pl Jaume, and everybody, I started to implement the features we need in MorfologikSpeller (in morfologik repository on github). It looks good. * fsa.dict.speller.runon-words for turning off and on the runon words feature. We have to remember

Re: Failed Test

2013-05-18 Thread Jaume Ortolà i Font
Hi Nathan, There was an error in the Catalan grammar file. But it is solved now. If you do svn update again, there should be no error. Regards, Jaume Ortolà 2013/5/18 Nathan Wells sungk...@gmail.com I haven't updated the Khmer module in a while, but have some new stuff to add. Now that

Re: LanguageTool nightly diff test

2013-05-18 Thread Jaume Ortolà i Font
Hi, It seems that in Italian and French, from a certain point on, all the errors have been removed. I suspect that there is some out-of-bounds exception that stops the checking of articles. The lines I wrote in UppercaseSentenceStart are probably not safe enough. I will change them. Regards, Jaume

Re: LanguageTool nightly diff test

2013-05-21 Thread Jaume Ortolà i Font
2013/5/21 Marcin Miłkowski list-addr...@wp.pl I have a problem with it: all changes create false alarms for Polish. This is a regression. Hi Marcin, Do you mean these changes? [1] They are not creating alarms. They are removing alarms. Are the changes removing true positives? What I did in

Re: LanguageTool nightly diff test

2013-05-21 Thread Jaume Ortolà i Font
2013/5/21 Marcin Miłkowski list-addr...@wp.pl What I did in UppercaseSentenceStart was to eliminate alarms in sentences starting with patterns for enumerated lists like these: a) b. iv. iii) The latter two are genuine mistakes in Polish: Roman numerals are written only in
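The sentence-initial enumeration markers mentioned above (a), b., iv., iii)) can be recognized with a regular expression. The sketch below is a hypothetical illustration, not the code that was committed to UppercaseSentenceStart: it accepts a single lowercase letter or a lowercase Roman numeral, followed by '.' or ')' and whitespace.

```java
import java.util.regex.Pattern;

/** Hypothetical check for sentence-initial markers such as "a)", "b.", "iv.", "iii)". */
final class EnumerationPrefix {

    // A single lowercase letter or a lowercase Roman numeral, then '.' or ')', then whitespace.
    private static final Pattern MARKER = Pattern.compile("^(?:[a-z]|[ivxlcdm]+)[.)]\\s");

    static boolean startsWithEnumerationMarker(String sentence) {
        return MARKER.matcher(sentence).find();
    }
}
```

Since Marcin points out that the Roman-numeral forms are genuine mistakes in Polish, a real implementation would probably need to make that part language-dependent. Note also that any word made only of Roman-numeral letters, such as "mix.", would match.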

Re: Help with Khmer Java Rule Creation

2013-05-29 Thread Jaume Ortolà i Font
Hi Nathan, I have just committed the Java rule you asked for. See if everything is correct. Regards, Jaume Ortolà 2013/5/29 Nathan Wells sungk...@gmail.com I need some help creating a java rule for the Khmer language in LanguageTool. Would someone be willing to create what I believe is a

Re: tokenizing numbers with fractions

2013-06-13 Thread Jaume Ortolà i Font
2013/6/12 Andriy Rysin ary...@gmail.com I noticed that numbers with fractions like 2,2 are split into '2', ',', '2' by the word tokenizer. In Ukrainian I need to require a different case for the following noun depending on whether it's a whole number or a fraction, so I was planning to adjust Ukrainian
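One way to keep "2,2" as a single token is to try a decimal-number pattern before the generic word pattern. The sketch below uses a plain regular-expression tokenizer. This technique is swapped in for illustration and is not LanguageTool's word tokenizer; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tokenizer sketch that keeps decimal numbers such as "2,2" as one token. */
final class DecimalAwareTokenizer {

    // Alternatives are tried in order: decimal number, run of letters/digits, any other non-space char.
    private static final Pattern TOKEN = Pattern.compile("\\d+,\\d+|[\\p{L}\\p{N}]+|\\S");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

The tradeoff: any comma between digits with no following space, as in a list written "1,2 and 3", is also merged into one number.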

Re: POS tag UNKNOWN for SENT_END?

2013-06-24 Thread Jaume Ortolà i Font
Hi, I don't know what the test case failures for Catalan are. In any case, the SENT_START and SENT_END tags are used for marking the start and the end of a sentence. They work at the sentence level, and they have nothing to do with the POS tag of a token. So the current behavior seems logical to me.

Uppercase Sentence Start Rule (bug #185)

2013-07-02 Thread Jaume Ortolà i Font
Hi, There is a bug report about the behavior of UppercaseSentenceStartRule: https://sourceforge.net/p/languagetool/bugs/185/ I think that the only situation in which we can safely prevent the rule from matching is when the previous sentence ends with a comma or semicolon. So I propose to implement
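The proposed exception can be sketched like this: a lowercase sentence start is reported unless the previous sentence ends with a comma or semicolon, which suggests the sentence split was spurious. The class name and the string-based interface are assumptions for illustration; the real rule works on analyzed sentences.

```java
/** Hypothetical sketch of the proposed exception for UppercaseSentenceStartRule. */
final class SentenceStartCheck {

    /** Returns true if a lowercase sentence start should be reported. */
    static boolean shouldReport(String previousSentence, String sentence) {
        String current = sentence.trim();
        // Only a sentence starting with a lowercase letter is a candidate error.
        if (current.isEmpty() || !Character.isLowerCase(current.codePointAt(0))) {
            return false;
        }
        if (previousSentence == null) {
            return true; // first sentence of the text
        }
        String previous = previousSentence.trim();
        // The proposed exception: the previous "sentence" ends with ',' or ';'.
        return !(previous.endsWith(",") || previous.endsWith(";"));
    }
}
```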

Re: Error updating my folder with Tortoise 1.8

2013-07-02 Thread Jaume Ortolà i Font
2013/7/2 Daniel Naber list2...@danielnaber.de On 02.07.2013 17:01, Marco A.G.Pinto wrote: Now I can't connect to the repository as it gives an error. Are you sure you're using the right URL to connect? http://svn.code.sf.net/p/languagetool/code/trunk/languagetool/ This was changed at

Re: Uppercase Sentence Start Rule (bug #185)

2013-07-02 Thread Jaume Ortolà i Font
.org/regression-tests/20130702/result_pl_20130702.html 8. http://languagetool.org/regression-tests/20130702/result_it_20130702.html 2013/7/2 Jaume Ortolà i Font jaumeort...@gmail.com Hi, There is a bug report about the behavior of UppercaseSentenceStartRule: https://sourceforge.net/p

Re: suggestions in Morfologik spelling rule

2013-07-15 Thread Jaume Ortolà i Font
On 2013-07-02 01:11, Jaume Ortolà i Font wrote: Hi Marcin, I have been using the still unreleased code of morfologik-stemming and I have made improvements to Speller.java for some previously unforeseen cases. See the attachment. In order to complete the development, and test and debug with all

Re: suggestions in Morfologik spelling rule

2013-07-15 Thread Jaume Ortolà i Font
2013/7/15 Daniel Naber list2...@danielnaber.de: On 15.07.2013 12:35, Marcin Miłkowski wrote: Please review my changes. +assertCorrectionsByOrder(rule, "Rytmus", "Remus", "Rhythmus"); This new suggestion is not as good as the old one, Rhythmus should be preferred. As this is a classical/typical

Re: suggestions in Morfologik spelling rule

2013-07-15 Thread Jaume Ortolà i Font
-addr...@wp.pl: On 2013-07-15 12:41, Jaume Ortolà i Font wrote: Thanks, Marcin. Some remarks. The improvements I sent to the list 15 days ago have not been added, and moreover I have found more bugs. I'm really sorry but there are 200 mails from the mailing list over the last two weeks

Re: suggestions in Morfologik spelling rule

2013-07-15 Thread Jaume Ortolà i Font
2013/7/15 Marcin Miłkowski list-addr...@wp.pl: Hi Jaume, On 2013-07-15 21:16, Jaume Ortolà i Font wrote: Hi, Marcin. I have tested the current code (1.8.0-SNAPSHOT) and everything is OK, all the changes are there. Thank you. Great. We'll release 1.7.1, this is just a minor bug fix

Re: SimpleReplaceRule improvements

2013-07-22 Thread Jaume Ortolà i Font
Hi, I have just copied the Ukrainian SimpleReplaceRule in the Catalan module. But most of the improvements could be moved up to the abstract rule (AbstractSimpleReplaceRule). 2013/5/16 Andriy Rysin ary...@gmail.com Just wanted to let you know that I recently improved SimpleReplaceRule that's

Re: Re: Error committing compounds.txt

2013-08-20 Thread Jaume Ortolà i Font
2013/8/20 Marco A.G.Pinto marcoagpi...@mail.telepac.pt Could someone add this file again to the Portuguese folder? Done. Jaume

Re: trying OpenRegex

2013-08-27 Thread Jaume Ortolà i Font
2013/8/20 Daniel Naber list2...@danielnaber.de On 2013-08-14 18:59, Marcin Miłkowski wrote: For <or>, I can see two solutions: (a) run-time conversion of such rules to a list of normal rules (when reading the file, in a similar way as phrases are used) -- this is the easiest way and

Re: trying OpenRegex

2013-08-28 Thread Jaume Ortolà i Font
2013/8/28 Daniel Naber list2...@danielnaber.de On 2013-08-27 19:56, Jaume Ortolà i Font wrote: I have implemented this solution for the <or>. It seems to work. Thanks! Git question. Is it OK to publish my modifications with: git push origin my_local_branch? Did that work or did you

Re: Quick Correction

2013-09-07 Thread Jaume Ortolà i Font
Hi, I see two ways: An empty suggestion (that can be confusing for the user): <rule default="off" id="ACTUALLYREALLY" name="Possible needless emphasis: actually/really"> <pattern> <token regexp="yes">(?-i)actually|really</token> </pattern> <message>Consider if the word is (actually) necessary.

Regional variants of Catalan (ca-ES-valencia)

2013-09-17 Thread Jaume Ortolà i Font
Hi, LibreOffice 4.2 (due in November) will allow using the language code ca-ES-valencia for the Valencian variant of Catalan (default: ca-ES). It would be great to take advantage of this in Languagetool. The only difference between the general and Valencian variants is just that a few grammar
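For reference, the standard java.util.Locale API (Java 7 and later) parses such BCP 47 tags directly: for ca-ES-valencia it yields language "ca", country "ES" and variant "valencia". The sketch below only shows that parsing; how LanguageTool maps the tag to a language module is not shown, and the class name is hypothetical.

```java
import java.util.Locale;

/** Sketch: parsing a BCP 47 tag with the standard java.util.Locale API. */
final class LanguageTagDemo {

    /** Returns the language, country and variant parsed from a BCP 47 tag. */
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        return "language=" + locale.getLanguage()
                + " country=" + locale.getCountry()
                + " variant=" + locale.getVariant();
    }
}
```

describe("ca-ES-valencia") returns "language=ca country=ES variant=valencia", while describe("en-GB") returns an empty variant.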

Re: LanguageTool nightly diff test

2013-09-18 Thread Jaume Ortolà i Font
Daniel, In the Wikipedia results there are matches for rules that are disabled in disabled_rules.properties, for example EXIGEIX_VERBS_CENTRAL. Shouldn't they be ignored? Regards, Jaume 2013/9/18 Daniel Naber list2...@danielnaber.de On 2013-09-18 02:04, dna...@users.sourceforge.net wrote:

Re: LanguageTool nightly diff test

2013-09-18 Thread Jaume Ortolà i Font
2013/9/18 Daniel Naber list2...@danielnaber.de On 2013-09-18 13:09, Jaume Ortolà i Font wrote: In the Wikipedia results there are matches for rules that are disabled in disabled_rules.properties, for example EXIGEIX_VERBS_CENTRAL. Shouldn't they be ignored? As this is a regression tests

Re: Regional variants of Catalan (ca-ES-valencia)

2013-09-19 Thread Jaume Ortolà i Font
2013/9/17 Daniel Naber list2...@danielnaber.de On 2013-09-17 10:31, Jaume Ortolà i Font wrote: I will try to implement it. What would be the best way to do it? I see that the Simple German (de-DE-x-simple-language) is implemented in a module outside the other German variants

Re: Regional variants of Catalan (ca-ES-valencia)

2013-09-20 Thread Jaume Ortolà i Font
, Jaume [1] https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml Salutacions, Jaume Ortolà www.riuraueditors.cat 2013/9/19 Jaume Ortolà i Font jaumeort...@gmail.com 2013/9/17 Daniel Naber

Re: Regional variants of Catalan (ca-ES-valencia)

2013-09-20 Thread Jaume Ortolà i Font
2013/9/20 Jaume Ortolà i Font jaumeort...@gmail.com The major question is about the country variants in LibreOffice. Is it really working? For example, when using British English in LibreOffice, I don't see any match for "apartment" or "zip code" as defined here in the grammar rules for British

Re: feature freeze for LT 2.3

2013-09-21 Thread Jaume Ortolà i Font
Hi Daniel, I am very close to completing the changes for the ca-ES-valencia issue (also solving problems with British English in LibreOffice). This arose very recently and there was little time to do it. I hope to be finished today. Otherwise I will give up for the 2.3 release. I will introduce one

Re: feature freeze for LT 2.3

2013-09-21 Thread Jaume Ortolà i Font
2013/9/21 Daniel Naber list2...@danielnaber.de On 2013-09-21 09:47, Jaume Ortolà i Font wrote: finished today. Otherwise I will give up for the 2.3 release. I will introduce one more string (ca-ES-valencia = Catalan (Valencian)). Or I can do it right now. Is that okay? It's okay, please

Re: Regional variants of Catalan (ca-ES-valencia)

2013-09-21 Thread Jaume Ortolà i Font
Ortolà [1] http://www.openoffice.org/api/docs/common/ref/com/sun/star/lang/Locale.html [2] https://wiki.documentfoundation.org/images/b/b5/LibreOffice_FOSDEM-2013_Language_Tags.pdf [3] http://dev-builds.libreoffice.org/daily/master/ 2013/9/20 Jaume Ortolà i Font jaumeort...@gmail.com 2013/9/20

Re: Regional variants of Catalan (ca-ES-valencia)

2013-09-22 Thread Jaume Ortolà i Font
2013/9/22 Daniel Naber list2...@danielnaber.de Currently the two grammar.xml files look almost the same. Maybe we can avoid that by moving the common parts to its own files and including them, as described here? http://xml.silmaril.ie/includes.html This would need to be tested carefully

LibreOffice goes BCP 47

2013-09-23 Thread Jaume Ortolà i Font
Hi, This could be of interest to you. Eike Rathke: I'll talk about it at the LibreOffice Conference 2013 at Milano, so to get all the details please join me and attend Getting you language in on Thursday, 26 September at 15:30 in Sala Alfa.

Re: Regional variants of Catalan (ca-ES-valencia)

2013-10-02 Thread Jaume Ortolà i Font
2013/9/22 Daniel Naber list2...@danielnaber.de On 2013-09-22 11:51, Jaume Ortolà i Font wrote: default parameters. So yes, I would prefer another way to deal with it. Perhaps what you suggested at first: ValencianCatalan implements getEnabledRules and getDisabledRules

Re: building a synthesizer

2013-10-04 Thread Jaume Ortolà i Font
Daniel, I found the same problem recently. I resorted to the attached perl script for this step. Regards, Jaume Ortolà 2013/10/4 Daniel Naber list2...@danielnaber.de Hi, did anybody recently build a synthesizer? When I follow the instructions at

Re: why unification?

2013-10-16 Thread Jaume Ortolà i Font
2013/10/16 Daniel Naber list2...@danielnaber.de Hi, although I think I understand the technical details of unification, I'm not sure how/why it is used in grammar.xml. For example, if a sequence of words share the same gender and number, that means there's agreement, so you cannot use that

WikiCheck

2013-10-17 Thread Jaume Ortolà i Font
Hi, When using LanguageTool WikiCheck, if the article has more than 100 errors, you get a warning: More than 100 possible errors found - the remaining errors will not be shown. But when you submit changes to Wikipedia, what you get in Wikipedia is always a no difference page. No change is

Re: WikiCheck

2013-10-18 Thread Jaume Ortolà i Font
it be an option on the command line? Regards, Jaume Ortolà [1] https://ca.wikipedia.org/wiki/Glicèrid Salutacions, Jaume Ortolà www.riuraueditors.cat 2013/10/17 Daniel Naber list2...@danielnaber.de On 2013-10-17 09:28, Jaume Ortolà i Font wrote: Hi Jaume, But when you submit changes to Wikipedia

Re: Improving spelling suggestions with frequency dictionaries

2013-11-26 Thread Jaume Ortolà i Font
2013/11/25 Daniel Naber list2...@danielnaber.de On 2013-11-25 11:11, Jaume Ortolà i Font wrote: - A method for building the dictionary, assuming that it will be used only for some languages (backward compatible). - A way of using the frequency information in the ordering of suggestions

Re: Improving spelling suggestions with frequency dictionaries

2013-11-26 Thread Jaume Ortolà i Font
2013/11/26 Daniel Naber list2...@danielnaber.de On 2013-11-26 15:27, Jaume Ortolà i Font wrote: Look at these wordlists [1]. They are Apache 2.0. The words are classified in 256 ranges. [1] https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries The German one looks okay
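A possible way to use such frequency classes when ordering suggestions is to sort by edit distance first and break ties by frequency class. This is a sketch under assumptions: the 256 ranges are represented as integers 0 to 255, a higher class is assumed to mean a more frequent word, and words missing from the list get class 0. The names are hypothetical, not the Morfologik API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical ordering of speller suggestions by distance, then frequency class. */
final class FrequencyRanking {

    /** A speller candidate with its edit distance from the misspelled word. */
    static final class Candidate {
        final String word;
        final int distance;

        Candidate(String word, int distance) {
            this.word = word;
            this.distance = distance;
        }
    }

    static List<String> rank(List<Candidate> candidates, Map<String, Integer> freqClass) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Candidate c) -> c.distance)
                // negate so that a higher frequency class sorts first among equal distances
                .thenComparingInt(c -> -freqClass.getOrDefault(c.word, 0)));
        List<String> result = new ArrayList<>();
        for (Candidate c : sorted) {
            result.add(c.word);
        }
        return result;
    }
}
```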

Re: Improving spelling suggestions with frequency dictionaries

2013-12-08 Thread Jaume Ortolà i Font
, we could consider that the last byte is the frequency data and the separator between POS tag and frequency is not needed. The other solution is to change the separator... Regards, Jaume Ortolà 2013/11/26 Marcin Miłkowski list-addr...@wp.pl On 2013-11-26 18:44, Jaume Ortolà i Font wrote

Re: Improving spelling suggestions with frequency dictionaries

2013-12-09 Thread Jaume Ortolà i Font
2013/12/9 Anton Meixome meix...@certima.net I'm a newbie here but I have a question. Why isn't there a frequency list for Galician in https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries ? This is not our project. You should ask there. We chose these lists because there are a

Re: Improving spelling suggestions with frequency dictionaries

2013-12-09 Thread Jaume Ortolà i Font
version of Morfologik. And then we'll be able to rebuild the dictionaries and adjust the tests if needed. Regards, Jaume Ortolà 2013/12/9 Marcin Miłkowski list-addr...@wp.pl On 2013-12-09 00:12, Jaume Ortolà i Font wrote: Hi, I have implemented the use of the frequency word lists

escaping translations

2013-12-21 Thread Jaume Ortolà i Font
Hi, There are some characters in translations that need escaping. I have seen, for example, missing apostrophes in http://community.languagetool.org. So where is the proper place to do the escaping? Is it the responsibility of the translators in Transifex? Regards, Jaume Ortolà

Re: escaping translations

2013-12-21 Thread Jaume Ortolà i Font
+de+Som%C3%A0lia&lang=ca So should I write &apos; or &quot;? http://www.riuraueditors.cat Regards, Jaume Ortolà 2013/12/21 Daniel Naber list2...@danielnaber.de On 2013-12-21 12:10, Jaume Ortolà i Font wrote: My question is this. If translating from English to another language, an apostrophe

Re: fsa.dict.speller.replacement-pairs slow down spell check

2013-12-31 Thread Jaume Ortolà i Font
Hi, In the current implementation the number of possible suggestions grows exponentially with the replacement pairs, which is not a good thing... For Milkowski you get 6144 possible suggestions in American English. I set a limit of 7 possible simultaneous replacements in a word, which (if the
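Why the count explodes, and how a cap helps, can be sketched as follows: at every position where a replacement pair matches, a candidate can either keep the original characters or apply the pair, so for a single pair with k non-overlapping matches there are up to 2^k variants. Limiting simultaneous replacements to m reduces this to the sum of C(k, i) for i from 0 to m. The code is a hypothetical illustration, not the morfologik-stemming Speller implementation; the cap (7 in the message above) is just a parameter here.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: expand a word with replacement pairs, capping simultaneous replacements. */
final class ReplacementExpander {

    static Set<String> expand(String word, Map<String, String> pairs, int maxReplacements) {
        Set<String> out = new LinkedHashSet<>();
        expand(word, 0, new StringBuilder(), 0, pairs, maxReplacements, out);
        return out;
    }

    private static void expand(String word, int pos, StringBuilder prefix, int used,
                               Map<String, String> pairs, int max, Set<String> out) {
        if (pos == word.length()) {
            out.add(prefix.toString());
            return;
        }
        int mark = prefix.length();
        // Option 1: keep the current character unchanged.
        prefix.append(word.charAt(pos));
        expand(word, pos + 1, prefix, used, pairs, max, out);
        prefix.setLength(mark);
        // Option 2: apply any pair whose left side starts here, if the cap allows it.
        if (used < max) {
            for (Map.Entry<String, String> e : pairs.entrySet()) {
                String from = e.getKey();
                if (!from.isEmpty() && word.startsWith(from, pos)) {
                    prefix.append(e.getValue());
                    expand(word, pos + from.length(), prefix, used + 1, pairs, max, out);
                    prefix.setLength(mark);
                }
            }
        }
    }
}
```

With the pair a -> b, "aaa" yields 8 variants with a cap of 3, but only 4 ("aaa", "aab", "aba", "baa") with a cap of 1.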

Re: Token postag OR word

2014-01-28 Thread Jaume Ortolà i Font
2014-01-28 Kumara Bhikkhu kumara.bhik...@gmail.com Can a token be a mixture of postags and words? Example: Can a token match send_end or of|into? If not, how do I indicate this? Yes, you can write this: <or> <token postag="SENT_END"/> <token regexp="yes">of|into</token> </or> It's equivalent to using

capitalizing Morfologik Spelling suggestions

2014-02-02 Thread Jaume Ortolà i Font
Hi, This has become a common request from users. The suggestions for a capitalized misspelled word are expected to be capitalized too. I suppose this is not true for all languages in all situations. So what can we do? 1) Always capitalize the suggestion when it is the first word of a sentence.
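Here is a sketch of option 1 in Java, extended so that suggestions also follow the case of the misspelled word. The class and method names are hypothetical. The code uppercases only the first code point, so characters outside the Basic Multilingual Plane are not split. It applies no locale-specific casing rules (for example, Turkish dotted i), which is one reason the behaviour may not suit every language.

```java
import java.util.ArrayList;
import java.util.List;

final class SuggestionCase {
    // True if the word starts with an uppercase letter (e.g. "Teh").
    static boolean startsUppercase(String word) {
        return !word.isEmpty() && Character.isUpperCase(word.codePointAt(0));
    }

    // Uppercases only the first code point and leaves the rest untouched.
    static String capitalize(String s) {
        if (s.isEmpty()) return s;
        int first = s.codePointAt(0);
        int firstLen = Character.charCount(first);
        return new StringBuilder()
            .appendCodePoint(Character.toUpperCase(first))
            .append(s, firstLen, s.length())
            .toString();
    }

    // Capitalizes all suggestions when the misspelled word is capitalized
    // or stands at the start of a sentence; otherwise returns them unchanged.
    static List<String> adjust(String misspelled, boolean sentenceStart, List<String> suggestions) {
        boolean cap = sentenceStart || startsUppercase(misspelled);
        List<String> out = new ArrayList<>(suggestions.size());
        for (String s : suggestions) {
            out.add(cap ? capitalize(s) : s);
        }
        return out;
    }
}
```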

Re: Large wordlist of exceptions

2014-03-21 Thread Jaume Ortolà i Font
2014-03-21 9:32 GMT+01:00 Nathan Wells sungk...@gmail.com: So I want to create a rule that asks the user to use the Latin colon rather than the Khmer character ៈ except in Khmer words that actually end in the ៈ character. There are 365 Khmer words that can end in a ៈ character. What is
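Here is a minimal Java sketch of the check being described. It assumes the exception list (the 365 Khmer words that legitimately end in ៈ) is loaded into a set; the class and method names are hypothetical. A set lookup stays cheap however long the exception list grows.

```java
import java.util.Set;

final class KhmerColonCheck {
    // The Khmer sign that is sometimes typed where a Latin colon is intended.
    static final String KHMER_SIGN = "ៈ";

    // Returns true if the token ends in the Khmer sign and is NOT one of the
    // listed words that genuinely end in it, i.e. the Latin colon should be suggested.
    static boolean shouldSuggestColon(String token, Set<String> exceptions) {
        return token.endsWith(KHMER_SIGN) && !exceptions.contains(token);
    }
}
```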

Re: What is wrong with this rule (pt_PT)?

2014-04-30 Thread Jaume Ortolà i Font
Marco, You have a token with vela/velas and then another with bandeira/bandeiras. The rule expects a sentence like "arrear a vela bandeira". Instead of <token regexp="yes">vela|velas</token> <token regexp="yes">bandeira|bandeiras</token> Use <token

rules default=off are enabled in Wikipedia check

2014-05-06 Thread Jaume Ortolà i Font
Hi, This happens now in the WikiCheck and in the nightly differences. For example, with this rule from Catalan grammar.xml: <rule id="EVITA_DEMOSTRATIUS_AQUEST" name="Evita els demostratius 'aquest'" default="off"> It was caused by some change today. Regards, Jaume

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 9:37 GMT+02:00 Marcin Miłkowski list-addr...@wp.pl: The Portuguese dictionary is already built. We simply haven't included it yet because we usually start from a certain number of rules, and then add the tagger. Using the tags in rules is a very good idea overall. I agree with

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 17:34 GMT+02:00 Marco A.G.Pinto marcoagpi...@mail.telepac.pt: Hello! I have contacted my Minho University friends who make the pt_PT dictionaries for Mozilla and OpenOffice/LibreOffice. They said they can create the postag dictionary and help. Hi Marco, What Marcin and I try

Re: Tagger Dictionary and Minho University - pt_PT

2014-07-08 Thread Jaume Ortolà i Font
, Spanish or Catalan), some existing rules could be used as models, and those who are familiar with them (like myself) could contribute more readily. Regards, Jaume Ortolà On Tue, Jul 8, 2014 at 9:39 PM, Jaume Ortolà i Font jaumeort...@gmail.com wrote: 2014-07-08 21:53 GMT+02:00 Marco A.G.Pinto

sample Portuguese rules

2014-07-08 Thread Jaume Ortolà i Font
Here you can see the results of the sample rules I created in Portuguese: https://languagetool.org/regression-tests/20140708/result_pt_20140708.html Suas is wrongly tagged in the Freeling dictionary as singular. It should be plural. That explains most of the false alarms. But the rule needs

enabling and disabling rules in LT command-line

2014-07-20 Thread Jaume Ortolà i Font
Hi, I need to enable and disable rules at the same time on the command line. This is already done in the server mode with three parameters[1]: enabled = list of rules... disabled = list of rules... enabledOnly = yes [by default, no] Could we implement the same approach on the command line? Will

Re: enabling and disabling rules in LT command-line

2014-07-20 Thread Jaume Ortolà i Font
2014-07-20 18:07 GMT+02:00 Daniel Naber daniel.na...@languagetool.org: On 2014-07-20 11:22, Jaume Ortolà i Font wrote: enabled = list of rules... disabled = list of rules... enabledOnly = yes [by default, no] Could we implement the same approach in the command-line
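Here is a minimal Java sketch of how the three server-mode parameters described above could be combined into a set of active rules on the command line. The precedence chosen here is an assumption, not LanguageTool's documented behaviour: explicit disables win, and `enabledOnly` ignores the default rules. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RuleSelection {
    // Computes active rule IDs from the default rules, the enabled and disabled
    // lists, and the enabledOnly flag.
    // Assumption: a rule listed as both enabled and disabled ends up disabled.
    static Set<String> activeRules(List<String> defaultOn, List<String> enabled,
                                   List<String> disabled, boolean enabledOnly) {
        Set<String> active = new LinkedHashSet<>();
        if (!enabledOnly) {
            active.addAll(defaultOn);   // start from rules that are on by default
        }
        active.addAll(enabled);         // explicitly enabled rules, including default="off" ones
        active.removeAll(disabled);     // explicit disables are applied last
        return active;
    }
}
```

For instance, with default rules A, B and C, enabling X and disabling B gives {A, C, X}. With enabledOnly, only X remains.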

Re: The SENT_END challenge

2014-08-09 Thread Jaume Ortolà i Font
Hi, A possible and simple solution is to write two rules. One for sentences with ending punctuation: <pattern> <marker> <token regexp="yes">(you|thei|ou)r</token> </marker> <token regexp="yes">[.?!]</token> </pattern> And another one for sentences without ending
