Re: Inflecting second token with postag from the first

2016-09-13 Thread Dominique Pellé
Jesper wrote: > It looks very strange to me to include ".*" in a replacement expression. I understand that it looks strange. But in some cases, the result of replacement is a regexp. That's why regexp syntax can appear inside the regexp_replace="". I see other examples in: - the Polish

Broken URL in Catalan and Polish grammar files

2016-07-10 Thread Dominique Pellé
Hi There are a few broken URLs in the Catalan and Polish grammar.xml files: Catalan: http://esadir.cat/entrades/fitxa/node/maestesa Polish: http://www.ekorekta24.pl/porady-jezykowe/19-interpunkcja/188-lata-90-te-lata-90-czy-lata-90-jak-zapisac-liczebniki-porzadkow

Re: The spell checker issue

2016-06-29 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > yesterday I tried to update the English dictionary that LT includes. The > details are documented at > https://github.com/languagetool-org/languagetool/issues/329 but in a > nutshell: our spell checking is so complicated that the

Re: Several in same rule?

2016-06-28 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2015-10-11 11:58, Dominique Pellé wrote: > >> Would it be possible to allow for several tags >> in the same rule? > > I don't think it's very difficult. I could put it on my TODO list, but I > cannot make an

Re: HTTP API Migration

2016-05-31 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > we now have a new JSON API. To keep our software from getting too > complex, this means we should remove the old XML-based API. Here's a > road map how we could do that: > > https://languagetool.org/http-api/migration.php > > Comments

Re: ignoring certain tokens in rules

2016-05-06 Thread Dominique Pellé
Jaume Ortolà i Font wrote: > Hi, > > I think Marcin talked about this idea some time ago. > > Sometimes tokens like quotations (or other characters) should be ignored in > some rules. That is, the sentence should be checked as if this token is not > present. Any idea about

Re: Error trying to update French synthesizer dictionary

2016-04-15 Thread Dominique Pellé
Jaume Ortolà wrote: > Hi Dominique, > > This script can be helpful: > https://github.com/Softcatala/catalan-dict-tools/blob/master/build-morfologik-lt.sh > > Regards, > Jaume Ortolà Thanks Jaume. That was useful and I could upgrade the French dictionaries. I will update the developer's

Re: French: detecting errors with statistics

2016-04-01 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > even though I don't speak French, I've started adding confusion pairs > for French. Here's an example from fr/confusion_sets.txt: > > quand; quant; 100# p=1.000, > r=0.662, 186+988, 3grams,

Re: mvn clean package fails

2016-03-26 Thread Dominique Pellé
Dominique Pellé wrote: > Hi > > Running "mvn clean package" fails on my xubuntu-14.04.4 > Linux machine. > > Does anybody know why? Here is the log: Replying to myself. I fixed it by removing the ~/.m2 directory so mvn re-downloaded all packages. I'm not sure

mvn clean package fails

2016-03-26 Thread Dominique Pellé
Hi Running "mvn clean package" fails on my xubuntu-14.04.4 Linux machine. Does anybody know why? Here is the log: === BEGIN QUOTE === pel@pel-laptop:~/sb/languagetool$ mvn clean package [INFO] Scanning for projects... [INFO]

Re: Android Spellchecker using LanguageTool

2016-02-27 Thread Dominique Pellé
Andriy Rysin wrote: > Thanks guys, just a little note that it would be nice to have context > for strings like %d (last %s) as translating that without context is > hard. > > Thanks > Andriy I agree with Andriy. In Transifex, we can add comments about the strings. Can this be

Re: Android Spellchecker using LanguageTool

2016-02-27 Thread Dominique Pellé
Jordi Mas wrote: > Hello guys, > > LanguageTool Proofreader is available for download in Google Play: > > https://play.google.com/store/apps/details?id=org.softcatala.corrector > > We did a small beta with Catalan users and we have around 200 active users. > > It currently

Re: regex performance in tests

2016-02-22 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > there's a regex that makes tests quite slow in PatternTestTools.java: > >CHAR_SET_PATTERN = > Pattern.compile("(\\(\\?-i\\))?.*(? > I don't fully understand it, does it need to be that complicated? If I > simplify it like this: > >

Re: New Language - constraint grammar importing tool

2016-01-29 Thread Dominique Pellé
curon wrote: > A few years ago I started looking at developing An Gramadóir, > as work had already been done for the Welsh language. > Unfortunately this project has had no development for some > time, and the only proprietary checker is fairly limited. > I did have my eye on LanguageTool, but

Re: Multithreaded LT optimization (take 2)

2016-01-28 Thread Dominique Pellé
Andriy Rysin wrote: > Then I realized that in the check method we split rules into callables > and their count is # of cores available (in my case 8), as I have 347 > rules this means each bucket is 43 rules and rules being not equal in > complexity this could lead to quite unequal time for each

Re: new feature: show examples on languagetool.org

2016-01-20 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2016-01-21 04:22, Dominique Pellé wrote: > >> It's still wrong in a different way now: >> I no longer see the correct examples if I click >> on "Examples..." for the word "ankaux" in

Re: new feature: show examples on languagetool.org

2016-01-20 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2016-01-20 18:13, Dominique Pellé wrote: > > Hi Dominique, > > thanks for testing. > >> I see a bug though. Going to https://languagetool.org/eo/ >> then clicking on the highlighted error "ankaux"

Re: new feature: show examples on languagetool.org

2016-01-20 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > I've added a new feature on https://languagetool.org: in the menu of > every error you can now open a dialog that shows some examples of the > error. Note that a few rules don't have examples (Java rules - the XML > rules should all

Re: introduce new color for style errors

2016-01-15 Thread Dominique Pellé
Daniel Naber wrote: > On 2016-01-04 13:36, Daniel Naber wrote: > >> are difficult to find. I suggest to: >> >> 1) introduce a less intrusive color for these errors, e.g. a light >> yellow > > > There's now a new color on languagetool.org for some categories in

Re: introduce new color for style errors

2016-01-15 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2016-01-15 11:46, Dominique Pellé wrote: > >> I think that we need to increase the color difference between >> slightly blue highlighting for style errors and the white background. > > Could you send color cod

Re: introduce new color for style errors

2016-01-09 Thread Dominique Pellé
Daniel Naber wrote: > On 2016-01-04 13:36, Daniel Naber wrote: > >> are difficult to find. I suggest to: >> >> 1) introduce a less intrusive color for these errors, e.g. a light >> yellow > > > There's now a new color on languagetool.org for some categories in

Re: Anybody using Discourse?

2016-01-03 Thread Dominique Pellé
Daniel Naber wrote: > On 2016-01-03 13:42, Daniel Naber wrote: > > > The migration to a new forum is now in progress. The old forum has been > > set to read-only, its contents will be migrated to the new forum. I'll > > send a notice with the new forum's address as

Re: new syntax available

2015-12-31 Thread Dominique Pellé
2015-12-29 22:07 GMT+01:00 Dominique Pellé <dominique.pe...@gmail.com>: > Daniel Naber <daniel.na...@languagetool.org> wrote: > > > On 2015-10-14 14:01, Dominique Pellé wrote: > ... > >> It would also be useful if each group captured in the regexp >

Re: new syntax available

2015-12-29 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2015-10-14 14:01, Dominique Pellé wrote: ... >> It would also be useful if each group captured in the regexp >> could be re-used with \1 \2 \3 etc. (or ...) inside >> the or . > > That's possible already

Need help to update the Breton spelling FSA dictionary

2015-12-19 Thread Dominique Pellé
Hi I'm trying to update the FSA spelling dictionary for Breton but I have a problem. I had a script using Morfologik which used to work: languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/hunspell/create-fsa-spell-dictionary.sh ... but I see that

Re: Need help to update the Breton spelling FSA dictionary

2015-12-19 Thread Dominique Pellé
Daniel Naber wrote: > On 2015-12-19 17:31, Dominique Pellé wrote: > >> org.languagetool.dev.SpellDictionaryBuilder \ > > Actually the class is deprecated, its non-deprecated version is now at > org.languagetool.tools.SpellDictionaryBuilder and it should also have

Re: Need help to update the Breton spelling FSA dictionary

2015-12-19 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2015-12-19 22:18, Dominique Pellé wrote: > >> I've mentioned it in the past, I find surprising that most >> languages do not provide the scripts that they use to create >> binary dictionaries. Providing s

Broken URL in grammar.xml of Catalan, English, Dutch and Polish

2015-12-19 Thread Dominique Pellé
Hi I used the attached script to find 22 broken URL in grammar.xml of Catalan, English, Dutch, Polish: $ cd languagetool $ ./test-broken-url.sh Checking [languagetool-language-modules/ast/src/main/resources/org/languagetool/rules/ast/grammar.xml]... Checking

Question before updating French and Breton dictionaries

2015-12-14 Thread Dominique Pellé
Hi I would like to update the French and Breton POS tag dictionaries, ideally before the next LanguageTool release. However, I'm asking whether that's OK as I read that updating the dictionaries is problematic with git (it increases the size of the git repository). So what to do? Should I wait

Re: LanguageTool in 2015 + the future

2015-12-08 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > the year is slowly coming to an end, so I thought I'd try to summarize what > we've achieved this year and how we can move LT forward in the future. In > 2015, we... > > * made three releases so far (2.9, 3.0, 3.1), another one is

Re: help with suggested Dutch rule

2015-11-10 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > could a Dutch native speaker have a look at this rule, is it okay? > > http://languagetool-user-forum.2306527.n4.nabble.com/New-Dutch-rule-referentie-used-as-an-Anglicism-td4643279.html > > Regards > Daniel My Dutch is too rudimentary to assess the rule. However,

Re: False error given on the online Esperanto checker, can't reproduce it in command line

2015-10-29 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: On 2015-10-29 22:58, Dominique Pellé wrote: > > > I can't make sense of it. And I can't reproduce the > > error in the command line either since this gives no > > error: > > *Maybe* t

Re: False error given on the online Esperanto checker, can't reproduce it in command line

2015-10-29 Thread Dominique Pellé
On Thu, Oct 29, 2015 at 11:30 PM, Jaume Ortolà i Font wrote: > Hi Dominique, > > When there is no space at the end of the sentence, the last token has the > POS tag "PARA_END", and this tag makes rule match: > > > > You can see the difference (with space vs without

False error given on the online Esperanto checker, can't reproduce it in command line

2015-10-29 Thread Dominique Pellé
Hi Here is presumably a bug which I do not understand. Hopefully someone can help. If I copy/paste the following 3-word sentence in the Esperanto grammar checker at https://www.languagetool.org/eo/ Pri la kategorio ... and then press the button "Check text", LT highlights the last 2 words in

Re: LanguageTool speed measurements, from LT-1.8 till LT-3.2-SNAPSHOT

2015-10-25 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: On 2015-10-25 05:13, Dominique Pellé wrote: > > Hi Dominique, > > > I measured LT speed using command line version of LanguageTool. > > Recorded numbers are user time reported by Linux time command. > > thanks

Re: LanguageTool speed measurements, from LT-1.8 till LT-3.2-SNAPSHOT

2015-10-24 Thread Dominique Pellé
Dominique Pellé wrote: Multi-threading was introduced in LT-2.7 but above numbers don't show > improvements. Maybe I needed to use a bigger document than 500 lines. > I need to correct this: it's LT-2.3 which introduced multi-threading. I also made more measurements with older versions.

LanguageTool speed measurements, from LT-1.8 till LT-3.2-SNAPSHOT

2015-10-24 Thread Dominique Pellé
Hi I measured LT speed using command line version of LanguageTool. Recorded numbers are user time reported by Linux time command. Measurements were made on my laptop: - xubuntu-14.04.3 - i5-3317U CPU, 1.7Ghz, 4 cores, SSD - java version "1.8.0_60" I measured: - several versions of LT (from 1.8

Re: Rule to check common mistakes in URL

2015-10-24 Thread Dominique Pellé
Purodha Blissenbach wrote: > Hi, > >>http:/www.google.com (there should be 2 slashes after >> protocol) > > This is valid, at least protocolwise. It refers to a directory > /www.google.com on the current server. Good warning, of course, if there > is at least a

Rule to check common mistakes in URL

2015-10-23 Thread Dominique Pellé
Hi I've added a rule in French grammar.xml to check for common mistakes in URLs in this checkin: https://github.com/languagetool-org/languagetool/commit/4bd2109242ad02f2d50e1f597580764a1dd45d97 Some examples of mistakes detected: http//www.google.com (missing colon)
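The two mistakes listed above can be illustrated with a small sketch. It is written in Python and uses made-up patterns and a hypothetical helper name (url_mistakes); it is not the pattern used in the French grammar.xml. As pointed out in the reply, a single slash after the colon can be a valid relative reference, so a real rule would treat it as a warning at most.

```python
import re

# Hypothetical patterns for illustration only; not taken from grammar.xml.
MISSING_COLON = re.compile(r"\bhttps?//")       # e.g. http//www.google.com
SINGLE_SLASH = re.compile(r"\bhttps?:/(?!/)")   # e.g. http:/www.google.com


def url_mistakes(text):
    """Return a list of labels for the URL mistakes found in text."""
    found = []
    if MISSING_COLON.search(text):
        found.append("missing colon")
    if SINGLE_SLASH.search(text):
        found.append("single slash")
    return found


print(url_mistakes("http//www.google.com"))
print(url_mistakes("http:/www.google.com"))
print(url_mistakes("http://www.google.com"))
```

A well-formed URL such as http://www.google.com matches neither pattern, because the negative lookahead rejects a second slash.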

Re: using ngram data to detect errors

2015-10-19 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > this is just a reminder that the data for statistical error detection > exists, now people just need to use it... > > Regards > Daniel Hi Daniel Yes, I have not forgotten, but I really have little time at home these days. I can

Re: new syntax available

2015-10-14 Thread Dominique Pellé
Daniel Naber wrote: > On 2015-10-11 12:31, Daniel Naber wrote: > > >> Use of "exact-meaning" would be very rare. > >> Maybe a better name: > > > > I think that's okay with me, but I need to think more about it. Maybe > > the easiest implementation would be to just

Re: nightly regressions

2015-10-13 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > we have quite some changes in the nightly tests today. I'm not sure what > the cause is, could you check your language and see if the changes are > good or bad? > > https://languagetool.org/regression-tests/20151013/ > > Regards >

Re: Behavior of non-breaking space U+00A0 in LanguageTool

2015-10-12 Thread Dominique Pellé
Andre Couture wrote: > Hi > I did not follow the entire conversation here but I was curious as of why > would someone put a non breaking space between two words? > We face that in other areas of our code as well. > > If the idea of the nbsp is to keep the two apparent words together, would > it

Several in same rule?

2015-10-11 Thread Dominique Pellé
Hi Would it be possible to allow for several tags in the same rule? It seems that we can only give one. I'd like to be able to use several (at least 2): * one to make sure that part of regexp matches a postag * another one to make sure that part of the regexp does not match a postag I tried

Behavior of non-breaking space U+00A0 in LanguageTool

2015-10-11 Thread Dominique Pellé
Hi Consider this very simple rule in the English grammar.xml: egg yoke The rule works fine if the 2 words are separated with at least spaces, tabs or newlines. However, it does not work when the 2 words are separated with a non-breaking space (U+00A0). I wonder why. With a

Re: new syntax available

2015-10-09 Thread Dominique Pellé
Daniel Naber wrote: > On 2015-10-09 07:32, Dominique Pellé wrote: > >> I suppose that I care more than most because I only use LT to check >> text files where the situation is frequent. > > I think normalizing the text makes sense if: > 1) single line breaks get re

Re: new syntax available

2015-10-08 Thread Dominique Pellé
Daniel Naber wrote: > On 2015-10-08 06:59, Dominique Pellé wrote: >> ... then the regexp rule does not detect all the errors >> that the rule detected. It does not detect errors >> in "foo bar" (2 spaces or more, or tabs) or when there is a >> new line

Re: new syntax available

2015-10-08 Thread Dominique Pellé
Mike Unwalla wrote: > I agree with Purodha. Do not be 'smart'. Do not change the meaning of a > regexp. > > Regards, > > Mike Unwalla OK. It looks like the majority does not want to pre-process the sentence to remove consecutive spaces (including tabs, dos/unix new

Re: new syntax available

2015-10-07 Thread Dominique Pellé
Daniel Naber wrote: > On 2015-10-07 06:41, Dominique Pellé wrote: > > Hi Dominique, > > thanks for your feedback. One more remark: If I replace a rule like... foo bar ... into ... foo bar ... then the regexp rule does not detect all the errors that the rule de

Re: new syntax available

2015-10-07 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2015-10-07 06:41, Dominique Pellé wrote: > > Hi Dominique, > > thanks for your feedback. > >> 1) How do I highlight only a subset of the match? Trying the above >> rule, I see this: > > Th

Re: new syntax available

2015-10-06 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > there's now a first and limited implementation of the syntax in > master. Instead of > > foo > > you can now use > > foo > > But be aware that this is a real regular expression that ignores tokens, > so it matches anything with the

Invalid XML in Dutch grammar.xml

2015-10-05 Thread Dominique Pellé
Hi I noticed that the Dutch rule OT_DOOR_DE_WAR contains invalid XML. See the spurious > after the word "in" in the 2 lines below: Juist is in> de war. Het ligt door de war. I also wonder why LT accepts the XML file without giving errors. Regards Dominique

Re: Invalid XML in Dutch grammar.xml

2015-10-05 Thread Dominique Pellé
Dominique Pellé wrote: > Hi > > I noticed that the Dutch rule OT_DOOR_DE_WAR > contains invalid XML. See the spurious > after the > word "in" in the 2 lines below: > > Juist is in> de war. > Het ligt door de war. > > I also wonder why LT accepts t

Re: Idea to introduce tag in LT grammar rules.

2015-09-28 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2015-09-05 22:53, Dominique Pellé wrote: > >> It is similar to what Daniel wrote earlier as well: >> >> a (plein temps|chaque fois|rude épreuve|vol >> d’oiseau) >> >> It would make some su

Common English grammar errors & misspellings on wikipedia

2015-09-28 Thread Dominique Pellé
Hi I'm sharing a link that looks useful for the English LanguageTool: https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/Grammar_and_miscellaneous Regards Dominique --

Re: languagetool.org on mobile

2015-09-11 Thread Dominique Pellé
Daniel Naber wrote: > Hi, > > on mobile, languagetool.org doesn't show the textarea where you can try > LT. Originally, this was on purpose, but nowadays smartphone displays > have good resolution and I think we should show it at least for modern > devices. The

Re: Idea to introduce tag in LT grammar rules.

2015-09-07 Thread Dominique Pellé
Daniel Naber <daniel.na...@languagetool.org> wrote: > On 2015-09-05 22:53, Dominique Pellé wrote: > >> It is similar to what Daniel wrote earlier as well: >> >> a (plein temps|chaque fois|rude épreuve|vol >> d’oiseau) > > So instead of ... we wou

Re: Idea to introduce tag in LT grammar rules.

2015-09-05 Thread Dominique Pellé
Jaume Ortolà i Font <jaumeort...@gmail.com> wrote: 2015-09-05 16:11 GMT+02:00 Daniel Naber <daniel.na...@languagetool.org>: > >> On 2015-09-04 23:21, Dominique Pellé wrote: >> >> > I wish I could write a rule pattern like this: >> > >>

Idea to introduce tag in LT grammar rules.

2015-09-04 Thread Dominique Pellé
Hi Say I want to detect invalid use of word "a" (= has, verb) instead of "à" (= at, preposition) in many French expressions such as: a nouveau -> à nouveau a plein temps -> à plein temps a rude épreuve -> à rude épreuve a vol d'oiseau -> à vol d'oiseau etc. I wish I could write a

Re: regular expression detection inside token

2015-05-03 Thread Dominique Pellé
Andriy Rysin ary...@gmail.com wrote: I started working on some abbreviations with dots in Ukrainian and added some of them to the dictionary. But now when I specify <token>рр.</token> in rules LT warns me: token [2], contains рр. that is not marked as regular expression but probably is one I

Running grammar checks in one language only

2015-02-01 Thread Dominique Pellé
Hi I used to check grammar rules in one language only using: $ mvn —projects languagetool-language-modules/fr —also-make clean test It's documented here: http://wiki.languagetool.org/maven-tips It used to work, but it does not work anymore. It gives this error: [ERROR] BUILD FAILURE [INFO]

Re: Running grammar checks in one language only

2015-02-01 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2015-02-01 13:22, Dominique Pellé wrote: $ mvn —projects languagetool-language-modules/fr —also-make clean test You need to use two dashes (--) instead of — for the 'projects' and 'also-make' parameter. I'll fix the Wiki page. Regards

Re: Improving spelling suggestions with frequency dictionaries

2014-12-23 Thread Dominique Pellé
Hi I have this script... languagetool/languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh ... which works by assuming that SynthDictionaryBuilder java program creates its output files in /tmp/... But it would be trivial to modify the script if -o

Re: switching from Hunspell to Morfologik

2014-10-14 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: Hi, to provide LT as a 100% pure Java software, I'd like to switch from Hunspell (native code) to Morfologik (Java-based). For that, I think the following languages are easy to switch: Asturian Galician Khmer Spanish

Re: And an issue, for Dutch ...

2014-10-08 Thread Dominique Pellé
R.J. Baars wrote: A long time ago, I chose to have the - as a word char, not separating word parts that really belong together. That is now in the way for the date rules, since a normal date in Dutch can also be 15-1-1958. Is there a solution for this issue? Like tokenizing when the dash

Re: improving LT coverage

2014-10-08 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: Hi, I'm looking for ideas to systematically improve LT's error coverage. The last years, I've mostly worked by simply adding rules for errors that I coincidentally found on the web or in emails. Is anybody of you working systematically in

Re: Duplicate entries in compounds.txt in ru, nl

2014-10-07 Thread Dominique Pellé
Yakov Reztsov yakovr...@mail.ru wrote: Hi, Mon, 6 Oct 2014 21:47:40 +0200 от Dominique Pellé: Hi I've noticed that the Russian and Dutch compounds.txt files contain duplicate entries. Either the dupes should be removed, or maybe some of the dupe were meant to be the plural form

Re: looking for more semantic rules

2014-10-07 Thread Dominique Pellé
Hi Ruud You can have a look at the Java files DateCheckFilter.java for Catalan, Breton or Esperanto, for which there is also no Java locale. Dominique PM, R.J. Baars r.j.ba...@xs4all.nl wrote: About more semantic rule, what about time consistency? About the date check, I have been looking

Duplicate entries in compounds.txt in ru, nl

2014-10-06 Thread Dominique Pellé
Hi I've noticed that the Russian and Dutch compounds.txt files contain duplicate entries. Either the dupes should be removed, or maybe some of the dupes were meant to be the plural form or some other inflections. Can the language maintainers check the duplicate entries in the following compounds.txt
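A duplicate check of this kind is easy to reproduce. The following is a minimal sketch in Python, not the method actually used for this report; the helper name duplicate_entries is made up, and it assumes one entry per line with blank lines and lines starting with '#' being ignored.

```python
from collections import Counter


def duplicate_entries(lines):
    """Return the entries that occur more than once, sorted alphabetically."""
    entries = []
    for line in lines:
        entry = line.strip()
        # Assumption: blank lines and '#' comment lines are not entries.
        if entry and not entry.startswith("#"):
            entries.append(entry)
    counts = Counter(entries)
    return sorted(entry for entry, n in counts.items() if n > 1)


sample = ["# sample data", "casse-gueule", "casse-tête", "casse-gueule", ""]
print(duplicate_entries(sample))
```

To run it on a real file, pass the open file object (for example `open(path, encoding="utf-8")`) instead of the sample list.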

Re: Duplicate entries in compounds.txt in ru, nl

2014-10-06 Thread Dominique Pellé
Hi Ruud Duplicate entries are at best not necessary, so they should be removed. But at worst, it can be that the intention was to put a plural for example. I found such errors in the French compounds.txt where I had the word casse-gueule twice instead of having casse-gueule and casse-gueules.

Bug with \realDay in data checking rule if it appears before \1 \2 \3....

2014-09-22 Thread Dominique Pellé
Hi I noticed a bug in the date checking rule: the \1 \2 (etc) substitutions in message do not work when they appear after \realDay. I noticed this while writing the Breton date rule. I had to change the message somehow to so that \realDay appeared at the end of the message to make it work. Here

Re: Some advice needed

2014-09-15 Thread Dominique Pellé
R.J. Baars r.j.ba...@xs4all.nl wrote: There is an official advice for Dutch, stating that for understandable reading, an average of no more than 12 words a sentence is required. Since I can only make rule per sentence, I made a rule, warning for sentences of more than 18 words. That rule

Re: new date/weekday consistency rule

2014-09-13 Thread Dominique Pellé
Daniel Naber wrote: Hi, I've implemented a 'filter' element for XML which can be used to modify, keep, or reject a rule match. The first use case is a rule that checks if a weekday matches its date, e.g. Monday, 7 October 2014 is inconsistent, as 2014-10-07 is not a Monday. The rule for
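The consistency check described here boils down to comparing the claimed weekday with the one computed from the date. The sketch below shows the idea in Python with the standard datetime module; it is only an illustration, not LanguageTool's Java filter, and the name weekday_mismatch is hypothetical.

```python
import datetime

# English weekday names, indexed like datetime.date.weekday() (Monday == 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_mismatch(claimed_weekday, year, month, day):
    """Return the real weekday name if it differs from the claimed one, else None."""
    real = WEEKDAYS[datetime.date(year, month, day).weekday()]
    return None if real == claimed_weekday else real


print(weekday_mismatch("Monday", 2014, 10, 7))   # the inconsistent example from the message
print(weekday_mismatch("Tuesday", 2014, 10, 7))  # a consistent weekday/date pair
```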

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-09-10 Thread Dominique Pellé
Marcin Miłkowski list-addr...@wp.pl wrote: W dniu 2014-09-09 23:10, Dominique Pellé pisze: Daniel Naber daniel.na...@languagetool.org mailto:daniel.na...@languagetool.org wrote: On 2014-09-09 22:38, Dominique Pellé wrote: * why does your example give a message

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-09-10 Thread Dominique Pellé
Marcin Miłkowski list-addr...@wp.pl wrote: W dniu 2014-09-10 11:34, Dominique Pellé pisze: Marcin Miłkowski list-addr...@wp.pl mailto:list-addr...@wp.pl wrote: W dniu 2014-09-09 23:10, Dominique Pellé pisze: Daniel Naber daniel.na...@languagetool.org mailto:daniel.na

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-09-09 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-04-27 22:18, Dominique Pellé wrote: I wish I could check the POS tag of a portion of a token. (Replying to an old thread here...) I think the new rule filter offers a solution for this that does not require any changes to the XML

Re: Bug is disambiguator?

2014-09-03 Thread Dominique Pellé
<disambig action="filter" postag="N.*"/> </rule> Regards, Jaume Ortolà 2014-09-03 6:22 GMT+02:00 Dominique Pellé dominique.pe...@gmail.com: Hi Have a look at the following debug output of LanguageTool where a token gets the nonsensical POS tag N.* (multiple times) after

Bug is disambiguator?

2014-09-02 Thread Dominique Pellé
Hi Have a look at the following debug output of LanguageTool where a token gets the nonsensical POS tag N.* (multiple times) after a disambiguation rule is applied. Is it a bug in the disambiguator? Or am I writing an incorrect disambiguation rule? $ echo An eil| java -jar

Re: Current limitations of MorfologikSpeller

2014-09-01 Thread Dominique Pellé
Marcin Miłkowski list-addr...@wp.pl wrote: W dniu 2014-09-01 20:04, Daniel Naber pisze: Hi, our Wiki at http://wiki.languagetool.org/hunspell-support says ICONV/OCONV isn't supported in Morfologik, but I see there are the fsa.dict.input-conversion and fsa.dict.output-conversion options. So

Re: Questions about new date checking rule

2014-08-30 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-08-29 21:50, Dominique Pellé wrote: Message: The date 31 September 2014 is not a Monday, but a Wednesday. Monday, 31 September 2014 I've now made date parsing more strict, but the rule won't complain about these dates and just
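The quoted message is what one would expect if the non-existent date 31 September 2014 were silently rolled over to 1 October 2014, which was a Wednesday. That this is what happened inside LT is an assumption; the Python sketch below only contrasts strict and lenient handling of such a date, with hypothetical helper names.

```python
import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def parse_strict(year, month, day):
    """Return a date, or None when the day does not exist in that month."""
    try:
        return datetime.date(year, month, day)
    except ValueError:
        return None


def parse_lenient(year, month, day):
    """Roll an overflowing day number over into the following month(s)."""
    return datetime.date(year, month, 1) + datetime.timedelta(days=day - 1)


rolled = parse_lenient(2014, 9, 31)
print(parse_strict(2014, 9, 31))            # strict parsing rejects the date
print(rolled, WEEKDAYS[rolled.weekday()])   # lenient parsing lands on 1 October
```

With strict parsing there is no date to compare the weekday against, so a weekday rule has nothing to report for such input.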

Re: Questions about new date checking rule

2014-08-29 Thread Dominique Pellé
R.J. Baars r.j.ba...@xs4all.nl wrote: A different question: what about dates like '08-07-2014'or '2014/08/07' One cannot tell which is month and which is day, isn't it? Are both options considered then? And what of notations '04/05/06'; it is completely unclear which is month, year and day.

Re: Questions about new date checking rule

2014-08-29 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-08-29 07:47, Dominique Pellé wrote: Would there be a way to say something like instead: The date October 7, 2014 is not a Monday, but a Tuesday. This is now implemented, you can use \realDay in your message and it will be replaced

Questions about new date checking rule

2014-08-28 Thread Dominique Pellé
Hi I saw that date checking was added to LT. Thanks for that. I've added support for date checking in French (as was done already in en, de, pl, ca). I have 2 remarks: 1) LT detects date inconsistency in French as in: * Vendredi 28/08/2014 (it should be a Thursday, not a Friday) * Vendredi 28

Re: Rule not working as expected

2014-08-19 Thread Dominique Pellé
R.J. Baars r.j.ba...@xs4all.nl wrote: I discovered that the rule below is not working very well. It looks like 'skip' also skips over sentence boundaries. Is that intentional? Or is something else wrong? In case it is intentional, is there an option to forbid that? Ruud rule id=nr738

Re: Build broken, French Synthesizer

2014-08-06 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-08-06 14:13, Juan Martorell wrote: testSynthesizeStringString (java.lang.Error: Unresolved compilation problem: The declared package does not match the expected package org.languagetool.synthesis.fr [1] I cannot reproduce that

Is <exception>\2</exception> supposed to work?

2014-07-17 Thread Dominique Pellé
Hi I wrote a French rule which contains <exception>\2</exception> but it does not work. Should things like \2 work inside <exception>...</exception>? The rule checks that the two words vu, vus, vue or vues are identical as in vu de mes yeux vu (correct), vus de mes yeux vus (correct), and it's supposed
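The message is cut off, but the rule is presumably meant to flag the expression when the two forms differ. Outside LT's XML syntax, that intent can be sketched with a plain regular expression that uses a backreference inside a negative lookahead. This is a Python illustration with a hypothetical helper name, not the French rule itself.

```python
import re

# Matches "<form> de mes yeux <other form>" only when the two forms differ:
# (?!\1\b) rejects a second word that is identical to the first one.
MISMATCH = re.compile(
    r"\b(vu|vus|vue|vues) de mes yeux (?!\1\b)(vu|vus|vue|vues)\b"
)


def has_vu_mismatch(sentence):
    """Return True if the two forms around 'de mes yeux' are not identical."""
    return MISMATCH.search(sentence) is not None


print(has_vu_mismatch("vu de mes yeux vu"))
print(has_vu_mismatch("vus de mes yeux vu"))
```

The word boundary after \1 matters: without it, "vu de mes yeux vus" would be wrongly accepted because "vus" starts with "vu".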

Re: Is <exception>\2</exception> supposed to work?

2014-07-17 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-07-17 08:20, Dominique Pellé wrote: Should things like \2 work inside <exception>...</exception>? <match no="2"/> should work. Regards Daniel Hi Daniel <exception><match no="2"/></exception> does not work either just like <exception>\2

Re: Is <exception>\2</exception> supposed to work?

2014-07-17 Thread Dominique Pellé
On Thu, Jul 17, 2014 at 9:18 AM, Daniel Naber daniel.na...@languagetool.org wrote: On 2014-07-17 08:59, Dominique Pellé wrote: <exception><match no="2"/></exception> does not work either Okay, I thought it worked because I see it's being used in the Polish grammar.xml. But maybe it never worked

Re: Is <exception>\2</exception> supposed to work?

2014-07-17 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-07-17 10:52, Dominique Pellé wrote: I glanced at the Polish grammar.xml, but I could not find such rules. Sorry, I guess my grep command was wrong and I actually found match outside the exception element. cvc-complex-type.2.4.d

Re: Is <exception>\2</exception> supposed to work?

2014-07-17 Thread Dominique Pellé
On Thu, Jul 17, 2014 at 2:49 PM, Dominique Pellé dominique.pe...@gmail.com wrote: Daniel Naber daniel.na...@languagetool.org wrote: On 2014-07-17 10:52, Dominique Pellé wrote: I glanced at the Polish grammar.xml, but I could not find such rules. Sorry, I guess my grep command was wrong

Re: New Member to LT - for Tamil

2014-07-14 Thread Dominique Pellé
Elanjelian Venugopal tamil...@gmail.com wrote: Added a second rule group to the grammar.xml Hi Elanjelian Since you have non-trivial suggestions with \1 etc. such as <suggestion>\1க்\2</suggestion>, I would advise using correction='...'. E.g., replace... <example type='incorrect'>புது <marker>மா

Re: Questions about creating a synthesizer dictionary

2014-07-13 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-07-11 22:43, Dominique Pellé wrote: 1/ Why does the above command create files in /tmp rather than providing command line options to specify the outputs? There's no specific reason that I can remember. Feel free to change

Questions about creating a synthesizer dictionary

2014-07-11 Thread Dominique Pellé
Hi I'd like to create synthesizer dictionaries for French and Breton in order to be able to give better suggestions based on the synthesizer. I just started to experiment based on the information at http://wiki.languagetool.org/developing-a-tagger-dictionary#toc9 and I see that I can create a

Found in a few grammar.xml files (en, de, ru)

2014-05-28 Thread Dominique Pellé
Hi Searching for in grammar.xml files, I see things that are wrong, or at least suspicious: $ ack-grep --xml '' languagetool-language-modules/*/src languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml 25390:token negate=yes/token 25400:

incorrect antipattern IDs (bug in XML parser?) + antipattern sanity check

2014-05-05 Thread Dominique Pellé
Hi I've added antipattern sanity checks. It detects some problems in antipatterns for German and Polish. However, I have not checked-in yet because the antiPattern.getId() is incorrect. It seems to contain the ID of the previous rule, rather than the rule owning the antipattern. I believe that

Another regexp sanity check for things like .|;

2014-05-04 Thread Dominique Pellé
Hi For your information, I've added yet another sanity check for regexp in grammar disambiguation files in checkin 8838a7edef0f7a24d5c63533df7a15fc154c777d It finds regexps that are most certainly wrong such as .|; Since the dot can match any char, the ; in the disjunction is useless. There is
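The idea behind such a check can be sketched as follows. This is a Python illustration under stated assumptions, not the code from the checkin: the function names are made up, only top-level alternatives are inspected (a group like (.|;) is not looked into), and only single literal characters are treated as redundant next to a bare dot. Note also that a dot does not match line terminators by default, so "redundant" holds for ordinary token text only.

```python
def split_top_level(regex):
    """Split a regex on '|' that is not escaped, not inside (...) or [...]."""
    parts, current = [], []
    depth, in_class, i = 0, False, 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex):
            # Keep an escaped character together with its backslash.
            current.append(regex[i:i + 2])
            i += 2
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def has_redundant_alternative(regex):
    """True if a bare '.' alternative makes a single-character alternative useless."""
    parts = split_top_level(regex)
    if "." not in parts:
        return False
    return any(len(p) == 1 and p not in ".^$" for p in parts)


print(has_redundant_alternative(".|;"))
print(has_redundant_alternative(r"\.|;"))
```

The second call shows the likely intended spelling: with the dot escaped, both alternatives are meaningful and nothing is flagged.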

Possible bug in XML rule/disambiguation parsing

2014-05-03 Thread Dominique Pellé
Hi I've added a new pattern rule checker (commit e26967dc4663283574a8d536308c13ad188b44a0) and it finds this issue: The Catalan rule: FORCA2:6, token [1], contains força that contains token separators, so can't possibly be matched. The Catalan rule: FORCA2:7, token

Re: What is wrong with this rule (pt_PT)?

2014-04-30 Thread Dominique Pellé
Jaume Ortolà i Font jaumeort...@gmail.com wrote: Marco, You have a token with vela/velas and then another with bandeira/bandeiras. The rule expects a sentence like arrrear a vela bandeira. Instead of <token regexp="yes">vela|velas</token> <token

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-04-28 Thread Dominique Pellé
Daniel Naber daniel.na...@languagetool.org wrote: On 2014-04-27 22:18, Dominique Pellé wrote: token regexp=yes postag_group1=fooez-(.*)/token I'm not sure how this could be implemented in a clean way... wouldn't this be a rather ugly special case in the tagger to ignore the tokenization

Suggestion: find POS tag of portion of a word in XML rules

2014-04-27 Thread Dominique Pellé
Hi I wish I could check the POS tag of a portion of a token. For example, in a Breton word such as ez-c'hlas, I wish I could check the POS tag of c'hlas in XML rules. I don't think that's currently possible, unless: - I write a Java rule - or I change the tokenizer to split on hyphen - but
