Jesper wrote:
> It looks very strange to me to include ".*" in a replacement expression.
I understand that it looks strange. But in some cases, the result of
the replacement is itself a regexp. That's why regexp syntax can appear
inside the regexp_replace="".
I see other examples in:
- the Polish
Hi
There are a few broken URLs in the Catalan and Polish grammar.xml files:
Catalan:
http://esadir.cat/entrades/fitxa/node/maestesa
Polish:
http://www.ekorekta24.pl/porady-jezykowe/19-interpunkcja/188-lata-90-te-lata-90-czy-lata-90-jak-zapisac-liczebniki-porzadkow
Daniel Naber wrote:
> Hi,
>
> yesterday I tried to update the English dictionary that LT includes. The
> details are documented at
> https://github.com/languagetool-org/languagetool/issues/329 but in a
> nutshell: our spell checking is so complicated that the
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-10-11 11:58, Dominique Pellé wrote:
>
>> Would it be possible to allow for several tags
>> in the same rule?
>
> I don't think it's very difficult. I could put it on my TODO list, but I
> cannot make an
Daniel Naber wrote:
> Hi,
>
> we now have a new JSON API. To keep our software from getting too
> complex, this means we should remove the old XML-based API. Here's a
> road map how we could do that:
>
> https://languagetool.org/http-api/migration.php
>
> Comments
Jaume Ortolà i Font wrote:
> Hi,
>
> I think Marcin talked about this idea some time ago.
>
> Sometimes tokens like quotations (or other characters) should be ignored in
> some rules. That is, the sentence should be checked as if this token is not
> present. Any idea about
Jaume Ortolà wrote:
> Hi Dominique,
>
> This script can be helpful:
> https://github.com/Softcatala/catalan-dict-tools/blob/master/build-morfologik-lt.sh
>
> Regards,
> Jaume Ortolà
Thanks Jaume. That was useful and I could upgrade the
French dictionaries.
I will update the developer's
Daniel Naber wrote:
> Hi,
>
> even though I don't speak French, I've started adding confusion pairs
> for French. Here's an example from fr/confusion_sets.txt:
>
> quand; quant; 100# p=1.000,
> r=0.662, 186+988, 3grams,
Dominique Pellé wrote:
> Hi
>
> Running "mvn clean package" fails on my xubuntu-14.04.4
> Linux machine.
>
> Does anybody know why? Here is the log:
Replying to myself. I fixed it by removing the ~/.m2 directory
so mvn re-downloaded all packages. I'm not sure
Hi
Running "mvn clean package" fails on my xubuntu-14.04.4
Linux machine.
Does anybody know why? Here is the log:
=== BEGIN QUOTE ===
pel@pel-laptop:~/sb/languagetool$ mvn clean package
[INFO] Scanning for projects...
[INFO]
Andriy Rysin wrote:
> Thanks guys, just a little note that it would be nice to have context
> for strings like %d (last %s) as translating that without context is
> hard.
>
> Thanks
> Andriy
I agree with Andriy. In Transifex, we can add comments
about the strings. Can this be
Jordi Mas wrote:
> Hello guys,
>
> LanguageTool Proofreader is available for download in Google Play:
>
> https://play.google.com/store/apps/details?id=org.softcatala.corrector
>
> We did a small beta with Catalan users and we have around 200 active users.
>
> It currently
Daniel Naber wrote:
> Hi,
>
> there's a regex that makes tests quite slow in PatternTestTools.java:
>
> CHAR_SET_PATTERN =
> Pattern.compile("(\\(\\?-i\\))?.*(?
> I don't fully understand it, does it need to be that complicated? If I
> simplify it like this:
>
>
curon wrote:
> A few years ago I started looking at developing An Gramadóir,
> as work had already been done for the Welsh language.
> Unfortunately this project has had no development for some
> time, and the only proprietary checker is fairly limited.
> I did have my eye on LanguageTool, but
Andriy Rysin wrote:
> Then I realized that in the check method we split rules into callables
> and their count is # of cores available (in my case 8), as I have 347
> rules this means each bucket is 43 rules and rules being not equal in
> complexity this could lead to quite unequal time for each
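The bucket arithmetic described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not LanguageTool's Java code: the helper name split_into_buckets and the placeholder rule IDs are hypothetical, and the real check method may distribute rules differently.

```python
def split_into_buckets(items, bucket_count):
    """Split items into bucket_count contiguous buckets of near-equal size."""
    base, extra = divmod(len(items), bucket_count)
    buckets = []
    start = 0
    for i in range(bucket_count):
        # The first `extra` buckets take one additional item each.
        size = base + (1 if i < extra else 0)
        buckets.append(items[start:start + size])
        start += size
    return buckets


rules = [f"RULE_{n}" for n in range(347)]  # placeholder rule IDs (hypothetical)
buckets = split_into_buckets(rules, 8)     # 8 = number of cores in the message
print([len(b) for b in buckets])
```

Equal bucket sizes do not imply equal run time: if one bucket happens to hold the most expensive rules, the other threads finish early and wait for it.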
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2016-01-21 04:22, Dominique Pellé wrote:
>
>> It's still wrong in a different way now:
>> I no longer see the correct examples if I click
>> on "Examples..." for the word "ankaux" in
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2016-01-20 18:13, Dominique Pellé wrote:
>
> Hi Dominique,
>
> thanks for testing.
>
>> I see a bug though. Going to https://languagetool.org/eo/
>> then clicking on the highlighted error "ankaux"
Daniel Naber wrote:
> Hi,
>
> I've added a new feature on https://languagetool.org: in the menu of
> every error you can now open a dialog that shows some examples of the
> error. Note that a few rules don't have examples (Java rules - the XML
> rules should all
Daniel Naber wrote:
> On 2016-01-04 13:36, Daniel Naber wrote:
>
>> are difficult to find. I suggest to:
>>
>> 1) introduce a less intrusive color for these errors, e.g. a light
>> yellow
>
>
> There's now a new color on languagetool.org for some categories in
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2016-01-15 11:46, Dominique Pellé wrote:
>
>> I think that we need to increase the color difference between
>> slightly blue highlighting for style errors and the white background.
>
> Could you send color cod
Daniel Naber wrote:
> On 2016-01-03 13:42, Daniel Naber wrote:
>
> > The migration to a new forum is now in progress. The old forum has been
> > set to read-only, its contents will be migrated to the new forum. I'll
> > send a notice with the new forum's address as
2015-12-29 22:07 GMT+01:00 Dominique Pellé <dominique.pe...@gmail.com>:
> Daniel Naber <daniel.na...@languagetool.org> wrote:
>
> > On 2015-10-14 14:01, Dominique Pellé wrote:
> ...
> >> It would also be useful if each group captured in the regexp
>
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-10-14 14:01, Dominique Pellé wrote:
...
>> It would also be useful if each group captured in the regexp
>> could be re-used with \1 \2 \3 etc. (or ...) inside
>> the or .
>
> That's possible already
Hi
I'm trying to update the FSA spelling dictionary for Breton but I have
a problem. I had a script using Morfologik which used to work:
languagetool-language-modules/br/src/main/resources/org/languagetool/resource/br/hunspell/create-fsa-spell-dictionary.sh
... but I see that
Daniel Naber wrote:
> On 2015-12-19 17:31, Dominique Pellé wrote:
>
>> org.languagetool.dev.SpellDictionaryBuilder \
>
> Actually the class is deprecated, its non-deprecated version is now at
> org.languagetool.tools.SpellDictionaryBuilder and it should also have
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-12-19 22:18, Dominique Pellé wrote:
>
>> I've mentioned it in the past, I find it surprising that most
>> languages do not provide the scripts that they use to create
>> binary dictionaries. Providing s
Hi
I used the attached script to find 22 broken URLs
in grammar.xml of Catalan, English, Dutch, Polish:
$ cd languagetool
$ ./test-broken-url.sh
Checking
[languagetool-language-modules/ast/src/main/resources/org/languagetool/rules/ast/grammar.xml]...
Checking
Hi
I would like to update the French and Breton POS tag dictionaries,
ideally before the next LanguageTool release. However, I'm asking
whether that's OK as I read that updating the dictionaries is problematic
with git (it increases the size of the git repository).
So what to do? Should I wait
Daniel Naber wrote:
> Hi,
>
> the year is slowly coming to an end, so I thought I'd try to summarize what
> we've achieved this year and how we can move LT forward in the future. In
> 2015, we...
>
> * made three releases so far (2.9, 3.0, 3.1), another one is
Daniel Naber wrote:
> Hi,
>
> could a Dutch native speaker have a look at this rule, is it okay?
>
> http://languagetool-user-forum.2306527.n4.nabble.com/New-Dutch-rule-referentie-used-as-an-Anglicism-td4643279.html
>
> Regards
> Daniel
My Dutch is too rudimentary to assess the rule.
However,
Daniel Naber <daniel.na...@languagetool.org> wrote:
On 2015-10-29 22:58, Dominique Pellé wrote:
>
> > I can't make sense of it. And I can't reproduce the
> > error in the command line either since this gives no
> > error:
>
> *Maybe* t
On Thu, Oct 29, 2015 at 11:30 PM, Jaume Ortolà i Font wrote:
> Hi Dominique,
>
> When there is no space at the end of the sentence, the last token has the
> POS tag "PARA_END", and this tag makes rule match:
>
>
>
> You can see the difference (with space vs without
Hi
Here is presumably a bug which I do not understand.
Hopefully someone can help.
If I copy/paste the following 3-word sentence in the
Esperanto grammar checker at https://www.languagetool.org/eo/
Pri la kategorio
... and then press the button "Check text", LT highlights
the last 2 words in
Daniel Naber <daniel.na...@languagetool.org> wrote:
On 2015-10-25 05:13, Dominique Pellé wrote:
>
> Hi Dominique,
>
> > I measured LT speed using command line version of LanguageTool.
> > Recorded numbers are user time reported by Linux time command.
>
> thanks
Dominique Pellé wrote:
> Multi-threading was introduced in LT-2.7 but above numbers don't show
> improvements. Maybe I needed to use a bigger document than 500 lines.
>
I need to correct this: it's LT-2.3 which introduced multi-threading.
I also made more measurements with older versions.
Hi
I measured LT speed using command line version of LanguageTool.
Recorded numbers are user time reported by Linux time command.
Measurements were made on my laptop:
- xubuntu-14.04.3
- i5-3317U CPU, 1.7Ghz, 4 cores, SSD
- java version "1.8.0_60"
I measured:
- several versions of LT (from 1.8
Purodha Blissenbach wrote:
> Hi,
>
>>http:/www.google.com (there should be 2 slashes after
>> protocol)
>
> This is valid, at least protocolwise. It refers to a directory
> /www.google.com on the current server. Good warning, of course, if there
> is at least a
Hi
I've added a rule in French grammar.xml to check for common mistakes in URLs
in this checkin:
https://github.com/languagetool-org/languagetool/commit/4bd2109242ad02f2d50e1f597580764a1dd45d97
Some examples of mistakes detected:
http//www.google.com (missing colon)
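
The kinds of URL mistakes listed above can be detected with simple regular expressions. The sketch below is a Python illustration only: the two patterns and the function name url_problems are hypothetical and are not the patterns used in the French grammar.xml.

```python
import re

# Hypothetical patterns, for illustration only.
MISSING_COLON = re.compile(r"\bhttps?//")       # e.g. http//www.google.com
SINGLE_SLASH = re.compile(r"\bhttps?:/(?!/)")   # e.g. http:/www.google.com


def url_problems(text):
    """Return a list of suspected URL typos found in text."""
    problems = []
    if MISSING_COLON.search(text):
        problems.append("missing colon")
    if SINGLE_SLASH.search(text):
        problems.append("single slash")
    return problems


print(url_problems("http//www.google.com"))
print(url_problems("http://www.google.com"))
```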
Daniel Naber wrote:
> Hi,
>
> this is just a reminder that the data for statistical error detection
> exists, now people just need to use it...
>
> Regards
> Daniel
Hi Daniel
Yes, I have not forgotten, but I really have little time at home these days.
I can
Daniel Naber wrote:
> On 2015-10-11 12:31, Daniel Naber wrote:
>
> >> Use of "exact-meaning" would be very rare.
> >> Maybe a better name:
> >
> > I think that's okay with me, but I need to think more about it. Maybe
> > the easiest implementation would be to just
Daniel Naber wrote:
> Hi,
>
> we have quite some changes in the nightly tests today. I'm not sure what
> the cause is, could you check your language and see if the changes are
> good or bad?
>
> https://languagetool.org/regression-tests/20151013/
>
> Regards
>
Andre Couture wrote:
> Hi
> I did not follow the entire conversation here but I was curious as to why
> someone would put a non-breaking space between two words?
> We face that in other areas of our code as well.
>
> If the idea of the nbsp is to keep the two apparent words together, would
> it
Hi
Would it be possible to allow for several tags
in the same rule?
It seems that we can only give one.
I'd like to be able to use several (at least 2):
* one to make sure that part of regexp matches a postag
* another one to make sure that part of the regexp does not match a postag
I tried
Hi
Consider this very simple rule in the English grammar.xml:
egg
yoke
The rule works fine if the 2 words are separated by
one or more spaces, tabs or newlines. However, it does not
work when the 2 words are separated by a non-breaking
space (U+00A0). I wonder why.
With a
Daniel Naber wrote:
> On 2015-10-09 07:32, Dominique Pellé wrote:
>
>> I suppose that I care more than most because I only use LT to check
>> text files where the situation is frequent.
>
> I think normalizing the text makes sense if:
> 1) single line breaks get re
Daniel Naber wrote:
> On 2015-10-08 06:59, Dominique Pellé wrote:
>> ... then the regexp rule does not detect all the errors
>> that the rule detected. It does not detect errors
>> in "foo bar" (2 spaces or more, or tabs) or when there is a
>> new line
Mike Unwalla wrote:
> I agree with Purodha. Do not be 'smart'. Do not change the meaning of a
> regexp.
>
> Regards,
>
> Mike Unwalla
OK. It looks like the majority does not want to pre-process the sentence
to remove consecutive spaces (including tabs, dos/unix new
Daniel Naber wrote:
> On 2015-10-07 06:41, Dominique Pellé wrote:
>
> Hi Dominique,
>
> thanks for your feedback.
One more remark:
If I replace a rule like...
foo
bar
... into ...
foo bar
... then the regexp rule does not detect all the errors
that the rule de
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-10-07 06:41, Dominique Pellé wrote:
>
> Hi Dominique,
>
> thanks for your feedback.
>
>> 1) How do I highlight only a subset of the match? Trying the above
>> rule, I see this:
>
> Th
Daniel Naber wrote:
> Hi,
>
> there's now a first and limited implementation of the syntax in
> master. Instead of
>
> foo
>
> you can now use
>
> foo
>
> But be aware that this is a real regular expression that ignores tokens,
> so it matches anything with the
Hi
I noticed that the Dutch rule OT_DOOR_DE_WAR
contains invalid XML. See the spurious > after the
word "in" in the 2 lines below:
Juist is in> de war.
Het ligt door de war.
I also wonder why LT accepts the XML file without giving errors.
Regards
Dominique
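
One likely explanation for the last question, offered as an assumption rather than a confirmed diagnosis: a bare > inside character data is well-formed XML, since only < and & must be escaped there. The Python sketch below shows a standard XML parser accepting such text; the element name example is hypothetical because the original markup is not shown in the message.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A stray '>' in text content is accepted by the parser...
stray_gt = "<example>Juist is in> de war.</example>"
# ...whereas a stray '<' is rejected.
stray_lt = "<example>Juist is in< de war.</example>"

print(is_well_formed(stray_gt), is_well_formed(stray_lt))
print(ET.fromstring(stray_gt).text)
```

If this is indeed the cause, catching such typos would need a content-level sanity check rather than XML validation.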
Dominique Pellé wrote:
> Hi
>
> I noticed that the Dutch rule OT_DOOR_DE_WAR
> contains invalid XML. See the spurious > after the
> word "in" in the 2 lines below:
>
> Juist is in> de war.
> Het ligt door de war.
>
> I also wonder why LT accepts t
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-09-05 22:53, Dominique Pellé wrote:
>
>> It is similar to what Daniel wrote earlier as well:
>>
>> a (plein temps|chaque fois|rude épreuve|vol
>> d’oiseau)
>>
>> It would make some su
Hi
I'm sharing a link that looks useful for the English LanguageTool:
https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/Grammar_and_miscellaneous
Regards
Dominique
--
Daniel Naber wrote:
> Hi,
>
> on mobile, languagetool.org doesn't show the textarea where you can try
> LT. Originally, this was on purpose, but nowadays smartphone displays
> have good resolution and I think we should show it at least for modern
> devices. The
Daniel Naber <daniel.na...@languagetool.org> wrote:
> On 2015-09-05 22:53, Dominique Pellé wrote:
>
>> It is similar to what Daniel wrote earlier as well:
>>
>> a (plein temps|chaque fois|rude épreuve|vol
>> d’oiseau)
>
> So instead of ... we wou
Jaume Ortolà i Font <jaumeort...@gmail.com> wrote:
2015-09-05 16:11 GMT+02:00 Daniel Naber <daniel.na...@languagetool.org>:
>
>> On 2015-09-04 23:21, Dominique Pellé wrote:
>>
>> > I wish I could write a rule pattern like this:
>> >
>&g
Hi
Say I want to detect invalid use of word "a" (= has, verb)
instead of "à" (= at, preposition) in many French expressions
such as:
a nouveau -> à nouveau
a plein temps -> à plein temps
a rude épreuve -> à rude épreuve
a vol d'oiseau -> à vol d'oiseau
etc.
I wish I could write a
Andriy Rysin ary...@gmail.com wrote:
I started working on some abbreviations with dots in Ukrainian and
added some of them to the dictionary. But now when I specify
<token>рр.</token> in rules LT warns me:
token [2], contains рр. that is not marked as regular expression but
probably is one
I
Hi
I used to check grammar rules in one language only using:
$ mvn —projects languagetool-language-modules/fr —also-make clean test
It's documented here:
http://wiki.languagetool.org/maven-tips
It used to work, but it does not work anymore. It gives this error:
[ERROR] BUILD FAILURE
[INFO]
Daniel Naber daniel.na...@languagetool.org wrote:
On 2015-02-01 13:22, Dominique Pellé wrote:
$ mvn —projects languagetool-language-modules/fr —also-make clean test
You need to use two dashes (--) instead of — for the 'projects' and
'also-make' parameter. I'll fix the Wiki page.
Regards
Hi
I have this script...
languagetool/languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/create-lexicon.sh
... which works by assuming that SynthDictionaryBuilder java
program creates its output files in /tmp/...
But it would be trivial to modify the script if -o
Daniel Naber daniel.na...@languagetool.org wrote:
Hi,
to provide LT as a 100% pure Java software, I'd like to switch from
Hunspell (native code) to Morfologik (Java-based). For that, I think the
following languages are easy to switch:
Asturian
Galician
Khmer
Spanish
R.J. Baars wrote:
A long time ago, I chose to have the - as a word char, not separating word
parts that really belong together.
That is now in the way for the date rules, since a normal date in Dutch
can also be 15-1-1958.
Is there a solution for this issue? Like tokenizing when the dash
Daniel Naber daniel.na...@languagetool.org wrote:
Hi,
I'm looking for ideas to systematically improve LT's error coverage. The
last years, I've mostly worked by simply adding rules for errors that I
coincidentally found on the web or in emails. Is anybody of you working
systematically in
Yakov Reztsov yakovr...@mail.ru wrote:
Hi,
Mon, 6 Oct 2014 21:47:40 +0200 from Dominique Pellé:
Hi
I've noticed that the Russian and Dutch
compounds.txt files contain duplicate entries.
Either the dupes should be removed, or maybe
some of the dupes were meant to be the plural
form
Hi Ruud
You can have a look at the Java files DateCheckFilter.java for
Catalan, Breton or Esperanto, for which there is also no Java
locale.
Dominique
PM, R.J. Baars r.j.ba...@xs4all.nl wrote:
About more semantic rule, what about time consistency?
About the date check, I have been looking
Hi
I've noticed that the Russian and Dutch
compounds.txt files contain duplicate entries.
Either the dupes should be removed, or maybe
some of the dupes were meant to be the plural
form or some other inflected form. Can the language
maintainers check the duplicate entries in the
following compounds.txt
Hi Ruud
Duplicate entries are at best not necessary, so they
should be removed. But at worst, it can be that the
intention was to put a plural for example. I found
such errors in the French compound.txt where
I had the word casse-gueule twice instead of
having casse-gueule and casse-gueules.
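
Duplicate entries of this kind are easy to list automatically. The following Python sketch is one way to do it; the function name find_duplicates is hypothetical, and treating lines that start with # as comments is an assumption about the compounds.txt format.

```python
from collections import Counter


def find_duplicates(lines):
    """Return the sorted entries that occur more than once in lines."""
    entries = []
    for line in lines:
        entry = line.strip()
        # Assumption: blank lines and '#' comment lines are not entries.
        if entry and not entry.startswith("#"):
            entries.append(entry)
    counts = Counter(entries)
    return sorted(entry for entry, n in counts.items() if n > 1)


sample = ["# sample data", "casse-gueule", "casse-croûte", "casse-gueule"]
print(find_duplicates(sample))
```

Each reported duplicate still needs a manual look, since it may stand for a missing plural or other inflected form rather than a redundant line.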
Hi
I noticed a bug in the date checking rule:
the \1 \2 (etc) substitutions in message do not
work when they appear after \realDay.
I noticed this while writing the Breton date
rule. I had to change the message
so that \realDay appeared at the end of the
message to make it work.
Here
R.J. Baars r.j.ba...@xs4all.nl wrote:
There is an official advice for Dutch, stating that for understandable
reading, an average of no more than 12 words a sentence is required.
Since I can only make rules per sentence, I made a rule, warning for
sentences of more than 18 words. That rule
Daniel Naber wrote:
Hi,
I've implemented a 'filter' element for XML which can be used to modify,
keep, or reject a rule match. The first use case is a rule that checks
if a weekday matches its date, e.g. Monday, 7 October 2014 is
inconsistent, as 2014-10-07 is not a Monday. The rule for
Marcin Miłkowski list-addr...@wp.pl wrote:
On 2014-09-09 23:10, Dominique Pellé wrote:
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-09-09 22:38, Dominique Pellé wrote:
* why does your example give a message
Marcin Miłkowski list-addr...@wp.pl wrote:
On 2014-09-10 11:34, Dominique Pellé wrote:
Marcin Miłkowski list-addr...@wp.pl wrote:
On 2014-09-09 23:10, Dominique Pellé wrote:
Daniel Naber daniel.na...@languagetool.org
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-04-27 22:18, Dominique Pellé wrote:
I wish I could check the POS tag of a portion of
a token.
(Replying to an old thread here...)
I think the new rule filter offers a solution for this that does not
require any changes to the XML
<disambig action="filter" postag="N.*"/>
</rule>
Regards,
Jaume Ortolà
2014-09-03 6:22 GMT+02:00 Dominique Pellé dominique.pe...@gmail.com:
Hi
Have a look in the following debug output
of LanguageTool where a token gets non-sensical
POS tag N.* (multiple times) after
Hi
Have a look in the following debug output
of LanguageTool where a token gets non-sensical
POS tag N.* (multiple times) after a disambiguation
rule is applied.
Is it a bug in the disambiguator?
Or am I writing an incorrect disambiguation rule?
$ echo An eil| java -jar
Marcin Miłkowski list-addr...@wp.pl wrote:
On 2014-09-01 20:04, Daniel Naber wrote:
Hi,
our Wiki at http://wiki.languagetool.org/hunspell-support says
ICONV/OCONV isn't supported in Morfologik, but I see there are the
fsa.dict.input-conversion and fsa.dict.output-conversion options. So
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-08-29 21:50, Dominique Pellé wrote:
Message: The date 31 September 2014 is not a Monday, but a Wednesday.
Monday, 31 September 2014
I've now made date parsing more strict, but the rule won't complain
about these dates and just
R.J. Baars r.j.ba...@xs4all.nl wrote:
A different question: what about dates like '08-07-2014' or '2014/08/07'?
One cannot tell which is the month and which is the day, can one? Are both
options considered then? And what of notations '04/05/06'; it is
completely unclear which is month, year and day.
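
The ambiguity can be made concrete by enumerating every reading of such a date. This Python sketch makes two assumptions that are not in the original message: all three fields are two-digit numbers, and two-digit years belong to the 2000s. The function name possible_dates is hypothetical.

```python
from datetime import date
from itertools import permutations


def possible_dates(text, sep="/", century=2000):
    """Return every valid calendar date the three fields could denote."""
    fields = [int(part) for part in text.split(sep)]
    results = set()
    # Try each field in the role of year, month and day.
    for year, month, day in permutations(fields):
        try:
            results.add(date(century + year, month, day))
        except ValueError:
            pass  # not a valid calendar date in this reading
    return sorted(results)


for d in possible_dates("04/05/06"):
    print(d.isoformat())
```

For '04/05/06' all six orderings are valid dates, so a date rule cannot pick one without knowing the convention of the language or locale.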
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-08-29 07:47, Dominique Pellé wrote:
Would there be a way to say something like instead:
The date October 7, 2014 is not a Monday, but a Tuesday.
This is now implemented, you can use \realDay in your message and it
will be replaced
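
The weekday consistency check behind such a message can be sketched as follows. This is a minimal Python illustration, not LanguageTool's Java filter; the names real_day and check_weekday are hypothetical, and the message text simply follows the example quoted above.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def real_day(year, month, day):
    """Return the weekday name of a date, or None if the date is invalid."""
    try:
        return WEEKDAYS[date(year, month, day).weekday()]
    except ValueError:
        return None  # e.g. 31 September does not exist


def check_weekday(claimed, year, month, day):
    """Return an error message if the claimed weekday is wrong, else None."""
    actual = real_day(year, month, day)
    if actual is None or actual == claimed:
        return None  # invalid dates are silently skipped in this sketch
    return f"The date is not a {claimed}, but a {actual}."


print(check_weekday("Monday", 2014, 10, 7))
```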
Hi
I saw that date checking was added to LT. Thanks for that.
I've added support for date checking in French (as was done
already in en, de, pl, ca). I have 2 remarks:
1) LT detects date inconsistency in French as in:
* Vendredi 28/08/2014 (it should be a Thursday, not a Friday)
* Vendredi 28
R.J. Baars r.j.ba...@xs4all.nl wrote:
I discovered that the rule below is not working very well.
It looks like 'skip' also skips over sentence boundaries.
Is that intentional? Or is something else wrong?
In case it is intentional, is there an option to forbid that?
Ruud
rule id=nr738
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-08-06 14:13, Juan Martorell wrote:
testSynthesizeStringString (java.lang.Error: Unresolved compilation
problem:
The declared package does not match the expected package
org.languagetool.synthesis.fr [1]
I cannot reproduce that
Hi
I wrote a French rule which contains <exception>\2</exception>
but it does not work.
Should things like \2 work inside <exception>...</exception>?
The rule checks that the two words vu, vus, vue or vues
are identical as in vu de mes yeux vu (correct),
vus de mes yeux vus (correct), and it's supposed
Daniel Naber daniel.na...@languagetool.org wrote
On 2014-07-17 08:20, Dominique Pellé wrote:
Should things like \2 work inside <exception>...</exception>?
<match no="2"/> should work.
Regards
Daniel
Hi Daniel
<exception><match no="2"/></exception> does not work either
just like <exception>\2
On Thu, Jul 17, 2014 at 9:18 AM, Daniel Naber daniel.na...@languagetool.org
wrote:
On 2014-07-17 08:59, Dominique Pellé wrote:
<exception><match no="2"/></exception> does not work either
Okay, I thought it worked because I see it's being used in the Polish
grammar.xml. But maybe it never worked
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-07-17 10:52, Dominique Pellé wrote:
I glanced at the Polish grammar.xml, but I could not find such rules.
Sorry, I guess my grep command was wrong and I actually found match
outside the exception element.
cvc-complex-type.2.4.d
On Thu, Jul 17, 2014 at 2:49 PM, Dominique Pellé dominique.pe...@gmail.com
wrote:
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-07-17 10:52, Dominique Pellé wrote:
I glanced at the Polish grammar.xml, but I could not find such rules.
Sorry, I guess my grep command was wrong
Elanjelian Venugopal tamil...@gmail.com wrote:
Added a second rule group to the grammar.xml
Hi Elanjelian
Since you have non-trivial suggestions with \1 etc.
such as <suggestion>\1க்\2</suggestion>, I would
advise to use correction='...'. For example, replace...
<example type='incorrect'>புது <marker>மா
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-07-11 22:43, Dominique Pellé wrote:
1/ Why does the above command create files in /tmp rather than
providing command line options to specify the outputs?
There's no specific reason that I can remember. Feel free to change
Hi
I'd like to create synthesizer dictionaries for
French and Breton in order to be able to give
better suggestions based on the synthesizer.
I just started to experiment based on the information at
http://wiki.languagetool.org/developing-a-tagger-dictionary#toc9
and I see that I can create a
Hi
Searching for in grammar.xml files, I see things that
are wrong, or at least suspicious:
$ ack-grep --xml '' languagetool-language-modules/*/src
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml
25390:<token negate="yes"></token>
25400:
Hi
I've added antipattern sanity checks.
They detect some problems in antipatterns for German
and Polish.
However, I have not checked-in yet because the
antiPattern.getId() is incorrect. It seems to contain the ID
of the previous rule, rather than the rule owning the
antipattern. I believe that
Hi
For your information, I've added yet another sanity check
for regexp in grammar disambiguation files in checkin
8838a7edef0f7a24d5c63533df7a15fc154c777d
It finds regexps that are most certainly wrong such as .|;
Since the dot can match any char, the ; in the disjunction is
useless. There is
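
The reasoning about .|; can be illustrated with a small check. The Python sketch below is not the sanity check from the commit above: the function name is hypothetical and the logic is deliberately naive, since it splits on every | and ignores escapes, groups and character classes.

```python
import re


def has_redundant_dot_alternative(pattern):
    """Flag alternations such as '.|;' where a bare '.' already covers
    every other single-character alternative."""
    alternatives = pattern.split("|")  # naive: ignores escapes and groups
    if "." not in alternatives:
        return False
    return any(len(alt) == 1 and alt != "." for alt in alternatives)


# Both patterns accept exactly the same single characters.
for ch in ";a":
    print(ch, bool(re.fullmatch(r".|;", ch)), bool(re.fullmatch(r".", ch)))

print(has_redundant_dot_alternative(".|;"))
```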
Hi
I've added a new pattern rule checker
(commit e26967dc4663283574a8d536308c13ad188b44a0)
and it finds this issue:
The Catalan rule: FORCA2:6, token [1], contains força
that contains token separators, so can't possibly be
matched.
The Catalan rule: FORCA2:7, token
Jaume Ortolà i Font jaumeort...@gmail.com wrote:
Marco,
You have a token with vela/velas and then another with bandeira/bandeiras.
The rule expects a sentence like arrear a vela bandeira.
Instead of
<token regexp="yes">vela|velas</token>
token
Daniel Naber daniel.na...@languagetool.org wrote:
On 2014-04-27 22:18, Dominique Pellé wrote:
<token regexp="yes" postag_group1="foo">ez-(.*)</token>
I'm not sure how this could be implemented in a clean way... wouldn't
this be a rather ugly special case in the tagger to ignore the
tokenization
Hi
I wish I could check the POS tag of a portion of
a token.
For example, in a Breton word such as ez-c'hlas, I wish
I could check the POS tag of c'hlas in XML rules.
I don't think that's currently possible, unless:
- I write a Java rule
- or I change the tokenizer to split on hyphen - but