2012/5/22 Marcin Miłkowski list-addr...@wp.pl
However, it's not possible, as far as I remember, to refer to another
token's POS tag and inflect some other token based on it (which would
involve recursive inclusion of match inside match). It seems pretty much
straightforward to implement but I
2012/5/31 Daniel Naber list2...@danielnaber.de
On Thursday, 31 May 2012, Jaume Ortolà i Font wrote:
Thanks, looks good. Maybe the match() method can become a bit shorter by
extracting some code to private methods?
I don't see any easy way to do it. The whole Java rule is equivalent
Hi,
I have been testing some rules for three-token sequences using
unification. I describe here my case and what I have found.
I want to match a three-token sequence: determiner + possessive + noun
(or adjective). There are two features in unification: gender (masc./fem.)
and number
There is the possibility that some words that are included in the tagger
dictionary (or are tagged in the disambiguation file) are marked as errors
by Hunspell, because they are missing in the Hunspell dictionary. In order
to avoid it we could add a condition in the Hunspell Java rule: mark as an
.
Regards,
Jaume Ortolà
2012/6/9 Marcin Miłkowski list-addr...@wp.pl
On 2012-06-09 20:26, Jaume Ortolà i Font wrote:
2012/6/9 Marcin Miłkowski list-addr...@wp.pl
On 2012-06-09 19:14, Jaume Ortolà i Font wrote:
2012/6/9 Marcin Miłkowski list
Hi Marcin,
This is very necessary. We need a simpler and more intuitive config dialog.
A question. These global settings can be presented in two ways: as
checkboxes or as mutually exclusive options. Have you thought about
this? Do you have any preference? I think that the two possibilities should
Daniel,
Could you rerun the tests with the Wikipedia corpus?
http://community.languagetool.org/corpusMatch/list?lang=en
Tell us how many articles are checked.
Regards
Jaume Ortolà
2012/9/21 Daniel Naber list2...@danielnaber.de
Hi,
this is a reminder that we're now in feature freeze[1].
2012/9/22 Daniel Naber list2...@danielnaber.de
On 21.09.2012, 12:57:09 Jaume Ortolà i Font wrote:
Could you rerun the tests with the Wikipedia corpus?
http://community.languagetool.org/corpusMatch/list?lang=en
As you might have noticed there are performance problems with the site...
I'm
Hi,
In the case of Catalan, there are several causes of the high number of
positives (in order of importance, I think):
- Some rules are used to check regional variants of Catalan. I have
disabled them for the Wikipedia corpus check.
- There are too many low quality Wikipedia articles. So there
2012/10/30 Daniel Naber list2...@danielnaber.de
On 22.10.2012, 12:36:08 Jaume Ortolà i Font wrote:
- There are too many low-quality Wikipedia articles. So there are a lot
of true positives.
In that case, maybe you could make the Catalan Wikipedia community
aware of Wikicheck?
http
I have detected a source of false alarms (for Catalan) in the Wikipedia
interlanguage links [1]. Some of the language codes (be, es, et, hi, li,
lo, se, te...) happen to be Catalan words that trigger several grammar
rules. In general, words inside any kind of link should be ignored.
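A minimal sketch of the idea of ignoring interlanguage links, not the actual Wikipedia filter used by LanguageTool. It assumes such links look like [[xx:Title]] with a two- or three-letter lowercase code; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes interlanguage links before text is checked. */
final class InterlanguageLinkFilter {

    // Assumption: an interlanguage link is [[code:Title]] where code is a
    // 2-3 letter lowercase language code, optionally with subtags (e.g. zh-min-nan).
    private static final Pattern INTERLANGUAGE_LINK =
            Pattern.compile("\\[\\[[a-z]{2,3}(?:-[a-z]+)*:[^\\]|]+\\]\\]");

    /** Returns the wiki text with interlanguage links removed; other links are kept. */
    static String stripInterlanguageLinks(String wikiText) {
        return INTERLANGUAGE_LINK.matcher(wikiText).replaceAll("");
    }
}
```

Codes such as es, et or te then never reach the grammar rules, while ordinary links like [[Barcelona]] are left untouched.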
Regards,
, then there
would be no false alarms.
Jaume Ortolà
2012/11/2 Jaume Ortolà i Font jaumeort...@gmail.com
I have detected a source of false alarms (for Catalan) in the Wikipedia
interlanguage links [1]. Some of the language codes (be, es, et, hi, li,
lo, se, te...) happen to be Catalan words that trigger
Hi,
Great job! It works fine. I have tested the extension in two kinds of
real-life situations: composing a message in Gmail and using this new
WYSIWYG Wikipedia editor.[1]
What I miss now for this extension to be really useful is a more
direct connection between the error messages and the points
Hi,
In Catalan there are three main regional variants, which can be
handled with a few simple grammar rules. I would like very much to put
these rules in separate country grammar files (like in
/rules/en/en-GB/grammar.xml). Unfortunately, the codification we are
using in LT for country variants
2012/11/12 R.J. Baars r.j.ba...@xs4all.nl:
Would it be possible to use this new standard and have the old one derived
from it using a translation table?
Codes like de-DE or en-GB are indeed valid in the BCP-47 standard.
No change is needed here. On the other hand, we could use language
codes
2012/12/28 Mauro Condarelli mc5...@mclink.it
My disambiguation rule needs updating, if someone can suggest how.
Mario gli chiese l'ora.
121 rules activated for language Italian
S Mario[Mario/NPR] gli[gli/PRO-PERS-CLI-3-M-S,il/ART-M:p]
chiese[chiesa/NOUN-F:p]
This bug is fixed now. See revision 8761, changes in
DisambiguationPatternRule.java. startPos was lost in the function
replaceTokens() and now it is kept.
I will document in the wiki the new filterall action and the use of
replace and add in multiple tokens.
Jaume
2013/1/1 Marcin Miłkowski
Hi,
I have found a problem with unification. I'm using this pattern:
<rule id="DAAN_" name="det + adj + adj + nom">
<pattern>
<unify>
<feature id="nombre"/>
<feature id="genere"/>
<marker>
<token postag="D[^R].*" postag_regexp="yes"/>
think,
Dominique?
Regards,
Jaume Ortolà
2013/1/2 Jaume Ortolà i Font jaumeort...@gmail.com
I found a solution. I'm trying to change properly the code.
Regards,
Jaume
2013/1/2 Jaume Ortolà i Font jaumeort...@gmail.com
Hi,
I have found a problem with unification. I'm using
Hi,
I would like to add the isWhitespaceBefore information to the historical
annotations of the disambiguator, so any problem can be easily spotted and
fixed. When isWhitespaceBefore=false then an asterisk will be shown after
the postag: word[lemma/POS*]. Is this OK for everybody? Some JUnit
Italian as a primary language and indicate that the undetected
paragraphs fall back to English. If I know that I will be using lots of
quotations from other languages I can leave the ignore option on and not
check them at all.
Ciao
Paolo
On Jan 14, 2013, at 10:05 AM, Jaume Ortolà i
Hi,
Softcatalà, an organization that promotes software in Catalan and especially
linguistic tools (translator, spellchecker, etc.), is willing to use
LanguageTool on its website. Its online spellchecker received 500,000
visits last December (a bad month). So perhaps LanguageTool should be
2013/1/16 Dominique Pellé dominique.pe...@gmail.com
Do we really need to put suggestion inside suggestions?
It would be less noisy like this:
<message>yada yada yada</message>
<suggestion>xxx</suggestion>
<suggestion>yyy</suggestion>
<url>...</url>
<example type="incorrect">...</example>
example
2013/1/19 Mauro Condarelli mc5...@mclink.it
I (slightly) modified
MorfologikSpellerRule to accept without further action words having POS
tags.
This is a welcome change. Sometimes there are words that are not present
in the tagger dictionary but get a POS tag in the disambiguation or in
2013/1/24 Daniel Naber list2...@danielnaber.de
You can use this for now (I just made an update, the class was still
missing):
java -cp languagetool-standalone-2.1-SNAPSHOT.jar
org.languagetool.commandline.Main
We can either add script files or configure Maven to create
another JAR for the
This can be useful for Eclipse users.
I installed these plugins:
m2e - Maven Integration for Eclipse
Subclipse (or other SVN plugin)
Maven SCM handler for Subclipse
Then in the SVN repository you can check out as a Maven project. The
result is a duplicated structure like the one explained by
2013/1/28 Mauro Condarelli mc5...@mclink.it
Sorry to disturb, people.
I've been using Eclipse previously.
Now I followed instructions for the maven repack.
Everything went ok, but I can't start the commandline:
mcon@vmrunner:/srv/Store/Language/languagetool/languagetool-standalone/target$
On 28/01/2013 09:51, Jaume Ortolà i Font wrote:
2013/1/28 Mauro Condarelli mc5...@mclink.it
Sorry to disturb, people.
I've been using Eclipse previously.
Now I followed instructions for the maven repack.
Everything went ok, but I can't start the commandline:
mcon@vmrunner:/srv/Store
In Catalan new words are created by compounding and derivation. It would
suffice to have a list of common prefixes and suffixes, to know the class
of words to which every affix can be attached (i.e. noun, adjective, verb,
another affix), and a few rules of orthographic change in the
concatenation
Hi Daniel,
Three browsers, three different responses.
1) In Firefox, everything is OK.
2) In MS IE9, I get the results in community.languagetool.org
3) In Chrome, I get this error message, and no results:
Could not send request to https://languagetool.org:8081/checkDocument
Error: GENERAL
This is probably wrong, isn't it?
The following changes have been done in version trunk of LanguageTool
(xml-based rules only):
ca 0 new, 0 improved, 0 removed
Regards,
Jaume Ortolà
Salutacions,
Jaume Ortolà
www.riuraueditors.cat
2013/4/1 Daniel Naber list2...@danielnaber.de
Hi,
the
Hi,
We are preparing an instance of the LT http server to be used at the
Softcatalà webpage.
For regional and stylistic variants in Catalan, it is indispensable for us
to enable and disable rules (at the same time) from the web interface. The
problem is that when you use the enabled parameter
rules in the http server, that isn't possible now.
Regards,
Jaume Ortolà
[1] http://languagetool.org/http-server/
2013/4/3 Jaume Ortolà i Font jaumeort...@gmail.com
Hi,
We are preparing an instance of the LT http server to be used at the
Softcatalà webpage.
For regional and stylistic
Hi,
I have made some changes in MorfologikSpeller and in BaseTagger so words
written in mixed case are considered spelling errors and are not tagged.
Mixed case words are considered valid only if they appear exactly that way
in the speller dictionary (for the spelling rule) or in the tagger
2013/4/8 R.J. Baars r.j.ba...@xs4all.nl
About case:
In Dutch, DVD is an undesirable way to write dvd;
This is the only thing in Dutch that seems to need a different treatment.
And what about titles? Can they be written in all uppercase letters?
This feature (allow to write in all uppercase
2013/4/18 Andriy Rysin ary...@gmail.com
So now I'll be working on writing rules and beefing up the tags in the
dictionary so I have a question in regards to that (and I apologize up
front if any of the answers are already present somewhere). I'll be changing
grammar.xml a lot and I would like to
the 6th suggestion).
I suppose that other languages need a similar approach.
Regards,
Jaume Ortolà
2013/4/7 Marcin Miłkowski list-addr...@wp.pl
On 2013-04-07 11:07, Jaume Ortolà i Font wrote:
Hi,
I have made an improvement in Morfologik speller rule. If few
suggestions are found
2013/4/18 Daniel Naber list2...@danielnaber.de
the right approach is to add this into the algorithm that traverses the
dictionary tree. For German, I needed a solution fast and ended up with a
hack in GermanSpellerRule. It's easy to understand, but if you could check
the morfologik algorithm
-sensitive character comparison.
Best,
Jaume Ortolà
Salutacions,
Jaume Ortolà
www.riuraueditors.cat
2013/4/18 Marcin Miłkowski list-addr...@wp.pl
On 2013-04-18 16:28, Daniel Naber wrote:
On 18.04.2013, 14:41:21 Jaume Ortolà i Font wrote:
Hi Jaume,
For achieving this, I think that some
2013/4/23 Marcin Miłkowski list-addr...@wp.pl
I'm using the tagger dictionary as a speller dictionary, because now
it's better than the hunspell generated word list and that way there is
only one dictionary to be maintained. The files in the hunspell directory
were pending removal. I
This is the modified Speller.java. The idea is more or less the same as
in Jan Daciuk's code. More testing in different languages is
needed, because there are many details to consider and perhaps it's
buggy.
When a possible multiple character substitution is found, a new branch is
2013/4/23 Marcin Miłkowski list-addr...@wp.pl
If that's the case, then it's a bug in traversing the dictionary.
Yes, you were right. OK, then it's a bug. I need to use
isBeforeSeparator() more often. Probably in line 313 of Speller.java,
instead of:
if (!fsa.isArcTerminal(arc)) {
we
Marcin,
I attach again the Speller.java file with some minor changes. This problem
is solved now:
There is a problem to be solved. The L -> L·L substitution adds a distance
of 0, but the L·L -> L substitution adds 1. It should always be 0.
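To illustrate the intended behaviour (not the actual Speller.java code, which traverses the dictionary automaton), here is a small sketch of an edit distance in which one replacement pair costs 0 in both directions. The class and method names are hypothetical, and the plain dynamic-programming table is an assumption made for brevity.

```java
/** Hypothetical sketch: Levenshtein distance with one zero-cost replacement pair. */
final class FreeSubstitutionDistance {

    /** Edit distance where swapping shortForm and longForm (either direction) costs 0. */
    static int distance(String a, String b, String shortForm, String longForm) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            for (int j = 0; j <= b.length(); j++) {
                if (i == 0 && j == 0) {
                    continue; // empty prefix to empty prefix costs 0
                }
                int best = Integer.MAX_VALUE;
                if (i > 0) {
                    best = Math.min(best, dp[i - 1][j] + 1); // deletion
                }
                if (j > 0) {
                    best = Math.min(best, dp[i][j - 1] + 1); // insertion
                }
                if (i > 0 && j > 0) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    best = Math.min(best, dp[i - 1][j - 1] + cost); // match or substitution
                }
                // The free pair is tried in both directions so the cost is symmetric.
                best = Math.min(best, freeStep(dp, a, b, i, j, shortForm, longForm));
                best = Math.min(best, freeStep(dp, a, b, i, j, longForm, shortForm));
                dp[i][j] = best;
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Cost of reaching (i, j) if a ends with x at i and b ends with y at j. */
    private static int freeStep(int[][] dp, String a, String b, int i, int j,
                                String x, String y) {
        if (i >= x.length() && j >= y.length()
                && a.startsWith(x, i - x.length())
                && b.startsWith(y, j - y.length())) {
            return dp[i - x.length()][j - y.length()];
        }
        return Integer.MAX_VALUE;
    }
}
```

With the pair ("l", "l·l"), both distance("colegi", "col·legi", ...) and distance("col·legi", "colegi", ...) come out as 0, which is the symmetry asked for above.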
Best,
Jaume
Speller.java
Description: Binary data
You can disambiguate first, so the feminine noun tag is removed. See the
explanation here:
http://wiki.languagetool.org/developing-a-disambiguator#toc5
Regards,
Jaume Ortolà
2013/4/24 Andriy Rysin ary...@gmail.com
Thanks Marcin
I stole some unifications from Polish grammar.xml, adjusted
2013/4/24 Marcin Miłkowski list-addr...@wp.pl
Jaume, and everybody,
I started to implement the features we need in MorfologikSpeller (in
morfologik repository on github).
It looks good.
* fsa.dict.speller.runon-words for turning off and on the runon words
feature.
We have to remember
Hi Nathan,
There was an error in the Catalan grammar file. But it is solved now. If
you do svn update again, there should be no error.
Regards,
Jaume Ortolà
2013/5/18 Nathan Wells sungk...@gmail.com
I haven't updated the Khmer module in a while, but have some new stuff to
add.
Now that
Hi,
It seems that in Italian and French, from a certain point on, all the
errors have been removed. I suspect that there is some out-of-bounds
exception that stops the checking of articles. The lines I wrote in
UppercaseSentenceStart
are probably not safe enough. I will change them.
Regards,
Jaume
2013/5/21 Marcin Miłkowski list-addr...@wp.pl
I have a problem with it: all changes create false alarms for Polish.
This is a regression.
Hi Marcin,
Do you mean these changes? [1] They are not creating alarms. They are
removing alarms. Are the changes removing true positives?
What I did in
2013/5/21 Marcin Miłkowski list-addr...@wp.pl
What I did in UppercaseSentenceStart was to eliminate alarms in
sentences starting with patterns for enumerated lists like these:
a)
b.
iv.
iii)
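As a rough illustration of the kind of pattern meant here (not the code committed to UppercaseSentenceStart), this sketch tests whether a sentence begins with a list marker such as the four above. It works on raw strings rather than on LanguageTool tokens, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: detects enumerated-list markers at the start of a sentence. */
final class ListMarkerCheck {

    // A single lowercase letter or a lowercase Roman numeral, followed by '.' or ')'.
    // Assumption: short words made only of the letters i, v, x, l, c, d, m
    // (e.g. "dic.") would also match, so a real rule would need extra checks.
    private static final Pattern LIST_MARKER =
            Pattern.compile("(?:[a-z]|[ivxlcdm]+)[.)]");

    /** True if the first whitespace-delimited chunk of the sentence is a list marker. */
    static boolean startsWithListMarker(String sentence) {
        String first = sentence.trim().split("\\s+", 2)[0];
        return LIST_MARKER.matcher(first).matches();
    }
}
```

A per-language switch for the Roman-numeral branch would be one way to keep it from hiding genuine mistakes in languages where lowercase Roman numerals are wrong.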
The latter two are genuine mistakes in Polish: Roman numerals are
written only in
Hi Nathan,
I have just committed the Java rule you asked for. See if everything is
correct.
Regards,
Jaume Ortolà
2013/5/29 Nathan Wells sungk...@gmail.com
I need some help creating a java rule for the Khmer language in
LanguageTool. Would someone be willing to create what I believe is a
2013/6/12 Andriy Rysin ary...@gmail.com
I noticed that numbers with fractions like 2,2 are split into '2',
',', '2' by word tokenizer. In Ukrainian I need to require a different
case of the following noun based on whether it's a whole number or
fractional so I was planning to adjust Ukrainian
Hi,
I don't know what the test case failures for Catalan are. In any case, the
SENT_START, SENT_END tags are used for marking the start and the end of a
sentence. They work at the sentence level, and they have nothing to do with
the POS tag of a token. So the current behavior seems logical to me.
Hi,
There is a bug report about the behavior of UppercaseSentenceStartRule:
https://sourceforge.net/p/languagetool/bugs/185/
I think that the only situation in which we can safely prevent the rule from
matching is when the previous sentence ends with comma or semicolon. So I
propose to implement
2013/7/2 Daniel Naber list2...@danielnaber.de
On 02.07.2013 17:01, Marco A.G.Pinto wrote:
Now I can't connect to the repository as it gives an error.
Are you sure you're using the right URL to connect?
http://svn.code.sf.net/p/languagetool/code/trunk/languagetool/
This was changed at
.org/regression-tests/20130702/result_pl_20130702.html
8. http://languagetool.org/regression-tests/20130702/result_it_20130702.html
2013/7/2 Jaume Ortolà i Font jaumeort...@gmail.com
Hi,
There is a bug report about the behavior of UppercaseSentenceStartRule:
https://sourceforge.net/p
On 2013-07-02 01:11, Jaume Ortolà i Font wrote:
Hi Marcin,
I have been using the still unreleased code of morfologik-stemming and I
have made improvements to Speller.java for some previously unforeseen
cases. See the attachment.
In order to complete the development, and test and debug with all
2013/7/15 Daniel Naber list2...@danielnaber.de:
On 15.07.2013 12:35, Marcin Miłkowski wrote:
Please review my changes.
+assertCorrectionsByOrder(rule, "Rytmus", "Remus", "Rhythmus");
This new suggestion is not as good as the old one, Rhythmus should be
preferred. As this is a classical/typical
-addr...@wp.pl:
On 2013-07-15 12:41, Jaume Ortolà i Font wrote:
Thanks, Marcin.
Some remarks. The improvements I sent to the list 15 days ago have not
been added, and moreover I have found more bugs.
I'm really sorry but there are 200 mails from the mailing list over the
last two weeks
2013/7/15 Marcin Miłkowski list-addr...@wp.pl:
Hi Jaume,
On 2013-07-15 21:16, Jaume Ortolà i Font wrote:
Hi, Marcin.
I have tested the current code (1.8.0-SNAPSHOT) and everything is OK,
all the changes are there. Thank you.
Great. We'll release 1.7.1, this is just a minor bug fix
Hi,
I have just copied the Ukrainian SimpleReplaceRule into the Catalan module.
But most of the improvements could be moved up to the abstract rule
(AbstractSimpleReplaceRule).
2013/5/16 Andriy Rysin ary...@gmail.com
Just wanted to let you know that I recently improved SimpleReplaceRule
that's
2013/8/20 Marco A.G.Pinto marcoagpi...@mail.telepac.pt
Could someone add this file again to the Portuguese folder?
Done.
Jaume
2013/8/20 Daniel Naber list2...@danielnaber.de
On 2013-08-14 18:59, Marcin Miłkowski wrote:
For <or>, I can see two solutions:
(a) run-time conversion of such rules to a list of normal rules (when
reading the file, in a similar way as phrases are used) -- this is
the easiest way and
2013/8/28 Daniel Naber list2...@danielnaber.de
On 2013-08-27 19:56, Jaume Ortolà i Font wrote:
I have implemented this solution for <or>. It seems to work.
Thanks!
Git question. Is it OK to publish my modifications with: git push
origin my_local_branch?
Did that work or did you
Hi,
I see two ways:
An empty suggestion (that can be confusing for the user):
<rule default="off" id="ACTUALLYREALLY" name="Possible needless emphasis: actually/really">
<pattern>
<token regexp="yes">(?-i)actually|really</token>
</pattern>
<message>Consider if the word is (actually) necessary.
Hi,
LibreOffice 4.2 (due in November) will allow using the language code
ca-ES-valencia for the Valencian variant of Catalan (default: ca-ES). It
would be great to take advantage of this in LanguageTool. The only
difference between the general and Valencian variants is just that a few
grammar
Daniel,
In the Wikipedia results there are matches for rules that are disabled
in disabled_rules.properties, for example EXIGEIX_VERBS_CENTRAL. Shouldn't
they be ignored?
Regards,
Jaume
2013/9/18 Daniel Naber list2...@danielnaber.de
On 2013-09-18 02:04, dna...@users.sourceforge.net wrote:
2013/9/18 Daniel Naber list2...@danielnaber.de
On 2013-09-18 13:09, Jaume Ortolà i Font wrote:
In the Wikipedia results there are matches for rules that are disabled
in disabled_rules.properties, for example EXIGEIX_VERBS_CENTRAL.
Shouldn't they be ignored?
As this is a regression tests
2013/9/17 Daniel Naber list2...@danielnaber.de
On 2013-09-17 10:31, Jaume Ortolà i Font wrote:
I will try to implement it. What would be the best way to do it? I see
that the Simple German (de-DE-x-simple-language) is implemented in a
module outside the other German variants
,
Jaume
[1]
https://github.com/languagetool-org/languagetool/blob/master/languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/en-GB/grammar.xml
Salutacions,
Jaume Ortolà
www.riuraueditors.cat
2013/9/19 Jaume Ortolà i Font jaumeort...@gmail.com
2013/9/17 Daniel Naber
2013/9/20 Jaume Ortolà i Font jaumeort...@gmail.com
The major question is about the country variants in LibreOffice. Is it
really working? For example, when using British English in LibreOffice, I
don't see any match for apartment or zip code as defined here in the
grammar rules for British
Hi Daniel,
I am very close to completing the changes for the ca-ES-valencia issue
(solving also problems with British English in LibreOffice). This arose
very recently and there was little time to do it. I hope to be finished
today. Otherwise I will give up for the 2.3 release. I will introduce one
2013/9/21 Daniel Naber list2...@danielnaber.de
On 2013-09-21 09:47, Jaume Ortolà i Font wrote:
finished today. Otherwise I will give up for the 2.3 release. I will
introduce one more string (ca-ES-valencia = Catalan (Valencian)). Or
I can do it right now. Is that okay?
It's okay, please
Ortolà
[1]
http://www.openoffice.org/api/docs/common/ref/com/sun/star/lang/Locale.html
[2]
https://wiki.documentfoundation.org/images/b/b5/LibreOffice_FOSDEM-2013_Language_Tags.pdf
[3] http://dev-builds.libreoffice.org/daily/master/
2013/9/20 Jaume Ortolà i Font jaumeort...@gmail.com
2013/9/20
2013/9/22 Daniel Naber list2...@danielnaber.de
Currently the two grammar.xml files look almost the same. Maybe we can
avoid that by moving the common parts to their own files and including
them, as described here? http://xml.silmaril.ie/includes.html
This would need to be tested carefully
Hi,
This could be of interest to you.
Eike Rathke: I'll talk about it at the LibreOffice Conference 2013 at
Milano, so to get all the details please join me and attend Getting you
language in on Thursday, 26 September at 15:30 in Sala Alfa.
2013/9/22 Daniel Naber list2...@danielnaber.de
On 2013-09-22 11:51, Jaume Ortolà i Font wrote:
default parameters. So yes, I would prefer another way to deal with
it. Perhaps what you suggested at first: ValencianCatalan implements
getEnabledRules and getDisabledRules
Daniel,
I found the same problem recently. I resorted to the attached perl script
for this step.
Regards,
Jaume Ortolà
2013/10/4 Daniel Naber list2...@danielnaber.de
Hi,
did anybody recently build a synthesizer? When I follow the instructions
at
2013/10/16 Daniel Naber list2...@danielnaber.de
Hi,
although I think I understand the technical details of unification, I'm
not sure how/why it is used in grammar.xml. For example, if a sequence
of words share the same gender and number, that means there's agreement,
so you cannot use that
Hi,
When using LanguageTool WikiCheck, if the article has more than 100 errors,
you get a warning: "More than 100 possible errors found - the remaining
errors will not be shown."
But when you submit changes to Wikipedia, what you get in Wikipedia is
always a "no difference" page. No change is
it be an option in the command line?
Regards,
Jaume Ortolà
[1] https://ca.wikipedia.org/wiki/Glicèrid
Salutacions,
Jaume Ortolà
www.riuraueditors.cat
2013/10/17 Daniel Naber list2...@danielnaber.de
On 2013-10-17 09:28, Jaume Ortolà i Font wrote:
Hi Jaume,
But when you submit changes to Wikipedia
2013/11/25 Daniel Naber list2...@danielnaber.de
On 2013-11-25 11:11, Jaume Ortolà i Font wrote:
- A method for building the dictionary, assuming that it will be
used only for some languages (backward compatible).
- A way of using the frequency information in the ordering of
suggestions
2013/11/26 Daniel Naber list2...@danielnaber.de
On 2013-11-26 15:27, Jaume Ortolà i Font wrote:
Look at these wordlists [1]. They are Apache 2.0. The words are
classified in 256 ranges.
[1]
https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries
The German one looks okay
, we could consider that the last byte is
the frequency data and the separator between POS tag and frequency is not
needed.
The other solution is to change the separator...
Regards,
Jaume Ortolà
2013/11/26 Marcin Miłkowski list-addr...@wp.pl
On 2013-11-26 18:44, Jaume Ortolà i Font wrote
2013/12/9 Anton Meixome meix...@certima.net
I'm a newbie here but I have a question. Why isn't there a frequency list
for Galician in
https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries ?
This is not our project. You should ask there. We chose these lists
because there are a
version of Morfologik. And then
we'll be able to rebuild the dictionaries and adjust the tests if needed.
Regards,
Jaume Ortolà
2013/12/9 Marcin Miłkowski list-addr...@wp.pl
On 2013-12-09 00:12, Jaume Ortolà i Font wrote:
Hi,
I have implemented the use of the frequency word lists
Hi,
There are some characters in translations that need escaping. I have seen,
for example, missing apostrophes in http://community.languagetool.org. So
where is the proper place to do the escaping? Is it the responsibility of
the translators in Transifex?
Regards,
Jaume Ortolà
+de+Som%C3%A0lialang=ca
So should I write &apos; or &quot;?
http://www.riuraueditors.cat
Regards,
Jaume Ortolà
2013/12/21 Daniel Naber list2...@danielnaber.de
On 2013-12-21 12:10, Jaume Ortolà i Font wrote:
My question is this. If translating from English to another language,
an apostrophe
Hi,
In the current implementation the number of possible suggestions grows
exponentially with the replacement pairs, which is not a good thing...
For Milkowski you get 6144 possible suggestions in American English. I
fixed a limit of 7 possible simultaneous replacements in a word, which (if
the
2014-01-28 Kumara Bhikkhu kumara.bhik...@gmail.com
Can a token be a mixture of postags and words? Example: Can a token
match send_end or of|into? If not, how do I indicate this?
Yes, you can write this:
<or>
<token postag="SENT_END"/>
<token regexp="yes">of|into</token>
</or>
It's equivalent to using
Hi,
This has become a common request from users. The suggestions for a
capitalized misspelled word are expected to be also capitalized. I suppose
this is not true for all languages in all situations.
So what can we do?
1) Always capitalize the suggestion when it is the first word of a
sentence.
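A minimal sketch of the basic request, assuming the simplest policy: if the misspelled word starts with an uppercase letter, every suggestion is capitalized too. This is not the Morfologik speller rule itself; the class and method names are hypothetical, and languages with special casing rules would need their own handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: makes suggestions follow the case of the misspelled word. */
final class SuggestionCase {

    /** Uppercases the first character and leaves the rest unchanged. */
    static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    /** Capitalizes all suggestions when the misspelled word is capitalized. */
    static List<String> matchCase(String misspelled, List<String> suggestions) {
        boolean capitalized = !misspelled.isEmpty()
                && Character.isUpperCase(misspelled.charAt(0));
        List<String> result = new ArrayList<>();
        for (String suggestion : suggestions) {
            String adjusted = capitalized ? capitalize(suggestion) : suggestion;
            if (!result.contains(adjusted)) { // avoid duplicates created by capitalizing
                result.add(adjusted);
            }
        }
        return result;
    }
}
```

The same method could be called with a "first word of the sentence" flag instead of the uppercase check if option 1 is preferred.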
2014-03-21 9:32 GMT+01:00 Nathan Wells sungk...@gmail.com:
So I want to create a rule that asks the user to use the Latin colon
rather than the Khmer character ៈ except in Khmer words that actually end
in the ៈ character.
There are 365 Khmer words that can end in a ៈ character.
What is
Marco,
You have a token with vela/velas and then another with bandeira/bandeiras.
The rule expects a sentence like arrrear a vela bandeira.
Instead of
<token regexp="yes">vela|velas</token>
<token regexp="yes">bandeira|bandeiras</token>
Use
token
Hi,
This happens now in the WikiCheck and in the nightly differences. For
example, with this rule from Catalan grammar.xml:
<rule id="EVITA_DEMOSTRATIUS_AQUEST" name="Evita els demostratius 'aquest'"
default="off">
It was caused by some change today.
Regards,
Jaume
2014-07-08 9:37 GMT+02:00 Marcin Miłkowski list-addr...@wp.pl:
The Portuguese dictionary is already built. We simply haven't included
it yet because we usually start from a certain number of rules, and then
add the tagger. Using the tags in rules is a very good idea overall.
I agree with
2014-07-08 17:34 GMT+02:00 Marco A.G.Pinto marcoagpi...@mail.telepac.pt:
Hello!
I have contacted my Minho University friends who make the pt_PT
dictionaries for Mozilla and OpenOffice/LibreOffice.
They said they can create the postag dictionary and help.
Hi Marco,
What I and Marcin try
, Spanish or Catalan), some
existing rules could be used as models, and those who are familiar with
them (as myself) could contribute more readily.
Regards,
Jaume Ortolà
On Tue, Jul 8, 2014 at 9:39 PM, Jaume Ortolà i Font jaumeort...@gmail.com
wrote:
2014-07-08 21:53 GMT+02:00 Marco A.G.Pinto
Here you can see the results of the sample rules I created in Portuguese:
https://languagetool.org/regression-tests/20140708/result_pt_20140708.html
Suas is wrongly tagged in the Freeling dictionary as singular. It should
be plural. That explains most of the false alarms.
But the rule needs
Hi,
I need to enable and disable rules at the same time on the command line.
This is already done in the server mode with three parameters[1]:
enabled = list of rules...
disabled = list of rules...
enabledOnly = yes [by default, no]
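To make the three parameters concrete, here is a small sketch of how the set of active rules could be derived from them. The semantics are an assumption (enabledOnly restricts checking to the enabled list, and disabled always wins), not a description of the server code; the class, method and rule IDs are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: combines enabled, disabled and enabledOnly into one rule set. */
final class RuleSelection {

    /**
     * @param defaultOn   IDs of rules that are active by default
     * @param enabled     IDs of rules explicitly enabled by the user
     * @param disabled    IDs of rules explicitly disabled by the user
     * @param enabledOnly if true, only explicitly enabled rules are considered
     */
    static Set<String> activeRules(Set<String> defaultOn, Set<String> enabled,
                                   Set<String> disabled, boolean enabledOnly) {
        Set<String> active = new LinkedHashSet<>();
        if (!enabledOnly) {
            active.addAll(defaultOn); // start from the defaults unless restricted
        }
        active.addAll(enabled);       // add rules that are off by default
        active.removeAll(disabled);   // assumption: an explicit disable always wins
        return active;
    }
}
```

With enabledOnly left at its default, a regional variant can switch on its own rules and switch off the conflicting general ones in a single call.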
Could we implement the same approach in the command-line? Will
2014-07-20 18:07 GMT+02:00 Daniel Naber daniel.na...@languagetool.org:
On 2014-07-20 11:22, Jaume Ortolà i Font wrote:
enabled = list of rules...
disabled = list of rules...
enabledOnly = yes [by default, no]
Could we implement the same approach in the command-line
Hi,
A possible and simple solution is to write two rules. One for sentences
with ending punctuation:
<pattern>
<marker>
<token regexp="yes">(you|thei|ou)r</token>
</marker>
<token regexp="yes">[.?!]</token>
</pattern>
And another one for sentences without ending