Re: Inflecting second token with postag from the first

2016-09-13 Thread Jaume Ortolà i Font
2016-09-13 22:27 GMT+02:00 Andriy Rysin : > Sorry if this is already written somewhere - I looked at wiki pages but > could not find anything relevant. > > I have two tokens (first name and last name) and in the suggestion I want > to inflect second token the same as the first.

Re: Help creating rule pt_PT

2016-07-15 Thread Jaume Ortolà i Font
arco A.G.Pinto > --- > > On 15/07/2016 11:54, Jaume Ortolà i Font wrote: > > Hi, > > Most languages have the postag NCMS000, including "Latim". Try: > > * * > > Regards, > Jaume Ortolà > > 2016-07-15 12:45 GMT+02:00 Marco

Re: Help creating rule pt_PT

2016-07-15 Thread Jaume Ortolà i Font
Hi, Most languages have the postag NCMS000, including "Latim". Try: * * Regards, Jaume Ortolà 2016-07-15 12:45 GMT+02:00 Marco A.G.Pinto : > Hello! > > I am trying to create the following rule: > traduzir *em *LANG -> traduzir *para *LANG > (translate TO LANG) >

Re: bulk corrections in Wikipedia using LT

2016-06-30 Thread Jaume Ortolà i Font
2016-06-30 13:02 GMT+02:00 Juan Martorell : > Great job, Jaume! > > However I found some too-greedy corrections in change_always.txt for > Spanish: > > "esta formada" > "esta constituida" > > Recently, the rule excluded the diacritic tilde for referrers, so to speak: > >

Re: bulk corrections in Wikipedia using LT

2016-06-27 Thread Jaume Ortolà i Font
2016-06-27 16:12 GMT+02:00 Mike Unwalla : > > [2] > https://github.com/jaumeortola/cawiki-roofreading/blob/master/examples/example_Spanish.txt > > I get a 404 not found message. > Sorry, I deleted a character inadvertently. Try this:

bulk corrections in Wikipedia using LT

2016-06-27 Thread Jaume Ortolà i Font
Hi, For some time now I have been using the results of LT analysis to make corrections in the Catalan Wikipedia. I have done almost a million edits. There are very different types of edits. Some are just typos fixed with simple “search and replace”, and others are LT rules that need more or less

Re: Improving the rules from yesterday

2016-06-14 Thread Jaume Ortolà i Font
Try this: É pois [!.] Usar vírgula: \1, Será verdade? É pois! Regards, Jaume Ortolà 2016-06-14 11:32 GMT+02:00 Marco A.G.Pinto : > Hello Jaume, > > I was wondering if you could help

Re: Need help creating rule

2016-06-13 Thread Jaume Ortolà i Font
Hi Marco, If you want to take into account every possibility (only one comma present, no comma at all) and always give the proper suggestion, you'll need to write different rules. One rule for: "é pois" é pois [,;:–—\(]

Re: Chrome extension update

2016-06-01 Thread Jaume Ortolà i Font
Daniel, Could we add this option? Assume this variety of Catalan: Catalan = ca-ES Catalan (Valencian) = ca-ES-valencia Regards, Jaume Ortolà 2016-05-31 15:54 GMT+02:00 Daniel Naber : > On 2016-05-30 18:32, Daniel Naber wrote: > > > could everyone please test

Re: new HTTP API with JSON output

2016-05-25 Thread Jaume Ortolà i Font
2016-05-25 15:35 GMT+02:00 Daniel Naber : > > A prototype of a new API is now online and can be tested here: > https://languagetool.org/http-api/swagger-ui/#/default -- please provide > feedback, this API is supposed to be stable for the next 10 years... It looks

ideas for English rules

2016-05-24 Thread Jaume Ortolà i Font
Hi, This document can provide ideas for new English rules: "Misused English words and expressions in EU publications" [1] Some rules should be quite straightforward: * with the aim to (do) > with the aim of (doing) * competences > powers, jurisdiction * Concerning.../ For what concerns... >

Re: ignoring certain tokens in rules

2016-05-06 Thread Jaume Ortolà i Font
relatively rare. Regards, Jaume Ortolà 2016-05-05 16:22 GMT+02:00 Jaume Ortolà i Font <jaumeort...@gmail.com>: > Hi, > > I think Marcin talked about this idea some time ago. > > Sometimes tokens like quotations (or other characters) should be ignored > in some r

ignoring certain tokens in rules

2016-05-05 Thread Jaume Ortolà i Font
Hi, I think Marcin talked about this idea some time ago. Sometimes tokens like quotations (or other characters) should be ignored in some rules. That is, the sentence should be checked as if this token were not present. Any idea about how it could be implemented? Alternatively, tokens like this

Re: Error trying to update French synthesizer dictionary

2016-04-15 Thread Jaume Ortolà i Font
Hi Dominique, This script can be helpful: https://github.com/Softcatala/catalan-dict-tools/blob/master/build-morfologik-lt.sh Regards, Jaume Ortolà 2016-04-15 22:33 GMT+02:00 Dominique Pellé : > Hi > > I'm trying to upgrade the French POS tag and synthesizer >

Re: DictionaryExporter error

2016-04-15 Thread Jaume Ortolà i Font
Hi, Juan. This is fixed now. In the last update of these tools, I tried not to change the input and output formats. But the use of "*" as a separator was an unexpected choice. Regards, Jaume Ortolà 2016-04-15 11:51 GMT+02:00 Juan Martorell : > Hi, > > I created a

Re: Roadmap for Spanish

2016-04-06 Thread Jaume Ortolà i Font
2016-04-06 20:27 GMT+02:00 Marcin Miłkowski : > > > To transform one adjective into an adverb, in English you use the suffix > > `-ly` and in Spanish you use the suffix `-mente`: > > > > Equal --> equally > > Igual --> igualmente > > > > I found 18340 candidates for

Re: Roadmap for Spanish

2016-04-06 Thread Jaume Ortolà i Font
2016-04-06 14:55 GMT+02:00 Juan Martorell : > But more important are some derivatives, both suffixed and prefixed. > Hi Juan, I can tell you my experience on these points. > To transform one adjective into an adverb, in English you use the suffix > `-ly` and in

Re: Roadmap for Spanish

2016-04-05 Thread Jaume Ortolà i Font
2014-06-06 20:45 GMT+02:00 Juan Martorell : > > *1st and foremost: disambiguator:* > > My current strategy for disambiguation is starting by the longer > constructions and then downsizing to the two tokens constructions. Positive > and negative examples should be

Re: Preventing inflections in suggestions

2016-03-12 Thread Jaume Ortolà i Font
2016-03-12 10:22 GMT+01:00 Marcin Miłkowski : > I remove archaic forms for English and Polish words altogether. You're > right, removing individual forms from the synthesizer is the easiest way > (not to mention it will be computationally cheap). > > I believe I also did this

Re: updating to Morfologik 2.1.0

2016-03-08 Thread Jaume Ortolà i Font
2016-03-08 18:02 GMT+01:00 Marcin Miłkowski : > I think it's almost completely irrelevant. And for some languages, the > differences are much bigger (e.g., for Polish), so fsa5 is definitely > not the best format. So please go ahead with CFSA2. > Ok. In any case, there is no

updating to Morfologik 2.1.0

2016-03-08 Thread Jaume Ortolà i Font
Hi, I have made the changes required in LT for updating to Morfologik 2.1.0. You can see them in the branch "updatemorfologik" (a code clean-up is pending). Someone should test these changes before I push them. The inputs for the dictionary builders are the same as before. As for the outputs,

MS Word add-in translations

2016-02-07 Thread Jaume Ortolà i Font
Hi, If you want to translate the LanguageTool MS Word add-in into your language, you can do it now at transifex.com. See the file WinFormStrings.resx. Most of the strings are already translated using existing translations. Regards, Jaume Ortolà

Re: MS Word add-in for LT

2016-02-03 Thread Jaume Ortolà i Font
2016-02-03 10:30 GMT+01:00 Mike Unwalla : > Hello Jaume, > > The add-in works now. (It is not necessary for the Windows Firewall to > have an entry for Microsoft Word. Without an entry, I can still access the > server on languagetool.org.) > > Refer to the attachments for

Re: MS Word add-in for LT

2016-02-03 Thread Jaume Ortolà i Font
2016-02-02 22:24 GMT+01:00 Daniel Naber <daniel.na...@languagetool.org>: > On 2016-02-02 21:54, Jaume Ortolà i Font wrote: > > > I am not able to test every language, especially non-Latin ones > > (Japanese, etc.). > > You could use the same document we a

Re: MS Word add-in for LT

2016-02-03 Thread Jaume Ortolà i Font
2016-02-03 16:10 GMT+01:00 Andriy Rysin : > Hi Jaume > > it seems that Ukrainian (uk-UA) is not in the list, can you please > take a look at that? > Sorry. I truncated the list in one place. It will be fixed in the next release. Jaume

Re: MS Word add-in for LT

2016-02-02 Thread Jaume Ortolà i Font
Thanks, Daniel. That was the bug. I have fixed it and published a new release. I have also completed the list of languages supported by LT [1]. It seems that Asturian, Breton and Tagalog cannot be defined in a MS Word document. In order to use LanguageTool with these languages, the user has to

Re: MS Word add-in for LT

2016-02-01 Thread Jaume Ortolà i Font
2016-02-01 19:42 GMT+01:00 Mike Unwalla : > Hello, > > I am struggling to use the Word add-in with Word 2010. I tried to install > on 2 different computers (Windows 7, Windows 8) and get the same problems > on both computers. > > I set LT to run as server on port 8081. I

Re: MS Word add-in for LT

2016-01-30 Thread Jaume Ortolà i Font
2016-01-30 9:55 GMT+01:00 Marcin Miłkowski : > > Why not simply port some of the code that we have for listing all > categories of rules -- or even write up a small piece of Java code to > create a resource file that would be used to create a localized dialog > for a given

Re: MS Word add-in for LT

2016-01-29 Thread Jaume Ortolà i Font
2016-01-29 14:07 GMT+01:00 Marcin Miłkowski <list-addr...@wp.pl>: > On 29.01.2016 at 12:27, Jaume Ortolà i Font wrote: > Just tested and it works in MS Word 2007. > > There are some settings that seem to be relevant only for Catalan, > though, in the Settings dialog box (

Re: MS Word add-in for LT

2016-01-29 Thread Jaume Ortolà i Font
will give them > a chance to use LT in their work. > > Regards, > Andriy > > 2016-01-26 17:42 GMT-05:00 Marcin Miłkowski <list-addr...@wp.pl>: > > Hi Jaume, > > > > this is very good news! > > > > On 26.01.2016 at 10:47, Jaume Ortolà i Font

MS Word add-in for LT

2016-01-26 Thread Jaume Ortolà i Font
Hi, I have made a beta release of a MS Word add-in for LanguageTool [1]. ("Add-in" is Microsoft terminology for "plug-in".) It has some limitations, but I think it can work fine and be useful. Checking is done only in a dialog box, with the usual options found in such dialogs. Unfortunately the

Re: introduce new color for style errors

2016-01-05 Thread Jaume Ortolà i Font
Hi, In some installations of LanguageTool I use the "type" attribute in the elements "category", "rulegroup", "rule" to assign the colors. [1] For example: * Red: type="misspelling" * Blue, by default including: type="grammar" type="typographical" ... * Green:
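A minimal Java sketch of this idea, using a hypothetical `RuleTypeColors` helper rather than any existing LanguageTool API. Only the red group ("misspelling") and the blue default come from the message above; the green group is cut off in this excerpt and is left out. The sketch also assumes that a `type` set on a rule overrides one set on its rulegroup, which in turn overrides the category's.

```java
import java.util.Map;

public class RuleTypeColors {
  // Only "misspelling" maps to red here; every other type (e.g. "grammar",
  // "typographical") falls back to the blue default group.
  private static final Map<String, String> COLORS = Map.of("misspelling", "red");
  private static final String DEFAULT_COLOR = "blue";

  // Assumption: the most specific non-null type wins (rule, then rulegroup, then category).
  static String effectiveType(String ruleType, String ruleGroupType, String categoryType) {
    if (ruleType != null) {
      return ruleType;
    }
    return ruleGroupType != null ? ruleGroupType : categoryType;
  }

  static String colorFor(String type) {
    return type == null ? DEFAULT_COLOR : COLORS.getOrDefault(type, DEFAULT_COLOR);
  }

  public static void main(String[] args) {
    System.out.println(colorFor(effectiveType(null, null, "misspelling")));      // red
    System.out.println(colorFor(effectiveType("grammar", null, "misspelling"))); // blue
  }
}
```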

Re: strange results in languagetool.org

2015-12-25 Thread Jaume Ortolà i Font
2015-12-25 12:00 GMT+01:00 Daniel Naber <daniel.na...@languagetool.org>: > On 2015-12-25 11:20, Jaume Ortolà i Font wrote: > > > Thanks, it works for me now. And what about the other problem? Words > > like "elapé" or "macroprocés" are

Re: strange results in languagetool.org

2015-12-25 Thread Jaume Ortolà i Font
2015-12-25 17:58 GMT+01:00 Daniel Naber <daniel.na...@languagetool.org>: > On 2015-12-25 15:29, Jaume Ortolà i Font wrote: > > > The problem is caused probably by an old file that was removed from > > the project, but remains in the server installation used in > > la

strange results in languagetool.org

2015-12-24 Thread Jaume Ortolà i Font
Hi, I've found very strange results in the web interface of languagetool.org. The server running for languagetool.org seems to be updated daily. It finds an error in a sentence with a rule I wrote yesterday (in Catalan): - Per què sigui així. But the "rule implementation" is not available

limit the number of suggestions in LibreOffice

2015-12-21 Thread Jaume Ortolà i Font
Hi, I would like to limit the maximum number of suggestions that are shown in LibreOffice. In Catalan the Morfologik speller is used for spelling suggestions, and this number is sometimes excessive. I'm thinking of 15 suggestions. Is it okay to do it here for everybody? [1] Regards, Jaume
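A sketch of the proposed cap in Java, assuming the suggestion list is already sorted best-first before it reaches LibreOffice. `SuggestionLimiter` and `MAX_SUGGESTIONS` are illustrative names, not existing LanguageTool code.

```java
import java.util.ArrayList;
import java.util.List;

public class SuggestionLimiter {
  // The cap of 15 proposed above.
  static final int MAX_SUGGESTIONS = 15;

  // Returns at most max suggestions, keeping the first (best-ranked) ones.
  static List<String> limit(List<String> suggestions, int max) {
    if (suggestions.size() <= max) {
      return suggestions;
    }
    return new ArrayList<>(suggestions.subList(0, max));
  }

  public static void main(String[] args) {
    List<String> many = new ArrayList<>();
    for (int i = 0; i < 40; i++) {
      many.add("s" + i);
    }
    System.out.println(limit(many, MAX_SUGGESTIONS).size()); // 15
  }
}
```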

Re: LanguageTool in 2015 + the future

2015-12-14 Thread Jaume Ortolà i Font
2015-12-07 19:30 GMT+01:00 Marcin Miłkowski : > I think there's a community that we haven't addressed at all: language > professionals, be it proofreaders or translators (and translation > agencies). Translators are using suboptimal tools, such as Apsic XBench, > for their

Re: False error given on the online Esperanto checker, can't reproduce it in command line

2015-10-29 Thread Jaume Ortolà i Font
Hi Dominique, When there is no space at the end of the sentence, the last token has the POS tag "PARA_END", and this tag makes the rule match. You can see the difference (with vs. without a trailing space) here: http://community.languagetool.org/analysis/analyzeText Regards, Jaume Ortolà 2015-10-29

Re: LanguageTool for Chrome

2015-10-27 Thread Jaume Ortolà i Font
It works for me! I just found that entities like or (not visible) are detected as spelling errors. Regards, Jaume Ortolà 2015-10-27 18:48 GMT+01:00 Daniel Naber : > On 2015-10-27 18:29, Xavi Ivars wrote: > > > I just did some small tests in Gmail, and it

Re: LanguageTool for Chrome

2015-10-23 Thread Jaume Ortolà i Font
2015-10-23 16:40 GMT+02:00 Daniel Naber <daniel.na...@languagetool.org>: > On 2015-10-22 11:21, Jaume Ortolà i Font wrote: > > I have tested it and I found some strange behavior when trying to > > replace errors with suggestions. > > thanks for your feedback, a new ve

Re: LanguageTool for Chrome

2015-10-22 Thread Jaume Ortolà i Font
Hi Daniel, Good job! I have tested it and I found some strange behavior when trying to replace errors with suggestions. - When the error is the first word of the text, the replacement is not done. Nothing happens. - In a text area, when the text has more than one newline, and I try to make a

Re: Idea to introduce tag in LT grammar rules.

2015-09-05 Thread Jaume Ortolà i Font
2015-09-05 16:11 GMT+02:00 Daniel Naber : > On 2015-09-04 23:21, Dominique Pellé wrote: > > > I wish I could write a rule pattern like this: > > > > plein temps#chaque fois#rude épreuve#vol > > d’oiseau > > What about a more radical approach (which would be

Re: improvements in Morfologik speller

2015-06-11 Thread Jaume Ortolà i Font
2015-06-10 9:38 GMT+02:00 Daniel Naber daniel.na...@languagetool.org: On 2015-06-08 21:27, Jaume Ortolà i Font wrote: You are right. These results are not expected. I will look at them again. I have another problem now with the Morfologik snapshot and release: is is a typo (German

improvements in Morfologik speller

2015-06-02 Thread Jaume Ortolà i Font
Hi, I'm testing some minor improvements in the Morfologik speller. They are here: https://github.com/jaumeortola/morfologik-stemming The most important are: - Try all possible replacements at the same point of a word (not only the longest one). [1] - Apply the properties ignore-diacritics and
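The first improvement can be sketched in Java as follows. When several replacement pairs start at the same point of a word (here the hypothetical pairs `ph`→`f` and `p`→`b`), the sketch yields one candidate per matching pair instead of applying only the longest match. The class and method names are illustrative and are not the morfologik-stemming API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ReplacementCandidates {
  // Returns one candidate per replacement pair whose key occurs at position pos,
  // rather than applying only the longest matching key.
  static List<String> candidatesAt(String word, int pos, Map<String, List<String>> pairs) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : pairs.entrySet()) {
      String from = e.getKey();
      if (!from.isEmpty() && word.startsWith(from, pos)) {
        for (String to : e.getValue()) {
          result.add(word.substring(0, pos) + to + word.substring(pos + from.length()));
        }
      }
    }
    return result;
  }
}
```

For "phase" at position 0, both "fase" (from `ph`) and "bhase" (from `p`) are produced; the order depends on the map's iteration order.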

non-breaking space / spacebefore=no

2015-05-18 Thread Jaume Ortolà i Font
Hi, A non-breaking space in a pattern rule is treated as spacebefore=no. Is there a reason for this behavior? Regards, Jaume Ortolà

command-line XML output

2015-05-09 Thread Jaume Ortolà i Font
Hi, I need to use the command-line XML output (with the --api option). The list of unknown words is needed but it is missing. Can we add this list to the XML? It would be something like this: <matches> <language ... /> <error ... /> <error ... /> <unknown_words> <word>word</word> <word>word</word>

Re: Multiple zero of min occurances

2015-04-29 Thread Jaume Ortolà i Font
2015-04-29 19:38 GMT+02:00 Andriy Rysin ary...@gmail.com: I just found out that if I have multiple tokens with min=0 my patterns don't match. Looking at the code it seems like if min=0 we only check for next pattern to match but that next may also have 0 mins. I wrote little patch with tests
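As Andriy's reading of the code suggests, the problem appears when a skipped `min="0"` element is followed by another optional one. A small backtracking matcher avoids it by letting each skipped element recurse to the next, which may itself be optional. This is an illustrative Java sketch with hypothetical names (`Element`, `matches`) where every element occurs at most once; it is not LanguageTool's actual pattern matcher.

```java
import java.util.List;
import java.util.function.Predicate;

public class OptionalTokenMatcher {
  // Hypothetical pattern element: a token test plus min occurrences (0 or 1); max is 1.
  static final class Element {
    final Predicate<String> test;
    final int min;

    Element(Predicate<String> test, int min) {
      this.test = test;
      this.min = min;
    }
  }

  // True if pattern[p..] matches tokens starting at index i (the match may end before the last token).
  static boolean matches(List<Element> pattern, int p, List<String> tokens, int i) {
    if (p == pattern.size()) {
      return true;
    }
    Element e = pattern.get(p);
    // Try consuming one token with this element.
    if (i < tokens.size() && e.test.test(tokens.get(i)) && matches(pattern, p + 1, tokens, i + 1)) {
      return true;
    }
    // Try skipping an optional element; recursing handles several min=0 elements in a row.
    return e.min == 0 && matches(pattern, p + 1, tokens, i);
  }

  public static void main(String[] args) {
    List<Element> pattern = List.of(
        new Element("a"::equals, 1),
        new Element("b"::equals, 0),
        new Element("c"::equals, 0),
        new Element("d"::equals, 1));
    System.out.println(matches(pattern, 0, List.of("a", "d"), 0));      // true
    System.out.println(matches(pattern, 0, List.of("a", "c", "d"), 0)); // true
    System.out.println(matches(pattern, 0, List.of("a", "x", "d"), 0)); // false
  }
}
```

Because a match may end before the last token, a trailing `min="0"` element never changes the result in this sketch.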

Re: Concordance error pt_PT

2015-04-21 Thread Jaume Ortolà i Font
will try again after I am back home. Thanks! PS-We are in the good path! Kind regards, Marco A.G.Pinto -- On 21/04/2015 13:13, Jaume Ortolà i Font wrote: Hi, You have a problem in the example correction. The rule should look like this: rule id=AS_A

Re: Concordance error - pt_PT

2015-04-14 Thread Jaume Ortolà i Font
Hi Marco, You need a tagger dictionary if you want to find concordance errors. We talked some time ago about adding a tagger dictionary for Portuguese. Is there any news on this? Regards, Jaume 2015-04-14 15:20 GMT+02:00 Marco A.G.Pinto marcoagpi...@mail.telepac.pt: Hello! Could someone

Re: MultiThreadedJLanguageTool

2015-02-22 Thread Jaume Ortolà i Font
2015-02-22 15:04 GMT+01:00 Andriy Rysin ary...@gmail.com: No, the only thing I pushed that will lead to regressions was removing more than one consecutive overlapping match in SameRuleGroupFilter (and also making sure we remove consecutive overlaps produced by multiple threads). The

proofreading long documents

2015-01-29 Thread Jaume Ortolà i Font
Hi, I use LanguageTool on the command line for proofreading long documents (whole books) and I'd like to make this process easily available to more people (without additional scripts). It could become a web service, but some people don't want to send copyrighted material to a public webpage, and

Tests fail: concurrency problem?

2014-12-23 Thread Jaume Ortolà i Font
Hi, I get test errors in HTTPServerLoadTest with the current master branch (no other changes), the same or similar errors in different machines. Has anyone else seen this error? Regards, Jaume Ortolà Tests in error: HTTPServerLoadTest.testHTTPServer:61 » Execution

Re: added.txt activated for most languages

2014-12-22 Thread Jaume Ortolà i Font
Hi Daniel, I use the manual-tagger not only as a way to add new words and tags, but also as a means of fixing tags temporarily until the next dictionary update. So if there is a manual tag, the dictionary tag is ignored. I think that makes sense. Could we do it likewise in the CombiningTagger?
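A minimal Java sketch of the proposed behavior: when the manual tagger has an entry for a word, its tags replace the dictionary's tags instead of being merged with them. `OverridingTagger` and its plain maps are hypothetical stand-ins, not the real `CombiningTagger` API.

```java
import java.util.List;
import java.util.Map;

public class OverridingTagger {
  // Hypothetical word -> POS tag lookups standing in for the manual file and the binary dictionary.
  private final Map<String, List<String>> manualTags;
  private final Map<String, List<String>> dictionaryTags;

  OverridingTagger(Map<String, List<String>> manualTags, Map<String, List<String>> dictionaryTags) {
    this.manualTags = manualTags;
    this.dictionaryTags = dictionaryTags;
  }

  // A manual entry, if present, wins outright; otherwise fall back to the dictionary.
  List<String> tag(String word) {
    List<String> manual = manualTags.get(word);
    if (manual != null && !manual.isEmpty()) {
      return manual;
    }
    return dictionaryTags.getOrDefault(word, List.of());
  }
}
```

The design choice is "override, don't merge": a temporary fix in the manual file hides a wrong dictionary tag until the next dictionary update.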

Re: Changes in UpperCaseSentenceStart

2014-12-21 Thread Jaume Ortolà i Font
2014 11:36:13 +0100 from Jaume Ortolà i Font: Hi, I have modified the rule UpperCaseSentenceStart so that there is a match in sentences starting with quotes like « or “ and a lower case word. In the nightly tests there are some new matches for different languages. Tell me if there is any

Changes in UpperCaseSentenceStart

2014-12-20 Thread Jaume Ortolà i Font
Hi, I have modified the rule UpperCaseSentenceStart so that there is a match in sentences starting with quotes like « or “ and a lower case word. In the nightly tests there are some new matches for different languages. Tell me if there is any problem. In French there are new matches caused by
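The check can be sketched in Java like this, assuming only the opening quotes « and “ named above plus any space characters between them and the word (`Character.isSpaceChar` also covers the non-breaking space often used after « in French). The class name is hypothetical; this is not the rule's actual implementation.

```java
public class QuotedSentenceStart {
  // Opening quotes mentioned in the message; other quote styles are not handled here.
  private static final String OPENING_QUOTES = "«“";

  // True if, after skipping opening quotes and spaces, the sentence starts with a lowercase letter.
  static boolean startsLowercase(String sentence) {
    int i = 0;
    while (i < sentence.length()
        && (OPENING_QUOTES.indexOf(sentence.charAt(i)) >= 0
            || Character.isSpaceChar(sentence.charAt(i)))) {
      i++;
    }
    return i < sentence.length() && Character.isLowerCase(sentence.codePointAt(i));
  }

  public static void main(String[] args) {
    System.out.println(startsLowercase("« bonjour »"));     // true
    System.out.println(startsLowercase("“Hola”, va dir.")); // false
  }
}
```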

Re: bug: morfologik rule with word ls

2014-11-29 Thread Jaume Ortolà i Font
= Pattern.compile(.* + + \\d+ + .*); to make it more robust. I have found some segments of words unexpectedly converted into accepted words. Regards, Jaume Ortolà 2014-11-29 11:05 GMT+01:00 Daniel Naber daniel.na...@languagetool.org: On 2014-11-28 23:46, Jaume Ortolà i Font wrote: I have found

bug: morfologik rule with word ls

2014-11-28 Thread Jaume Ortolà i Font
I have found a strange bug. Take the non-existent word ls (LS). This happens in Catalan: 1) The POS tag is null. The word is not in the dictionary and it is not tagged by the tagger. OK 2) In MorfologikCatalanSpellerRuleTest there is a rule match. OK 3) There is no rule match in the LT web

Re: Question about Spanish language

2014-11-03 Thread Jaume Ortolà i Font
Hi, I'm not sure I understand the question. In Spanish, LL and RR are usually double letters or digraphs, except in a few cases. RR is two independent letters when it comes from adding a prefix to a word: inter+relacionar = interrelacionar; hiper+realismo = hiperrealismo, etc. But the spelling of

Re: Applying matched token's POS tag to another matched token

2014-10-31 Thread Jaume Ortolà i Font
Currently it's not possible. I have needed it myself sometimes. Regards, Jaume Ortolà 2014-10-30 17:37 GMT+01:00 Linas Valiukas shirshe...@gmail.com: Hi there, LanguageTool seems to provide an ability to apply POS tag of a match to a word, like this (taken from Development Overview page):

Re: Case sensitivity in MultiWordChunker

2014-10-26 Thread Jaume Ortolà i Font
2014-10-26 14:03 GMT+01:00 R.J. Baars r.j.ba...@xs4all.nl: What does Multiwordchunker do? See a previous thread in this list: spell checker enhancement (sept 16). Jaume --

Wikicheck not working for some articles

2014-10-14 Thread Jaume Ortolà i Font
Hi, Wikicheck is not working now for articles with titles that include diacritics. See, for example, [1]. It used to work well. Regards, Jaume Ortolà [1] http://tools.wmflabs.org/languagetool/pageCheck/index?lang=ca&url=Llista_dels_rius_m%C3%A9s_llargs

Re: IndexOutOfBoundsException with min=0 attribute in pattern rule

2014-10-13 Thread Jaume Ortolà i Font
Hi, I think a token min=0 at the end or at the start of a pattern is useless. The pattern is equivalent with or without this token. The error probably comes from a bug. Nobody tried token min=0 at the end of a pattern precisely because it is useless. Regards, Jaume Ortolà 2014-10-13 17:07

Re: Morfologik speller

2014-10-03 Thread Jaume Ortolà i Font
2014-10-03 14:50 GMT+02:00 Marcin Miłkowski list-addr...@wp.pl: W dniu 2014-10-03 o 13:22, R.J. Baars pisze: Marcin, would it be possible to use the morfologik speller as a separate program, to throw a list of words at, and get the alternatives? No. It does not tokenize words, and you

Re: Large amount of rules ...

2014-09-27 Thread Jaume Ortolà i Font
2014-09-27 11:06 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: It is all about suggesting a Dutch word for a loanword. Then why don't you use a simple replace rule (in Java)? You can use the existing one (or adapt it) and put the list of words in a text file. Jaume

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
2014-09-16 11:21 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: We don't agree. There is a spellchecker, but also a single word ignore list for it. There are XML rules, but also a Simplereplace rule, a compounding rule. So apart from the hammer and the screwdriver, there are more tools. There

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
...@xs4all.nl: Jaume, thanks, but I am not sure. Depends on its implementation I think. Where can I find more info? Ruud On 16-09-14 at 12:26, Jaume Ortolà i Font wrote: 2014-09-16 11:21 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: We don't agree. There is a spellchecker, but also

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
this should be optionally changed (i.e., tag the inside tokens too). Regards, Jaume (Might come in handy for just this tagging..) Ruud On 16-09-14 at 12:56, Jaume Ortolà i Font wrote: Hi, Ruud. I can't find any documentation. It is used in Polish, French, Catalan, Russian, Ukrainian

Re: spell checker enhancement

2014-09-16 Thread Jaume Ortolà i Font
On 16-09-14 at 13:23, Jaume Ortolà i Font wrote: 2014-09-16 13:03 GMT+02:00 R.Baars baar...@xs4all.nl: I see. This is probably of no use for spellchecking, but it is for postagging. It gives no suggestions, but it can be used for avoiding false positives in spellchecking, if you set

Re: Multiple suggestions by SimpleReplaceRule

2014-09-13 Thread Jaume Ortolà i Font
2014-09-13 10:24 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: I was wondering if the simplereplacerule supports multiple suggestions. I wanted to suggest 'd.m.v.' and 'door middel van' for 'dmv'. dmv=d.m.v dmv=door middel van You can write: dmv=d.m.v.|door middel van The rule also supports
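For illustration, here is how a line in that format could be read in Java: the part before `=` is the flagged word and the part after it is split on `|` into separate suggestions. This is a hypothetical parser, not the one SimpleReplaceRule actually uses, and treating lines starting with `#` as comments is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaceListParser {
  // Parses lines like "dmv=d.m.v.|door middel van" into: flagged word -> suggestions.
  static Map<String, List<String>> parse(List<String> lines) {
    Map<String, List<String>> map = new HashMap<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue; // blank line or assumed comment syntax
      }
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        continue; // no "word=" prefix, skip the line
      }
      String wrong = trimmed.substring(0, eq);
      // '|' is a regex metacharacter, so it must be escaped for split().
      List<String> suggestions = Arrays.asList(trimmed.substring(eq + 1).split("\\|"));
      map.put(wrong, suggestions);
    }
    return map;
  }
}
```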

Re: Suggestion: find POS tag of portion of a word in XML rules

2014-09-10 Thread Jaume Ortolà i Font
Hi Dominique, I think the best thing to do is to change the tokenization appropriately and split the pronouns into separate tokens. That's what is done in Catalan. Of course, the tokenizer gets a little more complex. But, after that, you can do many more things in the rules. The alternatives

Re: Bug in disambiguator?

2014-09-03 Thread Jaume Ortolà i Font
Dominique, As far as I remember (it is documented somewhere), that is what happens when you try to filter a non-existent tag. You try to filter N.* but there is no N.* tag in the token. In your sentence eil is not tagged with N. You need something like this: <rule> <pattern> <token

Re: Bug in disambiguator?

2014-09-03 Thread Jaume Ortolà i Font
2014-09-03 12:12 GMT+02:00 Marcin Miłkowski list-addr...@wp.pl: You will see that the Catalan pattern rule breaks then. Please fix it, and I'll see if that's everything we need. Thanks. Fixed a couple of disambiguation rules. Try now. (If you want to run all the tests right now, you need to

Re: locqualityissuetype

2014-08-27 Thread Jaume Ortolà i Font
2014-08-27 19:26 GMT+02:00 R.J. Baars r.j.ba...@xs4all.nl: I see. But don't understand. What I do understand is it meant to specify something, out of an issue list. Is there an issue list somewhere (these documents are so complicated...) See the list of values here:

Re: The SENT_END challenge

2014-08-09 Thread Jaume Ortolà i Font
Hi, A simple solution is to write two rules. One for sentences with ending punctuation: <pattern> <marker> <token regexp="yes">(you|thei|ou)r</token> </marker> <token regexp="yes">[.?!]</token> </pattern> And another one for sentences without ending

enabling and disabling rules in LT command-line

2014-07-20 Thread Jaume Ortolà i Font
Hi, I need to enable and disable rules at the same time in command-line. This is already done in the server mode with three parameters[1]: enabled = list of rules... disabled = list of rules... enabledOnly = yes [by default, no] Could we implement the same approach in the command-line? Will
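The server-side semantics could be mirrored on the command line roughly as in this Java sketch. It assumes that `enabledOnly` drops all default rules except the explicitly enabled ones and that `disabled` is applied last; `RuleSelection` and `activeRules` are hypothetical names, not LanguageTool's command-line code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class RuleSelection {
  // Computes the active rule IDs from the three parameters described above.
  static Set<String> activeRules(Set<String> defaultOn, Set<String> enabled,
                                 Set<String> disabled, boolean enabledOnly) {
    Set<String> active = new LinkedHashSet<>();
    if (!enabledOnly) {
      active.addAll(defaultOn); // start from the rules that are on by default
    }
    active.addAll(enabled);     // explicitly enabled rules are always added
    active.removeAll(disabled); // assumption: disabling wins over enabling
    return active;
  }
}
```

With default rules {A, B}, enabled {C} and disabled {B}, the result is {A, C}; with `enabledOnly` set, it is just {C}.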

Re: enabling and disabling rules in LT command-line

2014-07-20 Thread Jaume Ortolà i Font
2014-07-20 18:07 GMT+02:00 Daniel Naber daniel.na...@languagetool.org: On 2014-07-20 11:22, Jaume Ortolà i Font wrote: enabled = list of rules... disabled = list of rules... enabledOnly = yes [by default, no] Could we implement the same approach in the command-line

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 9:37 GMT+02:00 Marcin Miłkowski list-addr...@wp.pl: The Portuguese dictionary is already built. We simply haven't included it yet because we usually start from a certain number of rules, and then add the tagger. Using the tags in rules is a very good idea overall. I agree with

Re: Morphologic Analyser to solve concordance issue for Portuguese

2014-07-08 Thread Jaume Ortolà i Font
2014-07-08 17:34 GMT+02:00 Marco A.G.Pinto marcoagpi...@mail.telepac.pt: Hello! I have contacted my Minho University friends who make the pt_PT dictionaries for Mozilla and OpenOffice/LibreOffice. They said they can create the postag dictionary and help. Hi Marco, What I and Marcin try

Re: Tagger Dictionary and Minho University - pt_PT

2014-07-08 Thread Jaume Ortolà i Font
, Spanish or Catalan), some existing rules could be used as models, and those who are familiar with them (as myself) could contribute more readily. Regards, Jaume Ortolà On Tue, Jul 8, 2014 at 9:39 PM, Jaume Ortolà i Font jaumeort...@gmail.com wrote: 2014-07-08 21:53 GMT+02:00 Marco A.G.Pinto

sample Portuguese rules

2014-07-08 Thread Jaume Ortolà i Font
Here you can see the results of the sample rules I created in Portuguese: https://languagetool.org/regression-tests/20140708/result_pt_20140708.html Suas is wrongly tagged in the Freeling dictionary as singular. It should be plural. That explains most of the false alarms. But the rule needs

rules default=off are enabled in Wikipedia check

2014-05-06 Thread Jaume Ortolà i Font
Hi, This happens now in the WikiCheck and in the nightly differences. For example, with this rule from Catalan grammar.xml: <rule id="EVITA_DEMOSTRATIUS_AQUEST" name="Evita els demostratius 'aquest'" default="off"> It was caused by some change today. Regards, Jaume

Re: What is wrong with this rule (pt_PT)?

2014-04-30 Thread Jaume Ortolà i Font
Marco, You have a token with vela/velas and then another with bandeira/bandeiras. The rule expects a sentence like "arrear a vela bandeira". Instead of <token regexp="yes">vela|velas</token> <token regexp="yes">bandeira|bandeiras</token> use <token

Re: Large wordlist of exceptions

2014-03-21 Thread Jaume Ortolà i Font
2014-03-21 9:32 GMT+01:00 Nathan Wells sungk...@gmail.com: So I want to create a rule that asks the user to use the Latin colon rather than the Khmer character ៈ except in Khmer words that actually end in the ៈ character. There are 365 Khmer words that can end in a ៈ character. What is

capitalizing Morfologik Spelling suggestions

2014-02-02 Thread Jaume Ortolà i Font
Hi, This has become a common request from users. The suggestions for a capitalized misspelled word are expected to be capitalized too. I suppose this is not true for all languages in all situations. So what can we do? 1) Always capitalize the suggestion when it is the first word of a sentence.
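A Java sketch of the basic behavior users expect: when the misspelled word starts with an uppercase letter, the first letter of each suggestion is uppercased too. It works on code points so that accented initials such as À are handled. The class is hypothetical, and it does not settle which of the options a given language should use.

```java
import java.util.ArrayList;
import java.util.List;

public class SuggestionCase {
  // If the misspelled word starts with an uppercase letter, uppercase the first
  // letter of every suggestion; otherwise return the suggestions unchanged.
  static List<String> matchCase(String misspelled, List<String> suggestions) {
    if (misspelled.isEmpty() || !Character.isUpperCase(misspelled.codePointAt(0))) {
      return suggestions;
    }
    List<String> out = new ArrayList<>();
    for (String s : suggestions) {
      if (s.isEmpty()) {
        out.add(s);
        continue;
      }
      int first = s.codePointAt(0);
      String upper = new String(Character.toChars(Character.toUpperCase(first)));
      out.add(upper + s.substring(Character.charCount(first)));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(matchCase("Barcleona", List.of("barcelona"))); // [Barcelona]
  }
}
```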

Re: Token postag OR word

2014-01-28 Thread Jaume Ortolà i Font
2014-01-28 Kumara Bhikkhu kumara.bhik...@gmail.com Can a token be a mixture of postags and words? Example: Can a token match SENT_END or of|into? If not, how do I indicate this? Yes, you can write this: <or> <token postag="SENT_END"/> <token regexp="yes">of|into</token> </or> It's equivalent to using

Re: fsa.dict.speller.replacement-pairs slow down spell check

2013-12-31 Thread Jaume Ortolà i Font
Hi, In the current implementation the number of possible suggestions grows exponentially with the replacement pairs, which is not a good thing... For Milkowski you get 6144 possible suggestions in American English. I fixed a limit of 7 possible simultaneous replacements in a word, which (if the
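Why the count explodes, and how a cap helps, can be seen in this Java sketch. Each position where a pair applies can be either kept or replaced, so k applicable positions give up to 2^k variants; limiting the number of simultaneous replacements per word (7 in the fix described above) bounds that growth. The generator is illustrative only, and a real speller would also check each candidate against the dictionary.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BoundedReplacements {
  // Generates variants of word by applying replacement pairs at non-overlapping
  // positions, with at most maxReplacements applied; the unchanged word is included.
  static Set<String> generate(String word, Map<String, List<String>> pairs, int maxReplacements) {
    Set<String> out = new LinkedHashSet<>();
    expand(word, 0, "", 0, pairs, maxReplacements, out);
    return out;
  }

  private static void expand(String word, int pos, String prefix, int used,
                             Map<String, List<String>> pairs, int max, Set<String> out) {
    if (pos == word.length()) {
      out.add(prefix);
      return;
    }
    // Option 1: keep the current character unchanged.
    expand(word, pos + 1, prefix + word.charAt(pos), used, pairs, max, out);
    if (used == max) {
      return; // the cap stops further branching
    }
    // Option 2: apply every pair whose key starts here.
    for (Map.Entry<String, List<String>> e : pairs.entrySet()) {
      String from = e.getKey();
      if (!from.isEmpty() && word.startsWith(from, pos)) {
        for (String to : e.getValue()) {
          expand(word, pos + from.length(), prefix + to, used + 1, pairs, max, out);
        }
      }
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> pairs = Map.of("a", List.of("b"));
    System.out.println(generate("aaa", pairs, 3).size()); // 8 (2^3)
    System.out.println(generate("aaa", pairs, 1).size()); // 4
  }
}
```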

escaping translations

2013-12-21 Thread Jaume Ortolà i Font
Hi, There are some characters in translations that need escaping. I have seen, for example, missing apostrophes in http://community.languagetool.org. So where is the proper place to do the escaping? Is it the responsibility of the translators in Transifex? Regards, Jaume Ortolà

Re: escaping translations

2013-12-21 Thread Jaume Ortolà i Font
+de+Som%C3%A0lia&lang=ca So should I write &apos; or &quot;? http://www.riuraueditors.cat Regards, Jaume Ortolà 2013/12/21 Daniel Naber list2...@danielnaber.de On 2013-12-21 12:10, Jaume Ortolà i Font wrote: My question is this. If translating from English to another language, an apostrophe

Re: Improving spelling suggestions with frequency dictionaries

2013-12-09 Thread Jaume Ortolà i Font
2013/12/9 Anton Meixome meix...@certima.net I'm a newbie here, but I have a question. Why isn't there a frequency list for Galician in https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries ? This is not our project. You should ask there. We chose these lists because there are a

Re: Improving spelling suggestions with frequency dictionaries

2013-12-09 Thread Jaume Ortolà i Font
version of Morfologik. And then we'll be able to rebuild the dictionaries and adjust the tests if needed. Regards, Jaume Ortolà 2013/12/9 Marcin Miłkowski list-addr...@wp.pl On 2013-12-09 at 00:12, Jaume Ortolà i Font wrote: Hi, I have implemented the use of the frequency word lists

Re: Improving spelling suggestions with frequency dictionaries

2013-12-08 Thread Jaume Ortolà i Font
, we could consider that the last byte is the frequency data and the separator between POS tag and frequency is not needed. The other solution is to change the separator... Regards, Jaume Ortolà 2013/11/26 Marcin Miłkowski list-addr...@wp.pl On 2013-11-26 at 18:44, Jaume Ortolà i Font wrote
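The first option could look like this in Java, assuming each entry ends with the POS tag immediately followed by a single byte holding a frequency range from 0 to 255 (matching the 256 ranges of the word lists). The layout and the `FrequencyByte` class are hypothetical, not the actual Morfologik dictionary format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FrequencyByte {
  // Appends the frequency range as one raw byte right after the tag, with no separator.
  static byte[] encode(String tag, int range) {
    if (range < 0 || range > 255) {
      throw new IllegalArgumentException("range must be 0..255");
    }
    byte[] tagBytes = tag.getBytes(StandardCharsets.UTF_8);
    byte[] out = Arrays.copyOf(tagBytes, tagBytes.length + 1);
    out[tagBytes.length] = (byte) range;
    return out;
  }

  // The last byte is the frequency; mask it to read it as unsigned 0..255.
  static int frequency(byte[] entry) {
    return entry[entry.length - 1] & 0xFF;
  }

  // Everything before the last byte is the POS tag.
  static String tag(byte[] entry) {
    return new String(entry, 0, entry.length - 1, StandardCharsets.UTF_8);
  }
}
```

Because the frequency always occupies exactly one trailing byte, the reader never has to search for a separator, which is the point of this option.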

Re: Improving spelling suggestions with frequency dictionaries

2013-11-26 Thread Jaume Ortolà i Font
2013/11/25 Daniel Naber list2...@danielnaber.de On 2013-11-25 11:11, Jaume Ortolà i Font wrote: - A method for building the dictionary, assuming that it will be used only for some languages (backward compatible). - A way of using the frequency information in the ordering of suggestions

Re: Improving spelling suggestions with frequency dictionaries

2013-11-26 Thread Jaume Ortolà i Font
2013/11/26 Daniel Naber list2...@danielnaber.de On 2013-11-26 15:27, Jaume Ortolà i Font wrote: Look at these wordlists [1]. They are Apache 2.0. The words are classified in 256 ranges. [1] https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries The German one looks okay

Re: WikiCheck

2013-10-18 Thread Jaume Ortolà i Font
it be an option in the command line? Regards, Jaume Ortolà [1] https://ca.wikipedia.org/wiki/Glicèrid Regards, Jaume Ortolà www.riuraueditors.cat 2013/10/17 Daniel Naber list2...@danielnaber.de On 2013-10-17 09:28, Jaume Ortolà i Font wrote: Hi Jaume, But when you submit changes to Wikipedia

WikiCheck

2013-10-17 Thread Jaume Ortolà i Font
Hi, When using LanguageTool WikiCheck, if the article has more than 100 errors, you get a warning: More than 100 possible errors found - the remaining errors will not be shown. But when you submit changes to Wikipedia, what you get in Wikipedia is always a "no difference" page. No change is

Re: why unification?

2013-10-16 Thread Jaume Ortolà i Font
2013/10/16 Daniel Naber list2...@danielnaber.de Hi, although I think I understand the technical details of unification, I'm not sure how/why it is used in grammar.xml. For example, if a sequence of words share the same gender and number, that means there's agreement, so you cannot use that
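The core idea can be sketched in Java: give each token the set of gender and number values its readings allow, and keep only the values shared by every token in the sequence. A non-empty result means the words agree; an empty one means they don't. The string encoding (e.g. "MS" for masculine singular) and the class name are hypothetical; this is a simplification, not LanguageTool's unification code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AgreementUnifier {
  // Intersects the gender+number values of all tokens; an empty set means no agreement.
  static Set<String> unify(List<Set<String>> tokens) {
    if (tokens.isEmpty()) {
      return Set.of();
    }
    Set<String> common = new HashSet<>(tokens.get(0));
    for (Set<String> t : tokens.subList(1, tokens.size())) {
      common.retainAll(t);
    }
    return common;
  }

  public static void main(String[] args) {
    // An ambiguous first token is narrowed down by the others.
    System.out.println(unify(List.of(Set.of("MS", "MP"), Set.of("MS"), Set.of("MS", "FS")))); // [MS]
    System.out.println(unify(List.of(Set.of("MS"), Set.of("FP"))).isEmpty()); // true
  }
}
```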

Re: building a synthesizer

2013-10-04 Thread Jaume Ortolà i Font
Daniel, I found the same problem recently. I resorted to the attached perl script for this step. Regards, Jaume Ortolà 2013/10/4 Daniel Naber list2...@danielnaber.de Hi, did anybody recently build a synthesizer? When I follow the instructions at

Re: Regional variants of Catalan (ca-ES-valencia)

2013-10-02 Thread Jaume Ortolà i Font
2013/9/22 Daniel Naber list2...@danielnaber.de On 2013-09-22 11:51, Jaume Ortolà i Font wrote: default parameters. So yes, I would prefer another way to deal with it. Perhaps what you suggested at first: ValencianCatalan implements getEnabledRules and getDisabledRules

LibreOffice goes BCP 47

2013-09-23 Thread Jaume Ortolà i Font
Hi, This could be of interest to you. Eike Rathke: I'll talk about it at the LibreOffice Conference 2013 at Milano, so to get all the details please join me and attend Getting you language in on Thursday, 26 September at 15:30 in Sala Alfa.

Re: Regional variants of Catalan (ca-ES-valencia)

2013-09-22 Thread Jaume Ortolà i Font
2013/9/22 Daniel Naber list2...@danielnaber.de Currently the two grammar.xml files look almost the same. Maybe we can avoid that by moving the common parts to its own files and including them, as described here? http://xml.silmaril.ie/includes.html This would need to be tested carefully
