Bug in Matcher postag replacement

2015-05-01 Thread Andriy Rysin
I've found a bug in the matcher postag replacement. Suppose a rule has 2 suggestions that use match elements, the first one uses postag_replace in its match element, and the second one just uses a simple <match no="N"/>. In that case the matcher in the second suggestion also gets the first one's postag replacement applied.
If I put the suggestion with the non-replacing matcher first, everything works.
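To make the symptom concrete, here is a toy model in plain Java. It is hypothetical: it does not reproduce LanguageTool's PatternRuleMatcher code, and the names PostagLeakModel, MatchRef, expandBuggy and expandFixed are invented for illustration. It assumes the leak comes from replacement state that outlives the suggestion it belongs to. That is one plausible explanation of the behaviour, not a confirmed diagnosis. Each suggestion is simplified to a single match, and the word forms are taken from the test output in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT LanguageTool code: it only illustrates how
// postag-replacement state that outlives one suggestion can leak into the next.
public class PostagLeakModel {

  // Hypothetical stand-in for a <match> element: forms == null means a plain
  // <match no="N"/>, otherwise the word forms a postag_replace would synthesize.
  static final class MatchRef {
    final List<String> forms;

    MatchRef(List<String> forms) {
      this.forms = forms;
    }
  }

  // The matched token, as in the Ukrainian test case.
  static final String TOKEN = "тактичного";

  // Buggy variant: lastForms is shared across suggestions and never reset,
  // so a plain match after a replacing one reuses the replacement's forms.
  static List<String> expandBuggy(List<String> labels, List<MatchRef> refs) {
    List<String> out = new ArrayList<>();
    List<String> lastForms = null;
    for (int i = 0; i < labels.size(); i++) {
      MatchRef ref = refs.get(i);
      if (ref.forms != null) {
        lastForms = ref.forms;
      }
      List<String> forms = lastForms != null ? lastForms : Arrays.asList(TOKEN);
      for (String form : forms) {
        out.add(labels.get(i) + ": " + form);
      }
    }
    return out;
  }

  // Fixed variant: each suggestion looks only at its own match.
  static List<String> expandFixed(List<String> labels, List<MatchRef> refs) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < labels.size(); i++) {
      MatchRef ref = refs.get(i);
      List<String> forms = ref.forms != null ? ref.forms : Arrays.asList(TOKEN);
      for (String form : forms) {
        out.add(labels.get(i) + ": " + form);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    MatchRef plain = new MatchRef(null);
    MatchRef replacing = new MatchRef(Arrays.asList("тактичне", "тактичний", "тактичного"));
    // Plain match first: 1 form for v1, 3 for v2 (no leak).
    System.out.println(expandBuggy(Arrays.asList("v1", "v2"), Arrays.asList(plain, replacing)));
    // Replacing match first: the buggy variant also gives v1 three forms.
    System.out.println(expandBuggy(Arrays.asList("v2", "v1"), Arrays.asList(replacing, plain)));
    // The fixed variant keeps v1 at a single form.
    System.out.println(expandFixed(Arrays.asList("v2", "v1"), Arrays.asList(replacing, plain)));
  }
}
```

With the plain match first, both variants agree. With the replacing match first, expandBuggy yields the 3 + 3 result described in this report, while expandFixed keeps v1 at one version.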

Here's the test that shows the problem. The first test passes: it
generates 1 simple match for the v1 text and 3 (via postag replacement)
for v2. The only difference in the second test is the order of the
matchers: the replacing ones go first, and now we get 3 versions of v1
and 3 versions of v2, which is wrong (v1 should have stayed with 1
version).

Note: I could not easily reproduce this in a core module test, as I
needed a postag_replace and the Demo language has no tags, so this
test uses Ukrainian. Also, to create a PatternRuleMatcher I had to put
the test in org.languagetool.rules.patterns.

Regards,
Andriy
package org.languagetool.rules.patterns;

import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.junit.BeforeClass;
import org.junit.Test;
import org.languagetool.JLanguageTool;
import org.languagetool.language.Ukrainian;
import org.languagetool.rules.RuleMatch;
import org.languagetool.rules.patterns.Match.CaseConversion;
import org.languagetool.rules.patterns.Match.IncludeRange;

public class SpecialMatcherTest {
  private static JLanguageTool langTool;

  @BeforeClass
  public static void setup() {
    langTool = new JLanguageTool(new Ukrainian());
  }

  @Test
  public void testSuggestionWithPostagReplace1() throws Exception {
    List<PatternToken> patternTokens = Arrays.asList(makeElement("тактичний"), makeElement("узгодження"));
    String suggestionsOutMsg = "<suggestion>v1: \\1 \\2</suggestion><suggestion>v2: \\1 \\2</suggestion>";
    PatternRule rule = new PatternRule("", langTool.getLanguage(), patternTokens, "my description", "my message", "short message", suggestionsOutMsg);
    PatternRuleMatcher matcher = new PatternRuleMatcher(rule, false);

    // v1: plain matches, tokens copied as-is
    rule.addSuggestionMatchOutMsg(new Match(null, null, false, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));
    rule.addSuggestionMatchOutMsg(new Match(null, null, false, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));
    // v2: postag replacement, genitive (v_rod) -> accusative (v_zna)
    rule.addSuggestionMatchOutMsg(new Match("(adj.*)v_rod(.*)", "$1v_zna$2", true, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));
    rule.addSuggestionMatchOutMsg(new Match("(noun.*)v_rod(.*)", "$1v_zna$2", true, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));

    RuleMatch[] matches = getMatches("тактичного узгодження", matcher);
    System.out.println(matches[0].getSuggestedReplacements());
    assertEquals(Arrays.asList("v1: тактичного узгодження", "v2: тактичне узгодження", "v2: тактичний узгодження", "v2: тактичного узгодження"), matches[0].getSuggestedReplacements());
  }

  @Test
  public void testSuggestionWithPostagReplace2() throws Exception {
    List<PatternToken> patternTokens = Arrays.asList(makeElement("тактичний"), makeElement("узгодження"));
    // v2 comes first in the message, so the replacing matches (added first) belong to it
    String suggestionsOutMsg = "<suggestion>v2: \\1 \\2</suggestion><suggestion>v1: \\1 \\2</suggestion>";
    PatternRule rule = new PatternRule("", langTool.getLanguage(), patternTokens, "my description", "my message", "short message", suggestionsOutMsg);
    PatternRuleMatcher matcher = new PatternRuleMatcher(rule, false);

    // v2: postag replacement, genitive (v_rod) -> accusative (v_zna)
    rule.addSuggestionMatchOutMsg(new Match("(adj.*)v_rod(.*)", "$1v_zna$2", true, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));
    rule.addSuggestionMatchOutMsg(new Match("(noun.*)v_rod(.*)", "$1v_zna$2", true, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));
    // v1: plain matches, which should NOT inherit the replacement above
    rule.addSuggestionMatchOutMsg(new Match(null, null, false, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));
    rule.addSuggestionMatchOutMsg(new Match(null, null, false, null, null, CaseConversion.NONE, false, false, IncludeRange.NONE));

    RuleMatch[] matches = getMatches("тактичного узгодження", matcher);
    System.out.println(matches[0].getSuggestedReplacements());
    assertEquals(Arrays.asList("v2: тактичне узгодження", "v2: тактичний узгодження", "v2: тактичного узгодження", "v1: тактичного узгодження"), matches[0].getSuggestedReplacements());
  }

  private PatternToken makeElement(String token) {
    return new PatternToken(token, false, false, true);
  }

  private RuleMatch[] getMatches(String input, PatternRuleMatcher matcher) throws IOException {
    return matcher.match(langTool.getAnalyzedSentence(input));
  }

}
--
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.

Re: Create a pt_PT rule: HÁ n SEGUNDOS/MINUTOS/HORAS/DIAS/MESES/ANOS ATRÁS (remove ATRÁS)

2015-05-01 Thread Marco A.G.Pinto

Thanks, Yakov!

It worked like a charm!

Not even commercial software has this rule! :-P

Kind regards from your friend,
  Marco A.G.Pinto
---


On 30/04/2015 21:03, Yakov Reztsov wrote:

Hello!

Your rule could look like this:

<!-- HÁ n SEGUNDOS/MINUTOS/HORAS/DIAS/MESES/ANOS ATRÁS (remove ATRÁS) -->
<rule id="HÁ-ATRÁS" name="há n tempo atrás">
  <pattern>
    <token>há</token>
    <token></token>
    <token regexp="yes">segundos?|minutos?|horas?|dias?|mês|meses|anos?</token>
    <token>atrás</token>
  </pattern>
  <message>O verbo haver remove a necessidade de usar atrás: <suggestion>\1 \2 \3</suggestion></message>
  <example correction="HÁ n SEGUNDOS"><marker>HÁ n SEGUNDOS ATRÁS</marker></example>
</rule>

I tested this rule on community.languagetool.org, and I got 21 matches.
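As a rough illustration of what this pattern matches and what the suggestion produces, here is a plain-Java approximation. It is a hypothetical sketch: HaAtrasSketch and suggestions() are invented names, not LanguageTool API. A single regex stands in for the four token elements, \S+ stands in for the empty <token></token> that accepts any word, and the three capture groups correspond to \1 \2 \3 in the suggestion. Matching is case-insensitive, assumed here to mirror LanguageTool's default for tokens. Real rule matching works on tokenized, analyzed text, so edge cases such as punctuation will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java approximation of the HÁ-ATRÁS rule, for illustration only.
// LanguageTool matches analyzed tokens, not raw text, so this regex is a rough stand-in.
public class HaAtrasSketch {

  // Group 1 = "há" (\1), group 2 = any single word (\2, the empty token),
  // group 3 = time unit (\3, the regexp token); "atrás" must follow but is not captured.
  private static final Pattern RULE = Pattern.compile(
      "\\b(há)\\s+(\\S+)\\s+(segundos?|minutos?|horas?|dias?|mês|meses|anos?)\\s+atrás\\b",
      Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.UNICODE_CHARACTER_CLASS);

  // Builds the "\1 \2 \3" suggestion for every match, i.e. the phrase without "atrás".
  public static List<String> suggestions(String text) {
    List<String> out = new ArrayList<>();
    Matcher m = RULE.matcher(text);
    while (m.find()) {
      out.add(m.group(1) + " " + m.group(2) + " " + m.group(3));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(suggestions("Ele chegou há 10 dias atrás."));             // [há 10 dias]
    System.out.println(suggestions("Foi há dois anos atrás e há um mês atrás.")); // [há dois anos, há um mês]
    System.out.println(suggestions("Ele chegou há 10 dias."));                   // []
  }
}
```

Because "atrás" is required by the pattern but left out of the suggestion, accepting the suggestion removes it, which is what the rule's message asks for.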

 --
Yakov Reztsov



Thursday, 30 April 2015, 18:55 +01:00 from Marco A.G.Pinto 
marcoagpi...@mail.telepac.pt:


Hello!

How do I create a rule that asks to remove the word atrás when
há is used, since it is redundant?

An English example would be:
10 days ago - há 10 dias atrás

The verb há in Portuguese removes the need to use atrás.

I have covered both the singular and plural forms of the time units here.

I just need to know how to make the rule suggest the removal of
atrás.


<!-- HÁ n SEGUNDOS/MINUTOS/HORAS/DIAS/MESES/ANOS ATRÁS (remove ATRÁS) -->
<rule id="HÁ-ATRÁS" name="há n tempo atrás">
  <pattern>
    <token>há</token>
    <token></token> ; HOW TO MAKE IT ACCEPT ANY WORD HERE?
    <token regexp="yes">segundos?|minutos?|horas?|dias?|mês|meses|anos?</token>
    <token>atrás</token>
  </pattern>
  <message>O verbo haver remove a necessidade de usar atrás.</message>
  <example correction=""></example>
</rule>


Thanks!

Kind regards,
  Marco A.G.Pinto
--



Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel