Problem solved.

The cause of the problem was a missing word between 2 pipe characters.

This set of disambiguation rules shows how the multiple <S> markers occur:

    <rule id="PROJECT_TN_NOUN_SINGULAR" name="Approved Technical Names">
      <pattern>
        <token regexp="yes">airframe|boiler||clamp</token>
        <!-- A missing word between || causes 2 <S> markers for the first word in the sentence. -->
      </pattern>
      <disambig action="add"><wd pos="PROJECT_TN_NOUN_SINGULAR"/></disambig>
    </rule>

  <rulegroup id="MAKE_PROJECTTERMS" name="Make PROJECTTERMS">
    <!-- This rule adds an <S> marker to the first word. -->
    <rule id="MAKE_PROJECT_TN_NOUN" name="Make PROJECT_TN_NOUN">
      <pattern>
        <token postag_regexp="yes" postag="PROJECT_TN_NOUN_SINGULAR|PROJECT_TN_NOUN_PLURAL"/>
      </pattern>
      <disambig action="add"><wd pos="PROJECT_TN_NOUN"/></disambig>
    </rule>
    <!-- This rule adds an <S> marker to the first word. -->
    <rule id="MAKE_PROJECTTERM" name="Make PROJECTTERM">
      <pattern>
        <token postag_regexp="yes" postag="PROJECT_TN_ADJECTIVE|PROJECT_TN_NOUN"/>
      </pattern>
      <disambig action="add"><wd pos="PROJECTTERM"/></disambig>
    </rule>
  </rulegroup>
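
Why the empty alternative matters can be sketched outside LT. The example below uses Python's `re` module as a stand-in for the Java regular expressions that LT uses; both accept an empty alternative and let it match the empty string. The link to <S> is an assumption that the thread does not confirm: if the sentence-start token has empty text, the broken pattern matches that token, and each action="add" rule then adds a reading to it. The helper name `matches_whole_token` is hypothetical.

```python
import re

# Pattern copied from the rule above; the "||" leaves an empty alternative.
BROKEN = "airframe|boiler||clamp"
# The same pattern with the stray pipe removed.
FIXED = "airframe|boiler|clamp"


def matches_whole_token(pattern: str, token: str) -> bool:
    """Return True if the pattern matches the entire token text."""
    return re.fullmatch(pattern, token) is not None


# Assumption: the sentence-start token is represented here by empty text.
broken_hits_empty = matches_whole_token(BROKEN, "")
fixed_hits_empty = matches_whole_token(FIXED, "")

print("broken pattern matches empty token:", broken_hits_empty)
print("fixed pattern matches empty token:", fixed_hits_empty)
```

The approved words still match with either pattern. Only the behaviour on the empty token changes.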

Feature request: testrules should find this syntax error: an empty alternative (||) in a regexp token.
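
Until testrules has such a check, a stand-alone script could do the job. The following is a minimal sketch, not part of LT. The function name `find_empty_alternatives` and the `<rules>` wrapper in the sample are hypothetical. The test is deliberately simple: it flags a pipe at the start or end of a pattern and any `||`, and it does not parse escapes, groups or character classes. A real disambiguation.xml with a DOCTYPE or entity declarations may need extra handling before `xml.etree.ElementTree` can read it.

```python
import re
import xml.etree.ElementTree as ET

# Empty alternative: "|" at the start or end of the pattern, or "||" anywhere.
# Simplification: escaped pipes and alternatives inside groups are not analysed.
EMPTY_ALTERNATIVE = re.compile(r"^\||\|\||\|$")


def find_empty_alternatives(xml_text: str) -> list:
    """Return (rule id, pattern) pairs for regexp tokens with an empty alternative."""
    root = ET.fromstring(xml_text)
    problems = []
    for rule in root.iter("rule"):
        for token in rule.iter("token"):
            pattern = (token.text or "").strip()
            if token.get("regexp") == "yes" and EMPTY_ALTERNATIVE.search(pattern):
                problems.append((rule.get("id", "?"), pattern))
    return problems


# Hypothetical sample file that contains the broken rule from this message.
SAMPLE = """<rules>
  <rule id="PROJECT_TN_NOUN_SINGULAR" name="Approved Technical Names">
    <pattern>
      <token regexp="yes">airframe|boiler||clamp</token>
    </pattern>
    <disambig action="add"><wd pos="PROJECT_TN_NOUN_SINGULAR"/></disambig>
  </rule>
</rules>"""

for rule_id, pattern in find_empty_alternatives(SAMPLE):
    print(f"{rule_id}: empty alternative in '{pattern}'")
```

The script reports the rule id, which makes the error easy to find in a long list of approved words.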

Regards,

Mike


-----Original Message-----
From: Mike Unwalla [mailto:[email protected]] 
Sent: 18 February 2013 19:45
To: 'development discussion for LanguageTool'
Subject: RE: Multiple instances of <S>: update

Marcin,

Thank you for your clarification about rewriting all tags. I tried to remove
the bug report, but I could not see how to do that. Sorry for the mess.

>Please describe in detail your input.
The entire input text is only 1 word: testword

Regards,

Mike

-----Original Message-----
From: Marcin Milkowski [mailto:[email protected]] 
Sent: 18 February 2013 19:00
To: [email protected]
Subject: Re: Multiple instances of <S>: update

On 2013-02-18 18:30, Mike Unwalla wrote:
> With a particular rule, I get multiple instances of postag 'VB':
> <S> You[you/PRP] must[must/MD] check[check/VB, check/VB, check/VB, check/VB,
> check/VB] this[this/DT] text[text/NN:UN].[./., </S>]

This is not a bug, this is a feature. You are rewriting all existing 
tags to 'VB', so no wonder you end up with multiple instances being the 
same. All *different* tags were rewritten using this:

<disambig><match no="1" postag="VB"/></disambig>

So this is exactly what should happen.

> The problem occurs when I add the rule to LT's original disambiguation.xml
> file.
>
> I created a bug report
>
> (https://sourceforge.net/tracker/?func=detail&aid=3605206&group_id=110216&atid=655717).
>
> Regards,
>
> Mike
>
> -----Original Message-----
> From: Mike Unwalla [mailto:[email protected]]
> Sent: 18 February 2013 11:00
> To: [email protected]
> Subject: Multiple instances of <S>
>
> Hello,
>
> When LT tags text, the tag <S> shows the start of a sentence, doesn't it?
>
> With one particular disambiguation.xml file, I get unexpected results for
> the tagged text. LT gives multiple instances of the sentence start marker
> <S>, as shown in this output from the GUI:
> <S><S><S><S> testword[</S>testword/TESTPOS]
>
> The first rule in my disambiguation.xml is as follows. (Testrules gives no
> errors.):
>
>      <rule id="add_TESTPOS" name="add TESTPOS">
>        <pattern>
>          <token>testword</token>
>        </pattern>
>        <disambig action="add"><wd pos="TESTPOS"/></disambig>
>      </rule>
>
> If I put that rule in the LanguageTool disambiguation.xml file, there is
> only one <S> tag, as I expect.

Without knowing the full *input* sentence, it's hard to say, but if the 
testword was the only word in your sentence, then it has <S> by default. 
Note that testword itself is tagged only once, so I'm wondering what you 
had before 'testword'.

>
> I do not understand:
> 1. How can there be multiple sentence starts?

Multiple end-of-line characters are enough.

> 2. Something in my disambiguation.xml makes LT show multiple <S>. But, this
> is the FIRST rule. How can rules that come after the first rule affect the
> tagging? (The rules "are applied in the order as they appear in the file"
> http://wiki.languagetool.org/developing-a-disambiguator .)

If there's an end-of-line character, it will be tagged.

>
> (I think that this is a bug. Probably, I will send more related questions,
> but for now, I want to keep things simple and focus only on one thing at a
> time.)

I'm not sure it is. Please describe in detail your input.

Best,
Marcin

>
> Regards,
>
> Mike Unwalla
> Contact: www.techscribe.co.uk/techw/contact.htm


_______________________________________________
Languagetool-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/languagetool-devel