Problem solved.
The cause of the problem was a missing word between two pipe characters (||)
in a token regexp, which leaves an empty alternative.
This set of disambiguation rules shows how the multiple <S> markers occur:
<rule id="PROJECT_TN_NOUN_SINGULAR" name="Approved Technical Names">
  <pattern>
    <!-- a missing word between || causes 2 <S> markers for the first word in sentence. -->
    <token regexp="yes">airframe|boiler||clamp</token>
  </pattern>
  <disambig action="add"><wd pos="PROJECT_TN_NOUN_SINGULAR"/></disambig>
</rule>
<rulegroup id="MAKE_PROJECTTERMS" name="Make PROJECTTERMS">
  <!-- This rule adds an <S> marker to the first word. -->
  <rule id="MAKE_PROJECT_TN_NOUN" name="Make PROJECT_TN_NOUN">
    <pattern>
      <token postag_regexp="yes"
             postag="PROJECT_TN_NOUN_SINGULAR|PROJECT_TN_NOUN_PLURAL"/>
    </pattern>
    <disambig action="add"><wd pos="PROJECT_TN_NOUN"/></disambig>
  </rule>
  <!-- This rule adds an <S> marker to the first word. -->
  <rule id="MAKE_PROJECTTERM" name="Make PROJECTTERM">
    <pattern>
      <token postag_regexp="yes"
             postag="PROJECT_TN_ADJECTIVE|PROJECT_TN_NOUN"/>
    </pattern>
    <disambig action="add"><wd pos="PROJECTTERM"/></disambig>
  </rule>
</rulegroup>
Feature request: testrules should find this syntax error: ||
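The requested check could look roughly like the sketch below. It is not part of testrules or of LanguageTool; the function names (has_empty_alternative, find_empty_alternatives) are hypothetical, and the check is a simple heuristic that only looks at top-level pipes. It also shows the one fact that is certain: a regexp with an empty alternative matches the empty string. That this is why the sentence-start token picks up extra readings is an assumption (the sentence-start token presumably has empty text), not something confirmed in this thread.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple


def has_empty_alternative(pattern: str) -> bool:
    """Return True if a plain alternation contains an empty branch.

    Heuristic only: splits on every '|' and ignores groups,
    character classes, and escaped pipes.
    """
    if "|" not in pattern:
        return False
    return any(branch == "" for branch in pattern.split("|"))


def find_empty_alternatives(xml_text: str) -> List[Tuple[str, str]]:
    """List (rule id, token regexp) pairs whose regexp has an empty branch."""
    root = ET.fromstring(xml_text)
    hits = []
    for rule in root.iter("rule"):
        for token in rule.iter("token"):
            text = token.text or ""
            if token.get("regexp") == "yes" and has_empty_alternative(text):
                hits.append((rule.get("id", "?"), text))
    return hits


# The faulty rule from this message, copied without the comment.
SAMPLE = """
<rule id="PROJECT_TN_NOUN_SINGULAR" name="Approved Technical Names">
  <pattern>
    <token regexp="yes">airframe|boiler||clamp</token>
  </pattern>
  <disambig action="add"><wd pos="PROJECT_TN_NOUN_SINGULAR"/></disambig>
</rule>
"""

if __name__ == "__main__":
    # An empty branch lets the whole pattern match an empty string.
    print(re.fullmatch("airframe|boiler||clamp", "") is not None)
    for rule_id, regexp in find_empty_alternatives(SAMPLE):
        print("Empty alternative in rule", rule_id, ":", regexp)
```

A real check would need to parse the regexp properly (for example, to handle "(a|)b" or an escaped pipe), so treat this only as an illustration of the idea.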
Regards,
Mike
-----Original Message-----
From: Mike Unwalla [mailto:[email protected]]
Sent: 18 February 2013 19:45
To: 'development discussion for LanguageTool'
Subject: RE: Multiple instances of <S>: update
Marcin,
Thank you for your clarification about rewriting all tags. I tried to remove
the bug report, but I could not see how to do that. Sorry for the mess.
>Please describe in detail your input.
The entire input text is only 1 word: testword
Regards,
Mike
-----Original Message-----
From: Marcin Milkowski [mailto:[email protected]]
Sent: 18 February 2013 19:00
To: [email protected]
Subject: Re: Multiple instances of <S>: update
On 2013-02-18 18:30, Mike Unwalla wrote:
> With a particular rule, I get multiple instances of postag 'VB':
> <S> You[you/PRP] must[must/MD] check[check/VB, check/VB, check/VB, check/VB,
> check/VB] this[this/DT] text[text/NN:UN].[./., </S>]
This is not a bug, this is a feature. You are rewriting all existing
tags to 'VB', so no wonder you end up with multiple instances being the
same. All *different* tags were rewritten using this:
<disambig><match no="1" postag="VB"/></disambig>
So this is exactly what should happen.
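To illustrate the point, here is a minimal sketch in Python. It is not LanguageTool code: the readings are modelled as plain (lemma, postag) pairs, the five original tags are hypothetical placeholders (the thread does not show them), and rewrite_all_postags and dedupe are made-up helper names.

```python
from typing import List, Tuple

Reading = Tuple[str, str]  # (lemma, POS tag)


def rewrite_all_postags(readings: List[Reading], new_pos: str) -> List[Reading]:
    """Replace the POS tag of every reading, keeping one entry per reading."""
    return [(lemma, new_pos) for lemma, _old_pos in readings]


def dedupe(readings: List[Reading]) -> List[Reading]:
    """Drop repeated readings while keeping the original order."""
    return list(dict.fromkeys(readings))


# Hypothetical original readings for 'check': five different tags.
original = [
    ("check", "TAG_A"),
    ("check", "TAG_B"),
    ("check", "TAG_C"),
    ("check", "TAG_D"),
    ("check", "TAG_E"),
]

if __name__ == "__main__":
    rewritten = rewrite_all_postags(original, "VB")
    print(rewritten)          # five identical check/VB readings
    print(dedupe(rewritten))  # a single check/VB reading
```

The dedupe step is shown only for contrast; nothing here claims that LT removes duplicate readings.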
> The problem occurs when I add the rule to LT's original disambiguation.xml
> file.
>
> I created a bug report
>
> (https://sourceforge.net/tracker/?func=detail&aid=3605206&group_id=110216&atid=655717).
>
> Regards,
>
> Mike
>
> -----Original Message-----
> From: Mike Unwalla [mailto:[email protected]]
> Sent: 18 February 2013 11:00
> To: [email protected]
> Subject: Multiple instances of <S>
>
> Hello,
>
> When LT tags text, the tag <S> shows the start of a sentence, doesn't it?
>
> With one particular disambiguation.xml file, I get unexpected results for
> the tagged text. LT gives multiple instances of the sentence start marker
> <S>, as shown in this output from the GUI:
> <S><S><S><S> testword[</S>testword/TESTPOS]
>
> The first rule in my disambiguation.xml is as follows (testrules gives no
> errors):
>
> <rule id="add_TESTPOS" name="add TESTPOS">
>   <pattern>
>     <token>testword</token>
>   </pattern>
>   <disambig action="add"><wd pos="TESTPOS"/></disambig>
> </rule>
>
> If I put that rule in the LanguageTool disambiguation.xml file, there is
> only one <S> tag, as I expect.
Without knowing the full *input* sentence, it's hard to say, but if the
testword was the only word in your sentence, then it has <S> by default.
Note that testword itself is tagged only once, so I'm wondering what you
had before 'testword'.
>
> I do not understand:
> 1. How can there be multiple sentence starts?
Multiple end-of-line characters are enough.
> 2. Something in my disambiguation.xml makes LT show multiple <S>. But, this
> is the FIRST rule. How can rules that come after the first rule affect the
> tagging? (The rules "are applied in the order as they appear in the file"
> http://wiki.languagetool.org/developing-a-disambiguator .)
If there's an end-of-line character, it will be tagged.
>
> (I think that this is a bug. Probably, I will send more related questions,
> but for now, I want to keep things simple and focus only on one thing at a
> time.)
I'm not sure it is. Please describe in detail your input.
Best,
Marcin
>
> Regards,
>
> Mike Unwalla
> Contact: www.techscribe.co.uk/techw/contact.htm