Marcin,

Thank you for your clarification about rewriting all tags. I tried to remove the bug report, but I could not see how to do that. Sorry for the mess.

>Please describe in detail your input.

The entire input text is only 1 word:
testword

Regards,

Mike

-----Original Message-----
From: Marcin Milkowski [mailto:[email protected]]
Sent: 18 February 2013 19:00
To: [email protected]
Subject: Re: Multiple instances of <S>: update

On 2013-02-18 18:30, Mike Unwalla wrote:
> With a particular rule, I get multiple instances of postag 'VB':
> <S> You[you/PRP] must[must/MD] check[check/VB, check/VB, check/VB, check/VB,
> check/VB] this[this/DT] text[text/NN:UN].[./., </S>]

This is not a bug, this is a feature. You are rewriting all existing tags to 'VB', so no wonder you end up with multiple instances being the same. All *different* tags were rewritten using this:

<disambig><match no="1" postag="VB"/></disambig>

So this is exactly what should happen.

> The problem occurs when I add the rule to LT's original disambiguation.xml
> file.
>
> I created a bug report
> (https://sourceforge.net/tracker/?func=detail&aid=3605206&group_id=110216&atid=655717).
>
> Regards,
>
> Mike
>
> -----Original Message-----
> From: Mike Unwalla [mailto:[email protected]]
> Sent: 18 February 2013 11:00
> To: [email protected]
> Subject: Multiple instances of <S>
>
> Hello,
>
> When LT tags text, the tag <S> shows the start of a sentence, doesn't it?
>
> With one particular disambiguation.xml file, I get unexpected results for
> the tagged text. LT gives multiple instances of the sentence start marker
> <S>, as shown in this output from the GUI:
> <S><S><S><S> testword[</S>testword/TESTPOS]
>
> The first rule in my disambiguation.xml is as follows. (Testrules gives no
> errors.):
>
> <rule id="add_TESTPOS" name="add TESTPOS">
> <pattern>
> <token>testword</token>
> </pattern>
> <disambig action="add"><wd pos="TESTPOS"/></disambig>
> </rule>
>
> If I put that rule in the LanguageTool disambiguation.xml file, there is
> only one <S> tag, as I expect.

Without knowing the full *input* sentence, it's hard to say, but if the testword was the only word in your sentence, then it has <S> by default. Note that testword itself is tagged only once, so I'm wondering what you had before 'testword'.

>
> I do not understand:
> 1. How can there be multiple sentence starts?

Multiple end-of-line characters are enough.

> 2. Something in my disambiguation.xml makes LT show multiple <S>. But, this
> is the FIRST rule. How can rules that come after the first rule affect the
> tagging? (The rules "are applied in the order as they appear in the file"
> http://wiki.languagetool.org/developing-a-disambiguator .)

If there's an end-of-line character, it will be tagged.

>
> (I think that this is a bug. Probably, I will send more related questions,
> but for now, I want to keep things simple and focus only on one thing at a
> time.)

I'm not sure it is. Please describe your input in detail.

Best,
Marcin

>
> Regards,
>
> Mike Unwalla
> Contact: www.techscribe.co.uk/techw/contact.htm
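
Marcin's explanation of the repeated 'VB' readings can be illustrated with a small simulation. This is only a sketch in Python, not LanguageTool code: the `Reading` class and both helper functions are hypothetical, and the starting tags (`TAG_A` to `TAG_E`) are placeholders, because the thread does not show which tags "check" had before the rule ran. It assumes, as the reply describes, that the rule overwrites the POS tag of each existing reading and does not merge readings that become identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One analysis of a token: a lemma plus a POS tag."""
    lemma: str
    postag: str


def rewrite_all_postags(readings, new_postag):
    """Overwrite the POS tag of every reading; nothing is merged or removed."""
    return [Reading(r.lemma, new_postag) for r in readings]


def format_token(token, readings):
    """Render a token in the 'token[lemma/POS, ...]' style shown in the thread."""
    inner = ", ".join(r.lemma + "/" + r.postag for r in readings)
    return token + "[" + inner + "]"


# Hypothetical starting readings: the thread does not list the original tags
# of "check", only that five readings remain after the rule runs.
before = [Reading("check", tag) for tag in ("TAG_A", "TAG_B", "TAG_C", "TAG_D", "TAG_E")]
after = rewrite_all_postags(before, "VB")

print(format_token("check", after))
# prints: check[check/VB, check/VB, check/VB, check/VB, check/VB]
```

In this model, collapsing the duplicates would take an explicit extra step, for example `list(dict.fromkeys(after))`. Whether LanguageTool offers anything comparable is not stated in the thread.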
