On 2014-04-27 22:18, Dominique Pellé wrote:

> I wish I could check the POS tag of a portion of
> a token.

(Replying to an old thread here...)

I think the new rule filter offers a solution for this that does not 
require any changes to the XML. Mike posted an example where he wanted 
to match in(.*) and un(.*), but only if the part captured by (.*) is an 
adjective.

It might work with a PartialPosTagFilter (that is yet to be developed) 
like this:

<pattern>
   <token regexp="yes">(in|un).*</token>
</pattern>
<filter class="org.lt.PartialPosTagFilter" args="no:1 
regexp:(?:in|un)(.*) postag_regexp:JJ message:'Approved adjective, but 
not-approved prefix. Use NOT + &lt;adjective&gt;.'"/>

no:1 refers to the relevant token
regexp:(?:in|un)(.*) specifies the part of the token to be checked
postag_regexp:JJ checks whether the part of the token captured by (.*) 
has a POS tag matching 'JJ' (adjective)
message: the message to be shown if postag_regexp matches; if it 
doesn't, the rule match will be discarded

So as with any rule filter, we have a rule that matches candidates, and 
a filter that keeps only those matches that are actually useful.
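
To make the idea concrete, here is a rough sketch of the check such a 
filter might perform. It is written in Python only to keep it short; a 
real filter would be a Java class. The toy tagger, its tag lists, and the 
function names are all made up for illustration, and requiring the whole 
tag to match postag_regexp is an assumption.

```python
import re

# Hypothetical stand-in for a real POS tagger: word -> possible tags.
TOY_TAGGER = {
    "approved": ["JJ", "VBD"],
    "stable": ["JJ"],
    "form": ["NN", "VB"],
}

def tag(word):
    """Return the POS tags of a word, or an empty list if unknown."""
    return TOY_TAGGER.get(word.lower(), [])

def partial_pos_tag_filter(token, regexp, postag_regexp, tagger=tag):
    """Return True to keep the rule match, False to discard it."""
    m = re.fullmatch(regexp, token)
    if m is None:
        return False
    part = m.group(1)  # the part of the token captured by (.*)
    # Keep the match only if some tag of that part matches postag_regexp.
    return any(re.fullmatch(postag_regexp, t) for t in tagger(part))

if __name__ == "__main__":
    for word in ["unapproved", "instable", "inform", "under"]:
        keep = partial_pos_tag_filter(word, r"(?:in|un)(.*)", r"JJ")
        print(word, "->", "keep" if keep else "discard")
```

With this toy tagger, "unapproved" and "instable" are kept because 
"approved" and "stable" can be tagged JJ, while "inform" and "under" are 
discarded.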

What do you think?

Regards
  Daniel


_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
