Re: Suggestion: find POS tag of portion of a word in XML rules

Dominique Pellé Tue, 09 Sep 2014 13:40:12 -0700

Daniel Naber <daniel.na...@languagetool.org> wrote:

> On 2014-04-27 22:18, Dominique Pellé wrote:
>
>> I wish I could check the POS tag of a portion of
>> a token.
>
> (Replying to an old thread here...)
>
> I think the new rule filter offers a solution for this that does not
> require any changes to the XML. Mike posted an example where he wanted
> to match in(.*) and un(.*), but only if the matching part is an
> adjective.
>
> It might work with a PartialPosTagFilter (that is yet to be developed)
> like this:
>
> <pattern>
>    <token regexp="yes">(in|un).*</token>
> </pattern>
> <filter class="org.lt.PartialPosTagFilter" args="no:1
> regexp:(?:in|un)(.*) postag_regexp:JJ message:'Approved adjective, but
> not-approved prefix. Use NOT + <adjective>.'"/>
>
> no:1 refers to the relevant token
> regexp:(?:in|un)(.*) specifies the part of the token to be checked
> postag_regexp:JJ checks whether the matching part of the regex matches
> 'JJ' (adjective)
> message: the message to be shown if postag_regexp matches; if it
> doesn't, the rule match will be discarded
>
> So as with any rule filter, we have a rule that matches candidates, and
> a filter that keeps only those matches that are actually useful.
>
> What do you think?



Hi Daniel

Thanks for replying on this old thread.  I would still like
to have this feature somehow, for French and Breton
at least.  But I did not fully understand how it would work
with the <filter class=...> proposal:

* Why does your example give the message in
  the Java filter?  Why can't we use <message…></message>
  instead?

* You wrote that args="no:1" refers to the token.
  What if we need to use this for one of the
  <exception>...</exception> elements inside a token?

In fact, in one of my earlier messages in this thread,
I had an example where I wanted to match the POS tag
of a portion of a token as well as of a portion of the
exception:

<pattern>
  <token regexp="yes" postag_group1="V.*" postag_group1_regexp="yes">(.*)-tu
     <exception regexp="yes" postag_group1="V.* 2 .*"
                postag_group1_regexp="yes">(.*)-tu</exception>
  </token>
</pattern>

In other words, the rule matches the token "(.*)-tu" where
the POS of the portion in parentheses has to be a verb (V.*).
But there is an exception if the POS of the portion in parentheses
matches "V.* 2 .*". So that rule would correctly:

* signal "Peut-tu" as an error (since "Peut" has postag
  "V ind pres 3 s", which matches the token but not the
  exception).
* ignore "Peux-tu" since it's correct ("Peux" has postag
  "V ind pres 2 s", which matches both the token and the exception).

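To make the intended behaviour concrete, here is a minimal Python sketch
of the check described above.  It is only an illustration, not
LanguageTool code: the POS_TAGS dictionary is a hypothetical stand-in for
the French tagger (it holds just the two tags quoted above), and
partial_pos_matches and rule_matches are names made up for this example.
Lowercasing the captured part before the lookup is also an assumption.

```python
import re

# Hypothetical mini-lexicon standing in for a real POS tagger.
# The two entries use the tags quoted in this message.
POS_TAGS = {
    "peut": ["V ind pres 3 s"],
    "peux": ["V ind pres 2 s"],
}


def partial_pos_matches(token, token_regex, postag_regex):
    """Return True if group 1 of token_regex, looked up on its own,
    has at least one POS tag that fully matches postag_regex."""
    m = re.fullmatch(token_regex, token)
    if m is None:
        return False
    # Assumption: the lookup is case-insensitive.
    part = m.group(1).lower()
    tags = POS_TAGS.get(part, [])
    return any(re.fullmatch(postag_regex, tag) for tag in tags)


def rule_matches(token):
    """The token must be <verb>-tu, except when the verb part is
    tagged as second person (the exception)."""
    is_verb = partial_pos_matches(token, r"(.*)-tu", r"V.*")
    is_exception = partial_pos_matches(token, r"(.*)-tu", r"V.* 2 .*")
    return is_verb and not is_exception


print(rule_matches("Peut-tu"))  # True: flagged as an error
print(rule_matches("Peux-tu"))  # False: covered by the exception
```

The same helper serves both the token and the exception, which is the
point: whatever syntax is chosen has to let the partial POS check be
applied in both places.
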
Of course the syntax was rejected at the time (which I
can understand, as introducing postag_group1 is not nice),
but it illustrates the fact that we'd need to allow such a
mechanism not only for tokens, but also for exceptions.

Maybe Marcin's proposal is better.  But I did not fully
understand it either. Marcin wrote:

<token postag="V.*" postag_regexp='yes'><match no="3"
regexp_match="(.*)-tu" regexp_replace="$1" setpos="yes"/>[whatever you
want here]</token>

What is [whatever you want here] meant to be?
And why no="3"?

I think I need a concrete example to make sure I understand.
If I'm guessing correctly, the French rule that I proposed above
would be written using Marcin's proposal as:

<token postag="V.*" postag_regexp="yes">
  <match no="1" regexp_match="(.*)-tu" regexp_replace="$1" setpos="yes"/>
  <exception postag="V.* 2 .*" postag_regexp="yes">
    <match no="1" regexp_match="(.*)-tu" regexp_replace="$1" setpos="yes"/>
  </exception>
</token>

It looks a tad hairy, and I'm not sure I understood the proposals correctly.

Regards
Dominique
