Hi
I wish I could check the POS tag of a portion of
a token.
For example, in a Breton word such as "ez-c'hlas", I wish
I could check the POS tag of "c'hlas" in XML rules.
I don't think that's currently possible, unless:
- I write a Java rule
- or I change the tokenizer to split on hyphens, but
I'd rather not do that because many words contain a
hyphen, and splitting them would make it harder or
impossible to write other XML rules. Besides, being
able to check the POS tag may sometimes be useful even
on a part of a token that is not delimited by hyphens.
What XML syntax should we use?
I'm thinking of something like this for example:
<token regexp="yes" postag_group1="foo">ez-(.*)</token>
This would check that what matches in the first
group of the regexp, captured here by (.*), has
the postag "foo". So given a token "ez-c'hlas", it
would check that "c'hlas" has POS tag "foo".
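To make the intended semantics concrete, here is a minimal Java sketch
of the check. It is only an illustration under stated assumptions: the
class name, the groupHasPosTag helper and the toy tag map are all
hypothetical, and the map merely stands in for a real tagger lookup.
None of this is existing LanguageTool API.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupPosTagCheck {

    // Hypothetical stand-in for a tagger dictionary: word -> POS tags.
    // "foo" and "bar" are placeholder tags, not real Breton tags.
    static final Map<String, Set<String>> TOY_TAGS = Map.of(
            "c'hlas", Set.of("foo"),
            "bras", Set.of("bar"));

    // Returns true if the whole token matches the pattern and the text
    // captured by the given group carries the requested POS tag.
    static boolean groupHasPosTag(String token, Pattern pattern,
                                  int group, String posTag) {
        Matcher m = pattern.matcher(token);
        if (!m.matches() || group > m.groupCount()) {
            return false;
        }
        String part = m.group(group);
        if (part == null) {
            return false; // the group did not take part in the match
        }
        return TOY_TAGS.getOrDefault(part, Set.of()).contains(posTag);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ez-(.*)");
        System.out.println(groupHasPosTag("ez-c'hlas", p, 1, "foo")); // true
        System.out.println(groupHasPosTag("ez-bras", p, 1, "foo"));   // false
        System.out.println(groupHasPosTag("c'hlas", p, 1, "foo"));    // false
    }
}
```

The token is still matched as a whole, so the tokenizer would not need
to change; only the captured part is looked up for its POS tag.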
Would that also be useful in other languages?
Regards
Dominique
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel