Hi all,
I added a new feature to unification today: ignoring tokens. Think
of punctuation, adverbs that have no gender or number, odd
idiomatic expressions, or connectives. To silently add such tokens to the
unified sequence, simply use:
<unify>
<feature id="gender"/>
<feature id="number"/>
<token>foo</token>
<unify-ignore>
<token>,</token>
</unify-ignore>
<token>foo</token>
</unify>
The comma will then be added without checking whether it agrees with
the other tokens (frankly, it cannot, as commas are not inflected). One
particularly interesting use for Polish, and probably for other inflected
languages such as German, is that it becomes relatively easy to
identify noun groups as a sequence of agreeing tokens with a connective
or a comma inside. In other words, this could be used for grouping tokens,
not only for filtering them (alas, we don't have code for adding chunks
just yet).
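As a rough sketch of the noun-group idea, a pattern along the following
lines might match something like "duży i czerwony dom" (adjective,
connective, adjective, noun). The POS tag regexes and the list of
connectives are illustrative assumptions, not taken from an existing rule;
only the <unify>, <feature> and <unify-ignore> elements come from the
example above.

```xml
<unify>
  <feature id="gender"/>
  <feature id="number"/>
  <!-- first agreeing token; the adjective tag regex is an assumption -->
  <token postag="adj:.*" postag_regexp="yes"/>
  <unify-ignore>
    <!-- connective or comma: added to the sequence, never checked for agreement -->
    <token regexp="yes">i|oraz|lub|,</token>
  </unify-ignore>
  <!-- remaining tokens must agree in gender and number with the first one -->
  <token postag="adj:.*" postag_regexp="yes"/>
  <token postag="subst:.*" postag_regexp="yes"/>
</unify>
```

Keeping the connective inside <unify-ignore> means the whole span stays one
unified sequence, which is what would make it usable as a chunk later on.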
I also found and fixed a small bug in the pattern rule code:
unification rules were broken if the pattern contained any tokens after
</unify>. Now they are matched correctly.
Regards,
Marcin