Hi all,

today I added a new feature to unification: ignoring tokens. Think of punctuation, adverbs that have no gender or number, odd idiomatic expressions, or connectives. To add such tokens to the unified sequence silently, use:
  <unify>
    <feature id="gender"/>
    <feature id="number"/>
    <token>foo</token>
    <unify-ignore>
      <token>,</token>
    </unify-ignore>
    <token>foo</token>
  </unify>

The comma will then be added without checking whether it agrees with the other tokens (frankly, it cannot, as commas are not inflected).

One particularly interesting use for Polish, and probably for other inflected languages such as German, is that it becomes relatively easy to identify noun groups as sequences of agreeing tokens with a connective or a comma inside. In other words, this could be used to group tokens, not only to filter them (alas, we don't have code for adding chunks just yet).

I also found and fixed a small bug in the pattern rule code: unification rules were broken if there were any tokens after </unify> in the pattern. Now they are matched correctly.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
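For anyone wondering what the agreement check skipped by <unify-ignore> amounts to, here is a rough Python model. It is a simplified sketch, not LanguageTool's actual unification code: the Token class, the unify() function and the representation of each reading as a (gender, number) pair are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical representation: one reading = (gender, number), e.g. ("m", "sg").
Reading = Tuple[str, str]


@dataclass
class Token:
    text: str
    readings: Set[Reading] = field(default_factory=set)
    ignore: bool = False  # True for tokens wrapped in <unify-ignore>


def unify(tokens: list) -> Set[Reading]:
    """Return the readings shared by all non-ignored tokens.

    An empty set means the tokens do not agree. Ignored tokens (e.g. a comma)
    stay in the sequence but never take part in the agreement check.
    """
    common: Optional[Set[Reading]] = None
    for tok in tokens:
        if tok.ignore:
            continue  # added silently, no agreement check
        common = set(tok.readings) if common is None else common & tok.readings
        if not common:
            return set()  # agreement failed
    return common if common is not None else set()


first = Token("foo", {("m", "sg"), ("f", "sg")})
second = Token("foo", {("m", "sg")})
comma = Token(",", ignore=True)

print(unify([first, comma, second]))       # {('m', 'sg')}
print(unify([first, Token(","), second]))  # set(): an uninflected comma breaks agreement
```

In this model, the noun-group idea would amount to extending a group for as long as unify() keeps returning a non-empty set, while letting commas and connectives through as ignored tokens.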