During the last GCI, our student Matthew / highrise2357 did a lot of great work on automatically trimming lttoolbox analysers; that is, ensuring the analyser only has entries which pass through the bidix.
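To make "pass through the bidix" concrete, here is a toy sketch in Python. It is not how lt-trim is implemented (the real tool works on compiled transducers); it only models the analyser as a list of (surface form, analysis) pairs and the bidix as a set of lemma+tag prefixes. The function name `trim`, the example words, and the tag strings are made up for illustration, and prefix matching is an assumption about how bidix entries relate to full analyses.

```python
def trim(analyser, bidix_prefixes):
    """Keep only analyser entries whose analysis starts with a bidix entry.

    Toy model: `analyser` is a list of (surface, analysis) string pairs,
    `bidix_prefixes` is an iterable of strings such as "hus<n>".
    """
    prefixes = tuple(bidix_prefixes)
    return [
        (surface, analysis)
        for surface, analysis in analyser
        if analysis.startswith(prefixes)  # str.startswith accepts a tuple
    ]


# Hypothetical monodix content: two forms of "hus" and one form of "bil".
analyser = [
    ("hus", "hus<n><nt><sg><ind>"),
    ("huset", "hus<n><nt><sg><def>"),
    ("bil", "bil<n><m><sg><ind>"),
]

# Hypothetical bidix that only knows the noun "hus".
bidix = {"hus<n>"}

trimmed = trim(analyser, bidix)
for surface, analysis in trimmed:
    print(surface, analysis)
```

With this input, the "bil" entry is dropped because nothing in the toy bidix matches it, while both forms of "hus" survive.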
I recently found time to wrap up that work, and it seems to work very well: trimming the bokmål analyser takes about the same time and memory as compiling it with lt-comp.

The code is currently at
https://github.com/unhammer/lttoolbox/tree/df-intersection

You can check it out with

    git clone https://github.com/unhammer/lttoolbox.git -b df-intersection

and then build and install as usual. After installing, "man lt-trim" shows the docs, and tests/run_tests.py has some regression tests.

There is one caveat: <g/> (group element) is not handled yet. The man page notes how to work around that. I have an idea for handling the group element[1], but I'm not sure when I'll get to try it out.

If the powers that be accept the code, I can merge this into lttoolbox (without all the #ifdef DEBUG statements =P).

[1] http://wiki.apertium.org/wiki/Talk:Automatically_trimming_a_monodix#.23-type_multiwords

-- 
Kevin Brubeck Unhammer
GPG: 0x766AC60C
