During the last GCI, our student Matthew / highrise2357 did a lot of
great work on automatically trimming lttoolbox analysers; that is,
ensuring the analyser only has entries which pass through the bidix.
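
To illustrate the idea only, here is a toy sketch in Python. It is not
how lt-trim is implemented, and the names trim, analyser and
bidix_left_sides are made up for this example. It assumes each analysis
is a plain string such as "house<n><sg>" and that each bidix source side
is a lemma followed by at least one complete tag, such as "house<n>".

```python
def trim(analyser, bidix_left_sides):
    """Keep only the analyses that start with some bidix source side.

    analyser: list of (surface form, analysis) pairs (hypothetical format).
    bidix_left_sides: set of lemma+tag prefixes (hypothetical format).
    """
    return [
        (surface, analysis)
        for surface, analysis in analyser
        # An analysis "passes through" if a bidix entry matches its prefix.
        if any(analysis.startswith(prefix) for prefix in bidix_left_sides)
    ]


# Made-up data: the bidix only knows the nouns "house" and "cat".
analyser = [
    ("houses", "house<n><pl>"),
    ("house", "house<n><sg>"),
    ("housed", "house<vblex><pp>"),
    ("cat", "cat<n><sg>"),
]
bidix_left_sides = {"house<n>", "cat<n>"}

trimmed = trim(analyser, bidix_left_sides)
for surface, analysis in trimmed:
    print(surface, analysis)
```

In this toy data the verb reading of "housed" is dropped, because the
bidix has no matching "house<vblex>" entry, while the three noun
readings are kept.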

I recently found time to wrap up that work, and it seems to work very
well; trimming the bokmål analyser takes about the same time and memory
as compiling it with lt-comp. The code is currently at
https://github.com/unhammer/lttoolbox/tree/df-intersection – you can
check it out with

    git clone https://github.com/unhammer/lttoolbox.git -b df-intersection 

and compile and install as usual. After installing, "man lt-trim" shows
the docs; see tests/run_tests.py for some regression tests.

There is one caveat: <g/> (group element) is not handled yet. The man
page notes how to work around that. I have an idea for handling the
group element[1], but I'm not sure when I'll get to try it out.

If the powers that be accept the code, I can merge this into lttoolbox
(without all the #ifdef DEBUG statements =P).

[1] http://wiki.apertium.org/wiki/Talk:Automatically_trimming_a_monodix#.23-type_multiwords

-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff