Here's a timing test for weighted dictionaries, run on apertium-eng-kaz:

1.

    lt-trim analyser.bin bidix.bin analyser-found.bin

Time:

    real    0m4.257s
    user    0m4.120s
    sys     0m0.131s

2.

    lt-trim analyser.bin bidix.bin analyser-found.bin
    lt-print -H analyser.bin > analyser.att
    lt-print -H analyser-found.bin > analyser-found.att
    hfst-txt2fst -e ε analyser.att -o analyser.hfst
    hfst-txt2fst -e ε analyser-found.att -o analyser-found.hfst
    hfst-subtract -1 analyser.hfst -2 analyser-found.hfst -o analyser-unfound.hfst
    hfst-reweight -a 1 analyser-unfound.hfst -o analyser-unfound.weighted.hfst
    hfst-union -1 analyser-unfound.weighted.hfst -2 analyser-found.hfst -o analyser.weighted.hfst
    hfst-fst2txt analyser.weighted.hfst -o analyser.weighted.att
    lt-comp lr analyser.weighted.att analyser.weighted.bin

Time:

    real    0m7.990s
    user    0m7.227s
    sys     0m0.730s

Tanmai

On Mon, May 25, 2020 at 10:58 PM Samuel Sloniker <scoopgra...@gmail.com> wrote:

> Maybe make trimming the default, but make apertium-init disable it for new
> pairs?
>
> On Mon, May 25, 2020, 10:01 Tino Didriksen <m...@tinodidriksen.com> wrote:
>
>> On Mon, 25 May 2020 at 12:29, Xavi Ivars <xavi.iv...@gmail.com> wrote:
>>
>>> * In the trimming disadvantages number 1, we're stating that we're OK
>>> having crappy monodixes because we *fix* that later on with trimming.
>>> I'm sure that's where we are now, but as a project that focuses a lot on
>>> providing free (as in speech) language resources that are later used for
>>> many other use cases, I don't feel comfortable with that status. I think we
>>> should aim to have as correct as possible dictionaries. And if we did that,
>>> disadvantage number 1 would be smaller (even if not disappearing
>>> completely).
>>>
>>
>> This is critically important, in my opinion. Languages should be
>> stand-alone and widely usable for many purposes. As I wrote on IRC, this is a
>> luxury problem. If the source analysis is bad, bloody well fix it so that
>> all pairs, spell checker, and corpus work can take advantage. Don't let it
>> remain a task for the pairs.
>>
>> The fact that trimming via bidix and target monodix is currently needed
>> is a historical accident. It should not be something developers rely on
>> going forward, and especially not for new pairs.
>>
>> -- Tino Didriksen
>>

-- 
*Khanna, Tanmai*
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff