Here's a timing test for weighted dictionaries.
On apertium-eng-kaz:

1. lt-trim analyser.bin bidix.bin analyser-found.bin
Time:

real 0m4.257s
user 0m4.120s
sys 0m0.131s


2.

lt-trim analyser.bin bidix.bin analyser-found.bin
lt-print -H analyser.bin > analyser.att
lt-print -H analyser-found.bin > analyser-found.att
hfst-txt2fst -e ε analyser.att -o analyser.hfst
hfst-txt2fst -e ε analyser-found.att -o analyser-found.hfst
hfst-subtract -1 analyser.hfst -2 analyser-found.hfst -o analyser-unfound.hfst
hfst-reweight -a 1 analyser-unfound.hfst -o analyser-unfound.weighted.hfst
hfst-union -1 analyser-unfound.weighted.hfst -2 analyser-found.hfst -o analyser.weighted.hfst
hfst-fst2txt analyser.weighted.hfst -o analyser.weighted.att
lt-comp lr analyser.weighted.att analyser.weighted.bin
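
If anyone wants to time the step 2 commands as a single unit, they could be wrapped in a shell function along these lines. This is only a sketch: the function name weighted_trim is made up for illustration, the file names are the ones used above, and it is not necessarily how the timing in this mail was taken.

```shell
#!/bin/sh
# Sketch only: the step 2 commands in one function, stopping at the first failure.
# "weighted_trim" is a hypothetical name; the flags are copied from the commands above.
weighted_trim() {
    # Trim the analyser against the bidix; the result holds the "found" entries.
    lt-trim analyser.bin bidix.bin analyser-found.bin || return 1
    # Dump both transducers as text.
    lt-print -H analyser.bin > analyser.att || return 1
    lt-print -H analyser-found.bin > analyser-found.att || return 1
    # Load the text dumps into HFST, with ε as the epsilon symbol.
    hfst-txt2fst -e ε analyser.att -o analyser.hfst || return 1
    hfst-txt2fst -e ε analyser-found.att -o analyser-found.hfst || return 1
    # Full analyser minus found entries leaves the unfound entries.
    hfst-subtract -1 analyser.hfst -2 analyser-found.hfst -o analyser-unfound.hfst || return 1
    # Add a weight of 1 to the unfound entries.
    hfst-reweight -a 1 analyser-unfound.hfst -o analyser-unfound.weighted.hfst || return 1
    # Merge the weighted unfound entries with the unweighted found entries.
    hfst-union -1 analyser-unfound.weighted.hfst -2 analyser-found.hfst -o analyser.weighted.hfst || return 1
    # Convert back to text and compile for lttoolbox.
    hfst-fst2txt analyser.weighted.hfst -o analyser.weighted.att || return 1
    lt-comp lr analyser.weighted.att analyser.weighted.bin || return 1
}
```

In bash, where time is a shell keyword, running "time weighted_trim" in the directory holding analyser.bin and bidix.bin would then report one real/user/sys figure for the whole chain.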


Time:

real 0m7.990s
user 0m7.227s
sys 0m0.730s
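
For a quick comparison of the two wall-clock ("real") times, a small awk calculation is enough. The helper name slowdown is hypothetical; the two numbers are copied from the timings above.

```shell
#!/bin/sh
# Wall-clock ("real") seconds from the two runs above.
plain=4.257      # step 1: lt-trim only
weighted=7.990   # step 2: the full weighted pipeline

# Hypothetical helper: prints the extra seconds and the ratio second/first.
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f %.2f\n", b - a, b / a }'
}

slowdown "$plain" "$weighted"
```

With these figures it prints "3.733 1.88", so the weighted pipeline takes about 3.7 seconds more, a bit under twice the time of plain lt-trim on this pair.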


Tanmai

On Mon, May 25, 2020 at 10:58 PM Samuel Sloniker <scoopgra...@gmail.com>
wrote:

> Maybe make trimming the default, but make apertium-init disable it for new
> pairs?
>
> On Mon, May 25, 2020, 10:01 Tino Didriksen <m...@tinodidriksen.com> wrote:
>
>> On Mon, 25 May 2020 at 12:29, Xavi Ivars <xavi.iv...@gmail.com> wrote:
>>
>>> * In trimming disadvantage number 1, we're stating that we're OK with
>>> having crappy monodixes because we *fix* that later on with trimming.
>>> I'm sure that's where we are now, but as a project that focuses a lot on
>>> providing free (as in speech) language resources that are later used for
>>> many other use cases, I don't feel comfortable with that status. I think we
>>> should aim to have dictionaries that are as correct as possible. If we did
>>> that, disadvantage number 1 would be smaller (even if it did not disappear
>>> completely).
>>>
>>
>> This is critically important, in my opinion. Languages should be
>> stand-alone and widely usable for many purposes. As I wrote on IRC, this is a
>> luxury problem. If the source analysis is bad, bloody well fix it so that
>> all pairs, spell checkers, and corpus work can take advantage. Don't let it
>> remain a task for the pairs.
>>
>> The fact that trimming via bidix and target monodix is currently needed
>> is a historical accident. It should not be something developers rely on
>> going forward, and especially not for new pairs.
>>
>> -- Tino Didriksen
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>


-- 
*Khanna, Tanmai*