Hi Apertiumers,

> Wasn't there a "separable"-based solution that looked good though?

Besides trimming and not trimming, I would like to suggest a third alternative.

As of yesterday, apertium-separable can read and merge multiple source
files. I suggest moving MWEs from monodixes to -separable dictionaries
in the monolingual repos (this could be more or less completely
automated).

If this were done, we could apply trimming to the -separable FST
rather than the monodix, which would retain the already mentioned
benefits of not trimming while still dealing with the problems
relating to MWEs.

This would also have the benefit of enabling us to decrease redundancy
elsewhere, since we could then put separable MWEs (like "take out") in
the monolingual repos.

The main disadvantage I can see is that implementing this gradually
would be difficult. However, if we had a way either to skip MWEs when
compiling a monodix or to trim them out at the end (not trimming
against the bidix, just removing anything that contains a space), then
it could be implemented in stages.
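
To make the "trim anything with spaces" idea concrete, here is a rough
Python sketch that works on the monodix source rather than on the
compiled FST. It assumes the usual monodix layout (<e> entries inside
<section> elements, with multiword entries marked by a space in the lm
attribute or by a <b/> element). The function names and the sample
dictionary are made up for illustration and are not part of any
existing Apertium tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature monodix, for illustration only.
SAMPLE_DIX = """<dictionary>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="take out"><i>take<b/>out</i><par n="take__vblex"/></e>
    <e><p><l>a<b/>lot</l><r>a<b/>lot<s n="adv"/></r></p></e>
  </section>
</dictionary>"""


def is_multiword(entry):
    """Treat an <e> as multiword if its lm attribute contains a space
    or if it contains a <b/> (blank) element anywhere inside it."""
    if " " in entry.get("lm", ""):
        return True
    return entry.find(".//b") is not None


def strip_multiwords(dix_xml):
    """Return (root, removed): the parsed dictionary with multiword
    entries dropped from every <section>, plus the removed entries."""
    root = ET.fromstring(dix_xml)
    removed = []
    for section in root.iter("section"):
        # Copy the list first so removal does not disturb iteration.
        for entry in list(section.findall("e")):
            if is_multiword(entry):
                removed.append(entry)
                section.remove(entry)
    return root, removed


if __name__ == "__main__":
    root, removed = strip_multiwords(SAMPLE_DIX)
    print("removed entries:", len(removed))
    print(ET.tostring(root, encoding="unicode"))
```

The removed entries are returned rather than discarded, since in the
proposal above they would be the candidates for moving into the
-separable dictionary. The sketch only looks at entries directly inside
sections, so an entry that gets its space from a paradigm would not be
caught.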

If it would be helpful, I can take some language that isn't used in
very many pairs and demonstrate the changes that would be involved in
this approach.

Daniel

On Mon, May 25, 2020 at 11:31 AM Kevin Brubeck Unhammer
<unham...@fsfe.org> wrote:
>
> Flammie A Pirinen <flam...@iki.fi> wrote:
>
> >> 4. Weighting the monodix will take more compile time than just trimming it.
> >
> > Some numbers would be interesting, I think both are quite heavy and we
> > don't do much further processing in finite-state algebra (/hfst space)
> > so the weighted models won't blow up. In any case, people seem to be
> > happy in 2020 to wait 70 hours for some neural stuff, few minutes for
> > weighted automata won't be too bad ;-)
>
> One of the main advantages of RBMT is we can quickly fix little things
> and see the results nearly right away. When working on nno-nob I often
> compile every few minutes. OTOH, nno-nob has good coverage, so I rarely
> notice things like missing words causing disambiguation errors. So for
> me, a noticeable increase in compile time would be enough to make me not
> use this.
>
> But yes, we need numbers. It needs to be implemented and tried before
> we can decide if it should be default.
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

