Hi Apertiumers,

> Wasn't there a "separable"-based solution that looked good though?

Besides trimming and not trimming, I would like to suggest a third
alternative.

As of yesterday, apertium-separable can read and merge multiple source
files. I suggest moving MWEs from the monodixes to -separable dictionaries
in the monolingual repos (this could be largely automated). If this were
done, we could apply trimming to the -separable FST rather than to the
monodix, which would keep the benefits of not trimming that have already
been mentioned while still dealing with the problems relating to MWEs.

This would also let us reduce redundancy elsewhere, since we could then
put separable MWEs (like "take out") in the monolingual repos.

The main disadvantage I can see is that it would be difficult to implement
gradually. However, if we had a way to either skip MWEs when compiling a
monodix or trim them out at the end (not trimming based on the bidix, just
trimming anything with spaces), then this could be implemented in stages.

If it would be helpful, I can take a language that isn't used in very many
pairs and demonstrate the changes this approach would involve.

Daniel

On Mon, May 25, 2020 at 11:31 AM Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:
>
> Flammie A Pirinen <flam...@iki.fi> čálii:
>
> >> 4. Weighting the monodix will take more compile time than just trimming it.
> >
> > Some numbers would be interesting, I think both are quite heavy and we
> > don't do much further processing in finite-state algebra (/hfst space)
> > so the weighted models won't blow up. In any case, people seem to be
> > happy in 2020 to wait 70 hours for some neural stuff, few minutes for
> > weighted automata won't be too bad ;-)
>
> One of the main advantages of RBMT is we can quickly fix little things
> and see the results nearly right away. When working on nno-nob I often
> compile every few minutes. OTOH, nno-nob has good coverage, so I rarely
> notice things like missing words causing disambiguation errors.
> So for me, a noticeable increase in compile time would be enough to make
> me not use this.
>
> But yes, we need numbers. It needs to be implemented and tried before
> we can decide if it should be default.
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff