On 2018-02-17, Jonathan Washington wrote:
> I don't oppose the idea of adding support for compiling dix-format
> lexicons with HFST and implementing it for existing language modules,
> but we'd still need to process the transducers with lt-proc because
> of the tokenisation bug.¹ If the HFST tokenisation bug were to
> disappear, then what you propose for the long term might make sense.
> In any case, it couldn't hurt to add .dix support in hfst-comp as a
> GSoC project idea; that'd be a good step in the right direction.
Well, there's an hfst-dix-compiler prototype I demoed at LREC some year. If this is pitched as a GSoC project, it must include a plan for rolling it out across the repositories to be successful, IMHO. The dix compiler itself should be at most a coding challenge.

> Maybe if someone ends up working on this and has extra time, they
> could investigate the tokenisation bug as well.
>
> ¹ https://sourceforge.net/p/hfst/bugs/59/ seems to suggest that this
> was fixed, and I'm unable to reproduce it at the moment. I remember
> something about undesired side-effects of the implementation, but I
> don't remember now what those were.

There are two issues: there are likely still some corner cases where hfst-ape-proc wrongly swallows arbitrary parts of strings, and the code base is somewhat unmaintainable. I still use it for fin-x, though, since apertium-fin used to be uncompilable into the lt-proc format; the bugs are minor enough not to be a problem for my personal experimentation.

-- 
Flammie, computer scientist bachelor + linguist master = computational linguist doctor, free software Finnish localiser, and more!
<http://www.iki.fi/flammie/>

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff