2018-02-17, Jonathan Washington wrote:

> I don't oppose the idea of adding support for compiling dix-format
> lexicons with HFST and implementing it for existing language modules,
> but we'd still need to process the transducers with lt-proc because
> of the tokenisation bug.¹  If the HFST tokenisation bug were to
> disappear, then what you propose for the long term might make sense.
> In any case, it couldn't hurt to add .dix support in hfst-comp as a
> GSoC project idea—that'd be a good step in the right direction.

Well, there's an hfst-dix-compiler prototype that I demoed at LREC
some year. If this is pitched as a GSoC project, it must include a
plan for repo rollout to be successful, imho. The dix compiler itself
should be at most a coding challenge.

> Maybe if someone ends up working on this and has extra time, they
> could investigate the tokenisation bug as well.
> 
> ¹ https://sourceforge.net/p/hfst/bugs/59/ seems to suggest that this
> was fixed, and I'm unable to reproduce it at the moment.  I remember
> something about undesired side-effects of the implementation, but I
> don't remember now what those were.

There are two things: there are likely some more corner cases where
hfst-ape-proc just wrongly eats random parts of strings, and the
code base is a bit unmaintainable. I use it for fin-x though, since
apertium-fin used to be uncompilable into the lt-proc format, and the
bugs are minor enough not to be a problem for my personal
experimentation.

-- 
Flammie, computer scientist bachelor + linguist master = computational
linguist doctor, free software Finnish localiser,
and more! <http://www.iki.fi/flammie/>



_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
