Hi,

I wanted to let anyone using HFST tools for morphology know about a tool that I have been preparing as the last phase of my GSoC project. It is called hfst-fst2strings and plays a role similar to lt-expand, in that it dumps the transductions recognized by a transducer. It is aware of flag diacritics and can evaluate and filter them. It can also restrict the transducer paths it prints to those whose surface or output form is shorter than a given length and/or matches a given prefix. For example, one could run:

hfst-fst2strings -ef -c 0 -P <lemma> <transducer>

to extract strings whose analysis begins with <lemma> while evaluating and stripping out any flag diacritics.

Results are less than ideal for transducers that produce compounds. However, the "-l" and "-L" flags, which limit the input/output strings to a specified length, can help with that, and I have a couple of other ideas that would be simple to implement and would make the tool even more useful.

One caveat: the tool is part of HFST3 (only available from SVN), which does not yet have the lexc and twol tools, so the package as a whole is not yet ready to simply replace the current toolchain for those who produce transducers that way.

--Brian Croom
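
P.S. For anyone unfamiliar with what prefix and length filtering of transducer paths amounts to, here is a toy Python sketch. It is not how hfst-fst2strings is implemented and it does not use the HFST API. The tiny transducer, the names TOY_ARCS, FINAL, paths and filtered, and the choice to measure length in characters of the output string are all assumptions made purely for illustration. It also only handles acyclic transducers and ignores flag diacritics entirely.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical toy transducer: state -> list of (input, output, next state).
# It maps cat/cats/dog/dogs to cat+N, cat+N+Pl, dog+N, dog+N+Pl.
Arcs = Dict[int, List[Tuple[str, str, int]]]

TOY_ARCS: Arcs = {
    0: [("c", "c", 1), ("d", "d", 4)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    3: [("", "+N", 6), ("s", "+N+Pl", 6)],
    4: [("o", "o", 5)],
    5: [("g", "g", 3)],
    6: [],
}
FINAL: Set[int] = {6}


def paths(arcs: Arcs, finals: Set[int], state: int = 0,
          inp: str = "", out: str = "") -> Iterator[Tuple[str, str]]:
    """Depth-first enumeration of (input, output) pairs.

    Assumes the transducer is acyclic; a cyclic one would recurse forever.
    """
    if state in finals:
        yield inp, out
    for i, o, nxt in arcs.get(state, []):
        yield from paths(arcs, finals, nxt, inp + i, out + o)


def filtered(arcs: Arcs, finals: Set[int], prefix: str = "",
             max_len: Optional[int] = None) -> Iterator[Tuple[str, str]]:
    """Keep only pairs whose output starts with `prefix` and, if `max_len`
    is given, has at most `max_len` characters."""
    for inp, out in paths(arcs, finals):
        if not out.startswith(prefix):
            continue
        if max_len is not None and len(out) > max_len:
            continue
        yield inp, out


if __name__ == "__main__":
    for inp, out in filtered(TOY_ARCS, FINAL, prefix="cat"):
        print(f"{inp}:{out}")
```

Filtering after full enumeration, as done here, is the simplest thing to write down. A real tool would more plausibly prune while walking the transducer, which is what makes a length limit useful on cyclic (for example compounding) transducers.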
