On 2018-08-03 15:42, Abinash Senapati wrote:
Hello developers,
I am a student currently working on the idea "Extend lttoolbox to have
the power of HFST" for my GSoC project. I am here to talk about the
new modifications that are now part of lttoolbox, and I would like all
of you to try them out. As part of my coding challenge I developed a
module that converts lexc files to the dix file format. The repo for
the package is https://github.com/Techievena/lexc2dix. Here is the set
of changes we have in lttoolbox right now.
Currently lttoolbox supports weights in the binary files. Here is a
snippet of that.
Thanks Abinash! Excellent work!
What this means is that you can now weight your morphological analysers,
generators and bilingual dictionaries.
Here are some problems that this can solve:
1) Having to write zero-context rules in your .lrx files. Now you can
just put the weights directly in your bilingual dictionary:
$ echo "^estación<n><f><sg>$" | lt-proc -W -b testbidix.bin
^estación<n><f><sg>/season<n><W:1.000000><sg>/station<n><W:1.500000><sg>$
$ echo "^estación<n><f><sg>$" | lt-proc -b testbidix.bin
^estación<n><f><sg>/season<n><sg>/station<n><sg>$
Analyses will be output in order of weight, lowest first. So you can
mark your default translation as "1.0" and all the others as >1.0;
because of how transfer works, it will always take the first analysis,
which will be the one with the lowest weight.
2) Improving POS-tagging accuracy by ordering analyses by probability.
This way, if your CG doesn't mop up all the ambiguity, you will get the
best remaining analysis. This works kind of like the unigram tagger,
but because it can be in the analyser itself, it can be easier to
control.
3) Dealing with non-standard forms. Instead of having to use LR/RL
direction restrictions, you can just give non-standard forms a high
weight and ask lt-proc to generate only the surface form with the
lowest weight.
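For anyone who wants to post-process weighted output outside the
pipeline, here is a minimal Python sketch of the ordering described in
points 1 to 3. It assumes the `lt-proc -W` output format shown in the
estación example (one `<W:...>` tag per analysis, no escaped `/` or `$`
characters). It parses a weighted lexical unit, sorts its analyses
lowest weight first and takes the first one, mirroring how transfer
picks the default translation. All function names are hypothetical, and
`probability_to_weight` assumes weights are negative log probabilities,
a common convention for weighted transducers that this email does not
itself state.

```python
import math
import re

# Matches a weight tag such as <W:1.500000>, as printed by `lt-proc -W`.
# Assumption: weights are non-negative decimals.
WEIGHT_TAG = re.compile(r"<W:([0-9.]+)>")


def parse_weighted_lu(lu):
    """Hypothetical helper: split ^src/a1/a2$ into (src, [(analysis, weight)]).

    Assumes no escaped '/' or '$' and at most one <W:...> tag per analysis;
    an analysis without a tag is given weight 0.0.
    """
    body = lu.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")
    source, *analyses = body[1:-1].split("/")
    result = []
    for analysis in analyses:
        match = WEIGHT_TAG.search(analysis)
        weight = float(match.group(1)) if match else 0.0
        # Drop the weight tag so only the plain analysis remains.
        result.append((WEIGHT_TAG.sub("", analysis), weight))
    return source, result


def lowest_weight_first(analyses):
    """Hypothetical helper: sort analyses so the lowest weight comes first.

    sorted() is stable, so analyses with equal weight keep their order.
    """
    return sorted(analyses, key=lambda pair: pair[1])


def probability_to_weight(p):
    """Hypothetical helper: weight = -log(p), so likelier means lighter."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)


if __name__ == "__main__":
    lu = "^estación<n><f><sg>/season<n><W:1.000000><sg>/station<n><W:1.500000><sg>$"
    src, analyses = parse_weighted_lu(lu)
    best = lowest_weight_first(analyses)[0]
    print(src, "->", best)  # estación<n><f><sg> -> ('season<n><sg>', 1.0)
```

Using -log(p) means that when probabilities are multiplied along a path,
the corresponding weights are simply added, and "most probable" becomes
"lowest weight", which matches the lowest-weight-first ordering.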
There will no doubt be even more fun stuff that we can do with weights.
I for one think it's very exciting, and would encourage people to play
around with it and see what they can come up with.
Fran
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff