On 2018-08-03 15:42, Abinash Senapati wrote:
Hello developers,
I am a student currently working on the idea EXTEND LTTOOLBOX TO HAVE
THE POWER OF HFST for my GSoC project. So, I am here to talk about the
new modifications that are now a part of lttoolbox, and I want all of
you to try them out. As a part of my Coding Challenge I have developed
a module that converts LEXC files to the dix file format. The
repo for the package is https://github.com/Techievena/lexc2dix. So
these are the set of changes we have in lttoolbox right now.

Currently lttoolbox supports weights in the binary files. Here
is a snippet of that.

Thanks Abinash! Excellent work!

What this means is that you can now weight your morphological analysers,
generators and bilingual dictionaries.

Here are some problems that this can solve:

1) Having zero-context rules in your .lrx files. Now you can just put the
   weights directly in your bilingual dictionary:

$ echo "^estación<n><f><sg>$" | lt-proc -W -b testbidix.bin
^estación<n><f><sg>/season<n><W:1.000000><sg>/station<n><W:1.500000><sg>$

$ echo "^estación<n><f><sg>$" | lt-proc -b testbidix.bin
^estación<n><f><sg>/season<n><sg>/station<n><sg>$

Analyses are output lowest weight first. So you can mark your default
translation as "1.0" and all others as >1.0. Because of how transfer works,
it will always take the first analysis, which will be the one with the
lowest weight.
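
To make the ordering concrete, here is a minimal Python sketch that reads one line of `lt-proc -W -b` output like the one above and picks the translation with the lowest weight. The function names are made up for illustration and are not part of lttoolbox. It assumes that weights always appear as `<W:number>` tags, that a target without a weight tag counts as 0.0, and that there are no escaped slashes inside the lexical unit.

```python
import re

# Matches a weight tag such as <W:1.500000> in `lt-proc -W` output.
WEIGHT_TAG = re.compile(r"<W:([0-9]+(?:\.[0-9]+)?)>")


def parse_weighted_unit(unit):
    """Split one ^source/target1/target2$ lexical unit into
    (source, [(target_without_weight_tag, weight), ...]).

    Hypothetical helper: splits naively on "/" and treats a missing
    weight tag as 0.0, which is an assumption of this sketch.
    """
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")
    source, *targets = body[1:-1].split("/")
    parsed = []
    for target in targets:
        match = WEIGHT_TAG.search(target)
        weight = float(match.group(1)) if match else 0.0
        # Drop the weight tag so the result looks like unweighted output.
        parsed.append((WEIGHT_TAG.sub("", target), weight))
    return source, parsed


def default_translation(unit):
    """Return the target with the lowest weight."""
    _source, targets = parse_weighted_unit(unit)
    return min(targets, key=lambda pair: pair[1])[0]


line = "^estación<n><f><sg>/season<n><W:1.000000><sg>/station<n><W:1.500000><sg>$"
print(default_translation(line))
```

Since lt-proc already outputs the lowest weight first, transfer does not need anything like this. The sketch is only useful for inspecting or testing the weights in a bilingual dictionary from a script.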

2) Improving POS-tagging accuracy by ordering analyses by probability. This
   way, if your CG doesn't mop up all the ambiguity, you will get the best
   remaining analysis. This works kind of like the unigram tagger, but because
   it can be in the analyser itself, it can be easier to control.

3) Dealing with non-standard forms. Instead of having to use LR/RL direction
   restrictions, you can just give non-standard forms a high weight and ask
   lt-proc to generate only the surface form with the lowest weight.
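
For point 2, the message does not say where the probabilities come from. One common convention, assumed here and not confirmed anywhere in this thread, is to count analyses in a hand-tagged corpus and use the negative log of the relative frequency as the weight, so that more frequent analyses get lower weights. A rough Python sketch, with toy counts invented purely for illustration:

```python
import math


def weights_from_counts(counts):
    """Turn analysis counts into weights as -log(relative frequency).

    Assumption of this sketch: lower weight means more probable, which
    matches the "lowest weight first" ordering described above.
    """
    total = sum(counts.values())
    return {analysis: -math.log(n / total) for analysis, n in counts.items()}


def rank(analyses, weights, unseen=float("inf")):
    """Order analyses lowest weight first; unseen analyses go last."""
    return sorted(analyses, key=lambda a: weights.get(a, unseen))


# Toy counts, invented for illustration only (not real corpus data).
toy_counts = {"estación<n><f><sg>": 9, "estación<np><loc>": 1}
toy_weights = weights_from_counts(toy_counts)

for analysis in rank(list(toy_counts), toy_weights):
    print(analysis, round(toy_weights[analysis], 3))
```

How such weights would actually be written into a dictionary and compiled is not covered here; the sketch only shows the arithmetic behind "ordering analyses by probability".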

There will no doubt be even more fun stuff that we can do with weights. I for one think it's very exciting and would encourage people to play around
with it and see what they can come up with.

Fran

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff