On 2019-07-21 22:50, Amr Mohamed Hosny Anwar wrote:
Dear Francis, Nick, Tommi,
Hope this mail finds you well.
I would like to share with you the blog posts that I have used to
document the project's progress.
Firstly, the scores for the implemented methods, which are computed
using a custom script
(https://github.com/apertium/lttoolbox/pull/55/files#diff-4791d142daa5e6d636af9488c64ef69a),
can be found here: https://ak-blog.herokuapp.com/posts/7/
Secondly, I have searched as thoroughly as I could for publications
related to keywords such as "morphological disambiguation".
All the methods I found are supervised in one way or another.
I have documented my notes on the relevant publications here:
https://ak-blog.herokuapp.com/posts/9/
Finally, I have made some tweaks to the supervised model and
implemented a model based on the length of the analyses.
This model seems to be equivalent to the one that assigns the same
weight to all the analyses, and I believe this is a result of the way
the lt-proc command works.
You can check my explanation/findings here:
https://ak-blog.herokuapp.com/posts/10/
Looking forward to reading your advice on how to proceed with the
project.
Additionally, do you think we can make use of a parallel corpus for two
languages in some way?
I know a parallel corpus is also a form of supervision, but my
intuition is that finding or developing parallel corpora is easier than
finding or developing a tagged corpus.
Note: The blog is hosted on Heroku's free tier, so a page may take some
time to load the first time you access it :)
How about using BPE to weight the possible analyses?
e.g.
1) BPE will give you a segmentation it likes for a word,
"arabasız>lar>da"
2) analyser will give you various segmentations:
araba>sız>lar>da, arabasız>lar>da
3) you give a segmentation that disagrees with BPE a higher weight for
each boundary it contains that isn't predicted by BPE
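
A minimal sketch of the three steps, to make the idea concrete. It
assumes that segmentations are plain strings with ">" as the boundary
marker, that a higher weight means a less preferred analysis, and that
every unpredicted boundary costs the same. The function names and the
`penalty` parameter are hypothetical; nothing here is part of lttoolbox,
and the BPE segmentation is taken as given rather than computed.

```python
def boundaries(segmentation, sep=">"):
    """Return the character offsets at which a segmentation places a boundary."""
    offsets = set()
    position = 0
    # The last morph has no boundary after it, so it is skipped.
    for morph in segmentation.split(sep)[:-1]:
        position += len(morph)
        offsets.add(position)
    return offsets


def bpe_disagreement_weight(analysis, bpe_segmentation, penalty=1.0):
    """Weight an analysis by the number of its boundaries that BPE does not predict.

    `penalty` is an assumed per-boundary cost, not a tuned value.
    """
    unpredicted = boundaries(analysis) - boundaries(bpe_segmentation)
    return penalty * len(unpredicted)


# Step 1: the segmentation BPE prefers (supplied by hand here).
bpe = "arabasız>lar>da"
# Step 2: the segmentations offered by the analyser.
candidates = ["araba>sız>lar>da", "arabasız>lar>da"]
# Step 3: one weight per candidate.
weights = {c: bpe_disagreement_weight(c, bpe) for c in candidates}
```

With the example above, "araba>sız>lar>da" is penalised once for the
boundary after "araba", while "arabasız>lar>da" matches BPE and gets no
penalty.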
F.
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff