[Apertium-stuff] Request for review - Unsupervised weighting of automata patches

Amr Mohamed Hosny Anwar Wed, 14 Aug 2019 14:55:27 -0700

Dear all,

I sent this mail from my other account, but it was not approved, so please find
the message forwarded below.

Thanks,
Amr

---------- Forwarded message ----------
From: Amr Keleg <amr_moha...@live.com>
Date: Aug 12, 2019 4:39 PM
Subject: [PATCH 00/12] Request for review - Unsupervised weighting of automata 
patches
To: nlhow...@gmail.com,tommi.antero.piri...@uni-hamburg.de,fty...@prompsit.com
Cc: apertium-stuff@lists.sourceforge.net

Dear Nick, Flammie, Francis, Apertium Maintainers,

I have done my best to implement the weighting scripts.
I need your help in reviewing the patches so that we can merge them into the
master branch.
Personally, I prefer merging the code in multiple steps (it helps to have a
sense of progress).
The project gives a language's morphological analyser a better way of
ordering its output.

I tried testing the project on languages such as Breton and Kazakh, but after
debugging the code, the AttCompiler seems to be causing some problems related
to the final states.
Merging the scripts earlier would let me concentrate on fixing the bugs
that currently limit the usage of the scripts.

The pull request on GitHub can be found here:
https://github.com/apertium/lttoolbox/pull/55

The main code organisation points that need review/attention are:
* How can we encapsulate pure shell scripts into lttoolbox?
* Should we port the plain Python scripts to shell or C++?
The supervised script is currently implemented in Python.
The constraint grammar script also depends on the supervised one, so Python is
one of its dependencies.
I am much more proficient in C++ than in shell scripting.
* Should apertium-streamparser be used for parsing tagged corpora in the form 
"^surface/analysis1$" instead of re-implementing parsing methods (which is the 
case currently)?
* Can we use plain Python scripts for model evaluation?
* How should we update the automake file so that the weighting scripts can be
used like all the other lt-* commands?
* How can we avoid using the apertium-cleanstream script
(http://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream/apertium-cleanstream.cc)?
Are there any other lightweight alternatives?
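
To make the stream-parsing question above concrete, here is a minimal sketch
of what a hand-written parser for the "^surface/analysis1$" form might look
like. It is not the code from the pull request and it is not the
apertium-streamparser API; the name parse_units is hypothetical, and the
sketch assumes that '^', '$' and '/' never appear escaped inside a unit.

```python
import re

# Matches one lexical unit such as ^surface/analysis1/analysis2$.
# Assumption: no escaped '^', '$' or '/' occurs inside the unit.
UNIT_RE = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)*)\$")


def parse_units(line):
    """Return a list of (surface, [analyses]) pairs found in one line."""
    units = []
    for match in UNIT_RE.finditer(line):
        surface = match.group(1)
        # group(2) looks like "/analysis1/analysis2"; drop the empty first piece.
        analyses = [a for a in match.group(2).split("/") if a]
        units.append((surface, analyses))
    return units


if __name__ == "__main__":
    for surface, analyses in parse_units("^cats/cat<n><pl>$ ^run/run<vblex><inf>/run<n><sg>$"):
        print(surface, analyses)
```

A dedicated parser library would mainly help with the cases this sketch
ignores, such as escaped characters inside a unit.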

As a quick summary of the related patches:
* Patches 01-04: Implement a shell script for weighting compiled dictionaries
using a weighted regexp file.
Additionally, implement a plain Python script for generating a weightlist
from a tagged corpus.
* Patches 06-07: Implement two scripts that generate weightlists in an
unsupervised fashion.
* Patch 08: Implement our evaluation scripts using cross-validation.
* Patch 09: Implement an unsupervised weighting script (doesn't use weightlists 
as an intermediate step)
* Patches 10-12: Update the lt-weight script to make use of multiple weightlist 
files instead of a single one.
Consequently, the weightlist generation scripts are changed to comply
with the lt-weight tweak.
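
For reviewers who have not followed the project, the idea of deriving weights
from a tagged corpus can be sketched as follows. This is only an illustration
under my own assumptions, not the code in the patches: the function name
estimate_weights is hypothetical, the weight is taken to be the negative log
of a smoothed relative frequency (so that a lower weight means a more likely
analysis), and the real weightlist file format is not reproduced here.

```python
import math
from collections import Counter


def estimate_weights(tagged_analyses, smoothing=1.0):
    """Map each analysis to -log of its add-one smoothed relative frequency.

    tagged_analyses: iterable of analysis strings taken from a tagged corpus.
    Returns (weights, default_weight); default_weight is the weight assumed
    for any analysis that never occurred in the corpus.
    """
    counts = Counter(tagged_analyses)
    # Reserve one extra smoothed slot for unseen analyses.
    total = sum(counts.values()) + smoothing * (len(counts) + 1)
    weights = {a: -math.log((c + smoothing) / total) for a, c in counts.items()}
    default_weight = -math.log(smoothing / total)
    return weights, default_weight


if __name__ == "__main__":
    corpus = ["run<vblex><inf>", "run<vblex><inf>", "run<n><sg>"]
    weights, default_weight = estimate_weights(corpus)
    for analysis, weight in sorted(weights.items(), key=lambda item: item[1]):
        print(analysis, round(weight, 3))
    print("unseen", round(default_weight, 3))
```

With this scheme, frequent analyses receive smaller weights than rare ones,
and unseen analyses receive the largest weight of all.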

I know that reviewing the patch list is somewhat tedious, but your efforts will
be a great help in turning two or more months of experiments into useful code.

Thanks,
Amr
