Dear all,

I sent this mail from my other account, but it was not approved, so please find the message forwarded below.
Thanks,
Amr

---------- Forwarded message ----------
From: Amr Keleg <amr_moha...@live.com>
Date: Aug 12, 2019 4:39 PM
Subject: [PATCH 00/12] Request for review - Unsupervised weighting of automata patches
To: nlhow...@gmail.com,tommi.antero.piri...@uni-hamburg.de,fty...@prompsit.com
Cc: apertium-stuff@lists.sourceforge.net

Dear Nick, Flammie, Francis, Apertium Maintainers,

I have done my best to implement the weighting scripts. I need your help in reviewing the patches so that we can merge them into the master branch. Personally, I prefer merging the code in multiple steps, since that gives a sense of progress.

The project gives a language's morphological analyser a better way of ordering its output. I tried testing the project on languages such as Breton and Kazakh, but after debugging the code, the AttCompiler seems to be causing some problems related to the final states. Merging the scripts early would let me concentrate on fixing the bugs that currently limit their use.

The pull request on GitHub can be found here:
https://github.com/apertium/lttoolbox/pull/55

The main code organisation points that need review/attention are:

* How can we encapsulate pure shell scripts into lttoolbox?
* Should we port the vanilla Python scripts to shell or C++? The supervised script is currently implemented in Python. The constraint grammar script also depends on the supervised one, so Python is one of its dependencies. I am much more proficient in C++ than in shell scripting.
* Should apertium-streamparser be used for parsing tagged corpora in the form "^surface/analysis1$" instead of re-implementing the parsing methods (which is currently the case)?
* Can we use vanilla Python scripts for model evaluation?
* How should the automake file be updated so that the weighting scripts can be used like all the other lt-* commands?
* How can we avoid using the apertium-cleanstream script (http://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-cleanstream/apertium-cleanstream.cc)? Are there other lightweight alternatives?

As a quick summary of the related patches:

* Patches 01-04: Implement a shell script for weighting compiled dictionaries using a weighted regexp file. Additionally, implement a vanilla Python script for generating a weightlist from a tagged corpus.
* Patches 06-07: Implement two scripts that generate weightlists in an unsupervised fashion.
* Patch 08: Implement our evaluation scripts using cross-validation.
* Patch 09: Implement an unsupervised weighting script (it doesn't use weightlists as an intermediate step).
* Patches 10-12: Update the lt-weight script to use multiple weightlist files instead of a single one. Consequently, the weightlist generation scripts are changed to comply with the lt-weight tweak.

I know that reviewing the patch list is somewhat tedious, but your efforts will be a great help in turning two or more months of experiments into useful code.

Thanks,
Amr
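
For readers unfamiliar with the tagged-corpus format mentioned above ("^surface/analysis1$"), the following is a minimal Python sketch of how such a corpus could be turned into per-analysis weights. It is not the code from the pull request: the function names, the sample corpus, and the choice of negative log relative frequency as the weight are assumptions made for illustration, and escaped characters in the stream are not handled.

```python
import math
import re
from collections import Counter

# Matches one unit of the form ^surface/analysis$ with a single analysis.
# Assumption: no escaped '/', '^' or '$' characters occur inside the unit.
UNIT_RE = re.compile(r"\^([^/^$]*)/([^/^$]*)\$")


def count_analyses(tagged_text):
    """Count how often each analysis string occurs in the tagged text."""
    return Counter(analysis for _surface, analysis in UNIT_RE.findall(tagged_text))


def analysis_weights(counts):
    """Map each analysis to -log(relative frequency).

    Lower weight means the analysis was seen more often (hypothetical
    weighting scheme, chosen here only for illustration).
    """
    total = sum(counts.values())
    return {analysis: -math.log(n / total) for analysis, n in counts.items()}


# Hypothetical three-token tagged corpus.
corpus = "^dogs/dog<n><pl>$ ^run/run<vblex><pres>$ ^dogs/dog<n><pl>$"
weights = analysis_weights(count_analyses(corpus))

# Print analyses from most to least frequent, tab-separated.
for analysis, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{analysis}\t{weight:.4f}")
```

A dedicated parser such as apertium-streamparser would presumably handle escaping and ambiguous units more robustly than this regular expression, which is the trade-off raised in the review questions.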
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff