Hello, I would love to work on idea 1.7, implementing weighted transfer rules on a new language pair, ideally Hindi-Sanskrit. Could someone guide me on how to get started?
- Shivanshu

On Mon, Jan 28, 2019 at 10:43 PM Francis Tyers <fty...@prompsit.com> wrote:
> Here is my run-down on the current GSOC ideas page:
>
> 1.1 Anaphora resolution for machine translation
>
> Nice project idea, but not sure in 3 months.
>
> 1.2 Bring a released language pair up to state-of-the-art quality
>
> Always needed
>
> 1.3 Robust tokenisation in lttoolbox
>
> Up for grabs, we need this
>
> 1.4 Adopt an unreleased language pair
>
> Always needed
>
> 1.5 Extend lttoolbox to have the power of HFST
>
> I think getting this one is unlikely and requires more than 3 months.
>
> 1.6 Robust recursive transfer
>
> Keep, this would be really great. I got asked to run a workshop on Apertium
> recently and then unasked when they found out that the formalisms didn't
> actually create parse trees :)
>
> 1.7 Extend weighted transfer rules
>
> There is ongoing work in this, it would need to be supervised carefully:
>
> https://github.com/sevilaybayatli/apertium-ambiguous
>
> I would say a nice project would be to really use this on a new language
> pair
>
> 1.8 Improvements to the Apertium website
>
> Not sure
>
> 1.9 User-friendly lexical selection training
>
> I think getting this one is unlikely and requires more than 3 months. Also
> has been tried several times without luck.
>
> 1.10 Light alternative format for all XML files in an Apertium language pair
>
> I'm not sure about this one.
>
> 1.11 Bilingual dictionary enrichment via graph completion
>
> There is code for this, it was a GSOC project last year but wasn't merged,
> I'm not sure how well it works.
>
> 1.12 UD and Apertium integration
>
> This is a very useful project. If we can take advantage of UD corpora we can
> make supervised taggers for around 70% of our languages.
>
> 1.13 Add weights to lttoolbox
>
> This was done last year. A nice project would be to actually make use of it.
>
> 1.14 Improving language pairs mining Mediawiki Content Translation postedits
> 1.15 Unsupervised weighting of automata
>
> Open
>
> 1.16 Improvements to UD Annotatrix
>
> This is a really useful tool.
>
> 1.17 apertium-separable language-pair integration
>
> Agree, but I think that it should not just be apertium-separable, but perhaps
> something like "upgrade a language pair to use all the latest apertium
> tricks"
>
> 1.18 Create FST-based module for disambiguating
>
> I like this idea, but I'm not sure three months is enough time, without
> someone who really knows what they are doing with both the FST library and
> apertium.
>
> 1.19 Python API/library for Apertium
>
> This was mostly done right? I think this is still a really important project
>
> 1.20 TIPP functionality for Apertium
>
> Not sure
>
> There is a lot of functionality that is not used widely that could be really
> used to improve performance of language pairs.
>
> * apertium-separable
> * weights in lttoolbox
> * weighted transfer
>
> Fran
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff