+1

Write it up, Gema! ;-)

You'll mentor it with a co-mentor (!). I can easily think of a couple of
names...

Mikel

On 03/19/2013 01:49 PM, Gema Ramírez-Sánchez wrote:
> Hi there,
>
> as I see it, there is a need in Apertium for most released pairs and
> the ones to come: better PoS taggers. In my experience, training
> supervised taggers has never been a waste of time; quite the
> opposite: we get a quality improvement and, at the same time, we
> create invaluable linguistic resources such as disambiguated tagged
> corpora.
>
> So, how to turn this into a GSoC idea?
>
> Following the wiki pages on how to train a tagger (see below), and
> taking into account that supervised training is still to be written,
> this project would at least involve:
>
> 0) (must-have) making an interface where you can upload a raw text of,
> say, 25,000 words or (optional) create a corpus of X size for a given
> language from Wikipedia
>
>   and, by choosing a language for which there is at least a
> morphological dictionary in Apertium, you have:
>
> 1) (must-have) a non-disambiguated tagged corpus
> 2) (must-have) a .dic file
> 3) (must-have) a simple, fully functional, precalculated .tsx file in
> which coarse tags are defined taking into account the information from
> the .dic file
>
> then it will also include:
>
> 4) (must-have) a user-friendly interface to take your
> non-disambiguated tagged corpus and be able to disambiguate it
> manually
> 5) (must-have) user-friendly documentation on how to improve the .tsx
> (refine coarse tags, write rules)
> 6) (must-have) a user-friendly interface to train a supervised tagger
> 7) (must-have) some way to evaluate performance of a .prob
>
> I'm surely forgetting some must-haves and I have to think about it a
> little bit more, but what do you think about the general idea of
> having tools to train supervised taggers?
>
> Another important question: I won't be able to technically mentor this
> project, so, if no one else is interested...
>
> Best,
>
> Gema.
>
> --------------------
> How to train a tagger in Apertium:
> http://wiki.apertium.org/wiki/Tagger_training
> http://wiki.apertium.org/wiki/Target_language_tagger_training
> http://wiki.apertium.org/wiki/Unsupervised_tagger_training
>
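
On point 7 (some way to evaluate the performance of a .prob), the core
of it could be a token-by-token comparison between the tagger output and
the hand-disambiguated corpus. What follows is only a minimal sketch
under stated assumptions: the one-token-per-line "surface<TAB>analysis"
format and the names read_tagged and tagging_accuracy are hypothetical,
made up for illustration, and are not an existing Apertium file format
or API.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, chosen analysis)


def read_tagged(text: str) -> List[Token]:
    """Parse 'surface<TAB>analysis' lines into (surface, analysis) pairs.

    The line format is an assumption made for this sketch only.
    """
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between sentences
        surface, analysis = line.split("\t", 1)
        tokens.append((surface, analysis))
    return tokens


def tagging_accuracy(gold: List[Token], predicted: List[Token]) -> float:
    """Return the fraction of tokens whose analysis matches the gold one."""
    if not gold:
        raise ValueError("gold corpus is empty")
    if len(gold) != len(predicted):
        raise ValueError("corpora are not aligned token by token")
    correct = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return correct / len(gold)


if __name__ == "__main__":
    gold = read_tagged("the\tthe<det>\ncan\tcan<n>\n")
    predicted = read_tagged("the\tthe<det>\ncan\tcan<vaux>\n")
    print(tagging_accuracy(gold, predicted))
```

In practice the interesting part would be converting the real tagger
output and the manually disambiguated corpus into aligned token lists;
the sketch deliberately leaves that out.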


-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
