Hi there,

As I see it, most released pairs in Apertium, and the ones to come,
need better PoS taggers. In my experience, training supervised taggers
has never been a waste of time; quite the opposite: we get a quality
improvement and, at the same time, we create invaluable linguistic
resources such as disambiguated tagged corpora.

So, how do we turn this into a GSoC idea?

Following the wiki pages on how to train a tagger (see below), and
taking into account that supervised training is still to be written,
this project would at least involve:

0) (must-have) making an interface where you can upload a raw text of,
say, 25,000 words or (optional) create a corpus of X size for a given
language from Wikipedia

and, by choosing a language for which there is at least a
morphological dictionary in Apertium, you get:

1) (must-have) a non-disambiguated tagged corpus
2) (must-have) a .dic file
3) (must-have) a simple, fully functional, precalculated .tsx file in
which coarse tags are defined taking into account the information from
the .dic file
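To give an idea of what the non-disambiguated tagged corpus from item 1
looks like to a tool, here is a minimal Python sketch that counts how
many tokens have more than one analysis. It assumes lexical units in
the form ^surface/analysis1/analysis2$ and does not handle escaped
characters; the function name ambiguity_stats and the sample text are
only illustrative.

```python
import re

# Assumption: each lexical unit looks like ^surface/analysis1/analysis2$.
# Escaped characters inside a unit are not handled in this sketch.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def ambiguity_stats(analysed_text):
    """Return (total_units, ambiguous_units) for analysed, untagged text."""
    total = 0
    ambiguous = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        parts = match.group(1).split("/")
        analyses = parts[1:]  # parts[0] is the surface form
        total += 1
        if len(analyses) > 1:
            ambiguous += 1
    return total, ambiguous


# Illustrative sample, not taken from a real corpus.
sample = "^the/the<det><def><sp>$ ^books/book<n><pl>/book<vblex><pri><p3><sg>$"
total, ambiguous = ambiguity_stats(sample)
print(total, ambiguous)
```

The ambiguous units are the ones that would need manual
disambiguation, so a count like this gives a rough estimate of the
annotation effort.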

then it will also include:

4) (must-have) a user-friendly interface to take your
non-disambiguated tagged corpus and be able to disambiguate it
manually
5) (must-have) user-friendly documentation on how to improve the .tsx
(refine coarse tags, write rules)
6) (must-have) a user-friendly interface to train a supervised tagger
7) (must-have) some way to evaluate performance of a .prob
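
For item 7, one simple option is to compare the tagger output with a
hand-disambiguated reference corpus, token by token. The Python sketch
below assumes both texts contain the same tokens in the same order and
that every lexical unit is written as ^lemma<tags>$; the function names
and sample strings are hypothetical, and escaped characters are not
handled.

```python
import re

# Assumption: disambiguated lexical units look like ^lemma<tag1><tag2>$.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def read_units(text):
    """Extract the content of every lexical unit, in order."""
    return [m.group(1) for m in LEXICAL_UNIT.finditer(text)]


def tagging_accuracy(reference_text, tagger_text):
    """Fraction of units where the tagger agrees with the reference."""
    reference = read_units(reference_text)
    predicted = read_units(tagger_text)
    if len(reference) != len(predicted):
        raise ValueError("corpora are not aligned token by token")
    if not reference:
        return 0.0
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return correct / len(reference)


# Illustrative data: the tagger gets the second token wrong.
reference = "^the<det><def><sp>$ ^book<n><pl>$ ^be<vbser><pri><p3><pl>$"
predicted = "^the<det><def><sp>$ ^book<vblex><pri><p3><sg>$ ^be<vbser><pri><p3><pl>$"
print(tagging_accuracy(reference, predicted))
```

Comparing the full analysis is the strictest measure; a real tool
might also report accuracy on ambiguous tokens only, or on coarse tags.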

I'm surely forgetting some must-haves and I have to think about it a
little bit more, but what do you think about the general idea of
having tools to train supervised taggers?

Another important question: I won't be able to technically mentor this
project, so, if no one else is interested...

Best,

Gema.

--------------------
How to train a tagger in Apertium:
http://wiki.apertium.org/wiki/Tagger_training
http://wiki.apertium.org/wiki/Target_language_tagger_training
http://wiki.apertium.org/wiki/Unsupervised_tagger_training

-- 
Gema Ramírez
---------------------
Prompsit LE
Traduce, extrae, analiza: http://aplica.prompsit.com

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
