Hi there,

As I see it, most released Apertium pairs, and the ones to come, need better PoS taggers. In my experience, training supervised taggers has never been a waste of time; quite the opposite: we improve quality and, at the same time, create invaluable linguistic resources such as disambiguated tagged corpora.
So, how do we turn this into a GSoC idea? Following the wiki pages on how to train a tagger (see below), and taking into account that the one on supervised training is still to be written, this project would involve at least:

0) (must-have) an interface where you can upload a raw text of, say, 25,000 words or (optional) create a corpus of X size for a given language from Wikipedia

By choosing a language for which there is at least a morphological dictionary in Apertium, you would get:

1) (must-have) a non-disambiguated tagged corpus
2) (must-have) a .dic file
3) (must-have) a simple, fully functional, precomputed .tsx file in which the coarse tags are defined taking into account the information from the .dic file

It would also include:

4) (must-have) a user-friendly interface to take your non-disambiguated tagged corpus and disambiguate it manually
5) (must-have) user-friendly documentation on how to improve the .tsx (refine coarse tags, write rules)
6) (must-have) a user-friendly interface to train a supervised tagger
7) (must-have) some way to evaluate the performance of a .prob

I'm surely forgetting some must-haves and I have to think about it a little more, but what do you think about the general idea of having tools to train supervised taggers?

Another important question: I won't be able to mentor this project technically, so, if no one else is interested...

Best,
Gema.

--------------------
How to train a tagger in Apertium:
http://wiki.apertium.org/wiki/Tagger_training
http://wiki.apertium.org/wiki/Target_language_tagger_training
http://wiki.apertium.org/wiki/Unsupervised_tagger_training

--
Gema Ramírez
Prompsit LE
Translate, extract, analyze: http://aplica.prompsit.com
_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
