On 19 March 2013 12:49, Gema Ramírez-Sánchez <[email protected]> wrote:
> Hi there,
>
> as I see it, most released pairs in Apertium, and the ones to come,
> need better PoS taggers. In my experience, training supervised taggers
> has never been a waste of time, quite the opposite: we get a quality
> improvement and at the same time we create invaluable linguistic
> resources such as disambiguated tagged corpora.
>
> So, how to turn this into a GSoC idea?
>
> Following the wiki pages on how to train a tagger (see below), and
> taking into account that supervised training is still to be written,
> this project would at least involve:
>
> 0) (must-have) an interface where you can upload a raw text of,
> say, 25,000 words, or (optional) create a corpus of X size for a given
> language from Wikipedia
>
> and, by choosing a language for which there is at least a
> morphological dictionary in Apertium, you get:
>
> 1) (must-have) a non-disambiguated tagged corpus
> 2) (must-have) a .dic file
> 3) (must-have) a simple, fully functional, precalculated .tsx file in
> which coarse tags are defined taking into account the information from
> the .dic file
>
> It would then also include:
>
> 4) (must-have) a user-friendly interface to take your
> non-disambiguated tagged corpus and disambiguate it manually
> 5) (must-have) user-friendly documentation on how to improve the .tsx
> (refine coarse tags, write rules)
> 6) (must-have) a user-friendly interface to train a supervised tagger
> 7) (must-have) some way to evaluate the performance of a .prob file
>
> I'm surely forgetting some must-haves and I have to think about it a
> little bit more, but what do you think about the general idea of
> having tools to train supervised taggers?
>
> Another important question: I won't be able to technically mentor this
> project, so, if no one else is interested...
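Items 6 and 7 above (training a supervised tagger from a hand-disambiguated corpus, then measuring how well it does) can be illustrated with a minimal sketch. It assumes a first-order HMM over coarse tags whose observations are ambiguity classes (the set of coarse tags the dictionary allows for a word), trained by relative-frequency counting with add-k smoothing, decoded with Viterbi, and scored by per-token accuracy. The class and function names and the input format are hypothetical: the sketch does not read or write Apertium's real .dic, .tsx or .prob files, and it is not apertium-tagger's own training code.

```python
import math
from collections import Counter, defaultdict

# Hypothetical input format: a sentence is a list of (ambiguity_class, gold_tag)
# pairs. ambiguity_class is a tuple of the coarse tags the dictionary allows for
# a word; gold_tag is the one a human annotator kept while disambiguating.


class SupervisedHMM:
    """First-order HMM whose observations are ambiguity classes."""

    def __init__(self, sentences, k=1.0):
        self.k = k                          # add-k smoothing constant
        self.trans = defaultdict(Counter)   # trans[prev_tag][tag] counts
        self.emit = defaultdict(Counter)    # emit[tag][ambiguity_class] counts
        self.tags, self.classes = set(), set()
        for sent in sentences:
            prev = "<s>"                    # sentence-start pseudo-tag
            for amb, gold in sent:
                self.trans[prev][gold] += 1
                self.emit[gold][amb] += 1
                self.tags.add(gold)
                self.classes.add(amb)
                prev = gold

    def log_trans(self, prev, tag):
        row = self.trans[prev]
        denom = sum(row.values()) + self.k * (len(self.tags) + 1)
        return math.log((row[tag] + self.k) / denom)

    def log_emit(self, tag, amb):
        row = self.emit[tag]
        denom = sum(row.values()) + self.k * (len(self.classes) + 1)
        return math.log((row[amb] + self.k) / denom)


def viterbi(model, classes):
    """Most likely tag sequence, choosing each tag only from its ambiguity class."""
    best = {"<s>": (0.0, [])}               # tag -> (log prob, best path so far)
    for amb in classes:
        step = {}
        for tag in amb:
            score, path = max(
                (s + model.log_trans(prev, tag), p) for prev, (s, p) in best.items()
            )
            step[tag] = (score + model.log_emit(tag, amb), path + [tag])
        best = step
    return max(best.values())[1]


def accuracy(model, test_sentences):
    """Fraction of tokens whose predicted tag matches the hand-disambiguated one."""
    correct = total = 0
    for sent in test_sentences:
        predicted = viterbi(model, [amb for amb, _ in sent])
        for (_, gold), guess in zip(sent, predicted):
            correct += guess == gold
            total += 1
    return correct / total if total else 0.0


# Toy, made-up data: "the dog runs", "the walk tires".
train = [
    [(("DET",), "DET"), (("NOUN",), "NOUN"), (("NOUN", "VERB"), "VERB")],
    [(("DET",), "DET"), (("NOUN", "VERB"), "NOUN"), (("NOUN", "VERB"), "VERB")],
]
model = SupervisedHMM(train)
test = [[(("DET",), "DET"), (("NOUN", "VERB"), "NOUN"), (("NOUN", "VERB"), "VERB")]]
print(viterbi(model, [amb for amb, _ in test[0]]), accuracy(model, test))
# prints ['DET', 'NOUN', 'VERB'] 1.0
```

Restricting each position to its ambiguity class keeps the search small, since the dictionary has already ruled out most tags; the hand-disambiguated corpus from item 4 is what supplies the gold tags that make the counting supervised.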
Could be interesting. While reading this, I started to wonder how it would work out if we had an interface where regular users could help at least partially disambiguate text: give them the translations of all possible outputs (something like this: https://gist.github.com/jimregan/5199780), and have them remove the translations that are wrong.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
