I've added it to the ideas page. If anyone would like to expand on it,
the "read more" page is here:

http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Interface_for_creating_tagged_corpora

And the idea is at the bottom of the main ideas page.
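
For whoever expands the page, here is a minimal sketch of two of the pieces in Gema's list below: reading a non-disambiguated tagged corpus (item 1) and scoring a tagger's output against a hand-disambiguated one (item 7). It is not existing Apertium code. The function names are made up for illustration, the tags in the usage note are only examples, and it assumes the usual ^surface/analysis1/analysis2$ stream format with no escaped characters.

```python
import re

# One lexical unit in the stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped "/", "^" or "$" inside a unit (a real parser
# would have to handle backslash escapes).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_units(text):
    """Return a list of (surface, [analyses]) pairs found in text."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


def ambiguous_units(units):
    """Keep only the units that still have more than one analysis."""
    return [unit for unit in units if len(unit[1]) > 1]


def accuracy(gold, predicted):
    """Fraction of units whose predicted analyses equal the gold ones.

    Both arguments are lists as returned by parse_units and must be
    aligned unit by unit.
    """
    if len(gold) != len(predicted):
        raise ValueError("corpora are not aligned")
    if not gold:
        return 0.0
    correct = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return correct / len(gold)
```

For example, parse_units("^the/the<det><def><sp>$ ^walk/walk<n><sg>/walk<vblex><inf>$") gives two units, and ambiguous_units keeps only the second, which is what the manual disambiguation interface would have to show to the user.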

Fran

On Wed, 20 Mar 2013 at 21:49 +0000, Francis Tyers wrote:
> I also like the idea! Especially if we can have an optional integration
> of CG to allow people to write rules to tag the corpus -- if they so
> wish. In the end we win both ways: Those who are looking for a tagged
> corpus for training the tagger get it, and those who would also like
> constraint rules get them too.
> 
> I'll try writing it up now. :)
> 
> Fran
> 
> On Tue, 19 Mar 2013 at 20:19 +0100, Mikel Forcada wrote:
> > +1
> > 
> > Write it up, Gema! ;-)
> > 
> > You'll mentor it with a co-mentor (!) I can easily think of a couple
> > of names....
> > 
> > Mikel
> > 
> > On 03/19/2013 01:49 PM, Gema Ramírez-Sánchez wrote:
> > > Hi there,
> > >
> > > as I see it, there is a need in Apertium, both for most released
> > > pairs and for the ones to come: better PoS taggers. In my experience,
> > > training supervised taggers has never been a waste of time, quite the
> > > opposite: we get a quality improvement and, at the same time, we
> > > create invaluable linguistic resources such as disambiguated tagged
> > > corpora.
> > >
> > > So, how to turn this into a GSoC idea?
> > >
> > > Following the wiki pages on how to train a tagger (see below), and
> > > taking into account that supervised training is still to be written up,
> > > this project would at least involve
> > >
> > > 0) (must-have) making an interface where you can upload a raw text of,
> > > say, 25,000 words or (optional) create a corpus of X size for a given
> > > language from Wikipedia
> > >
> > >   and, by choosing a language for which there is at least a
> > > morphological dictionary in Apertium, you have:
> > >
> > > 1) (must-have) a non-disambiguated tagged corpus
> > > 2) (must-have) a .dic file
> > > 3) (must-have) a simple, fully functional precalculated .tsx file in
> > > which coarse tags are defined taking into account the information from
> > > the .dic file
> > >
> > > then it will also include:
> > >
> > > 4) (must-have) a user-friendly interface to take your
> > > non-disambiguated tagged corpus and be able to disambiguate it
> > > manually
> > > 5) (must-have) user-friendly documentation on how to improve the .tsx
> > > (refine coarse tags, write rules)
> > > 6) (must-have) a user-friendly interface to train a supervised tagger
> > > 7) (must-have) some way to evaluate performance of a .prob
> > >
> > > I'm surely forgetting some must-have and I have to think about it a
> > > little bit more, but, what do you think about the general idea of
> > > having tools to train supervised taggers?
> > >
> > > Another important question: I won't be able to technically mentor this
> > > project, so, if no one else is interested...
> > >
> > > Best,
> > >
> > > Gema.
> > >
> > > --------------------
> > > How to train a tagger in Apertium:
> > > http://wiki.apertium.org/wiki/Tagger_training
> > > http://wiki.apertium.org/wiki/Target_language_tagger_training
> > > http://wiki.apertium.org/wiki/Unsupervised_tagger_training
> > >
> > 
> > 
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_mar
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff



