On Fri, 23 Mar 2012 at 08:42 +0800, Chris Hokamp wrote:
> Hi everyone,
>
> My name is Chris, and I'm a graduate student in Linguistics and
> Computer Science in the US. I have several ideas for potential
> Apertium projects, so I wanted to bounce them off you and hopefully
> get some feedback.
>
> First, regarding the potential adoption of a language pair. It looks
> like there's no German-Turkish (de-tr) pair -- as I am an advanced
> speaker of both of these languages, it seems like creating that pair
> could be a good project.

With any language pair project, I'd encourage the applicant to go through the coding challenge on the "adopt a language pair" page, and to answer the four questions there:

(a) Are there existing machine translation (MT) systems for this pair?
(b) If there are existing systems, how good are they? -- Could you do better in three months?
(c) How closely related is the pair?
(d) How many resources already exist for the pair?

> However, I really want to do something more programming-intensive.
>
> I think building a module for corpus-based language model learning
> using a Vector Space model with grammatical features could be useful
> and fun.

Hmm, "corpus-based language model learning" -- what do you mean by this? Are you talking about the corpus-based feature transfer?

> However, the ideas page suggests that this is needed primarily for
> Romance languages

If you're talking about the corpus-based feature transfer, then certainly not. This would be applicable to almost any pair of languages. The primary pairs I would recommend working with would be Icelandic-English (for articles) and French-Spanish (for pronouns).

> - although I have good theoretical knowledge, I am not an advanced
> speaker of any Romance languages, so knowing which features in a
> particular language could benefit from language-model feedback might
> be difficult without significant guidance.

Well, guidance is what we're here for :)

> However, this project could be a great learning experience.
>
> Finally, it seems to me that the language model system suggested above
> (especially one using NGram probabilities) could be combined with the
> project suggesting a new module for multiword specification to create
> a system for automatically identifying and tagging multiwords.

I'd be interested in hearing more about how this might work!

> Of course, all of these ideas need refining, but I wanted to put them
> out there to see what you think. Any feedback you have would be great!

This would be good to discuss on IRC. I'd also recommend you install Apertium and play with one or more language pairs.

Fran

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
