Re: [Apertium-stuff] GSoC apertium ideas

Francis Tyers Fri, 23 Mar 2012 02:38:35 -0700

On Fri, 23/03/2012 at 08:42 +0800, Chris Hokamp
wrote:
> Hi everyone,
> 
> My name is Chris, and I'm a graduate student in Linguistics and
> Computer Science in the US. I have several ideas for potential
> Apertium projects, so I wanted to bounce them off you and hopefully
> get some feedback. 
> 
> First, regarding the potential adoption of a language pair. It looks
> like there's no German-Turkish (de-tr) pair -- as I am an advanced
> speaker of both of these languages, it seems like creating that pair
> could be a good project.


With any language pair project, I'd encourage the applicant to go
through the coding challenge on the "adopt a language pair" page, and to
answer the four questions there:

(a) Are there existing machine translation (MT) systems for this pair?
(b) If there are existing systems, how good are they? -- Could you do
better in three months? 
(c) How closely related is the pair? 
(d) How many resources already exist for the pair? 

> However, I really want to do something more programming-intensive.
> 
> I think building a module for corpus-based language model learning
> using a Vector Space model with grammatical features could be useful
> and fun. 

Hmm, "corpus-based language model learning" -- what do you mean by
this? Are you talking about the corpus-based feature transfer?

> However, the ideas page suggests that this is needed primarily for
> Romance languages 

If you're talking about the corpus-based feature transfer, then
certainly not. This would be applicable to almost any pair of languages.
The primary pairs I would recommend working with would be
Icelandic-English (for articles) and French-Spanish (for pronouns).

> - although I have good theoretical knowledge, I am not an advanced
> speaker of any Romance languages, so knowing which features in a
> particular language could benefit from language-model feedback might
> be difficult without significant guidance. 

Well, guidance is what we're here for :)

> However, this project could be a great learning experience. 
> 
> Finally, it seems to me that the language model system suggested above
> (especially one using NGram probabilities) could be combined with the
> project suggesting a new module for multiword specification to create
> a system for automatically identifying and tagging multiwords.

I'd be interested in hearing more about how this might work!

> Of course, all of these ideas need refining, but I wanted to put them
> out there to see what you think. Any feedback you have would be great!

This would be good to discuss on IRC. I'd also recommend you install
Apertium, and play with one or more language pairs.

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
