[Apertium-stuff] GSOC-2013 Hello Apertiumers

Gang Chen Thu, 18 Apr 2013 00:33:31 -0700

Hi, Apertiumers.

My name is Gang Chen. I am currently a second-year postgraduate student
pursuing an MS degree in Natural Language Processing at Peking University,
China.


I am very interested in applying to work with Apertium in GSOC-2013.

I have been working on machine translation for the past two years, as an
intern in a machine translation team at an Internet company. I brought two
language pairs (Spanish-Chinese and Russian-Chinese) into online service,
starting from raw parallel data. We used statistical machine translation
models. Besides that, I processed large amounts of bilingual and
monolingual text to improve translation quality. Some machine learning
techniques were also used.

However, there seems to be a trend in the statistical MT community to
employ more linguistic knowledge. Luckily, during my undergraduate years, I
had the chance to take part in developing some rule-based NLP systems, such
as a CFG parser for Chinese, a WordNet-like Chinese ontology, a lexical
similarity tool, and a POS tagger. I am very interested in rule-based
methods, because they are based on solid linguistic analysis and aim at
understanding a language.


Among the ideas listed on the ideas page, I am most attracted to:

1. "Corpus-based lexicalised feature transfer"
2. "Sliding-window part-of-speech tagger"

I have some experience in processing corpora and implementing an HMM POS
tagger, and hope that may help.

So far, I have read through the documentation and have a general
understanding of the building blocks in Apertium. I have also installed
Apertium, but there seems to be a PCRE problem, as you have been discussing
these days.


I am a little confused about two questions:

1. Chinese is my native language. It seems that there isn't a Chinese-XXX
pair in Apertium. Is that because the existing online translation services
are already good enough for Chinese? Or because Chinese is a
morphology-poor language and thus less related to other languages?

2. How does Apertium view the use of statistical methods in the platform?
Does Apertium plan to integrate more statistical methods?



It would be great to have your guidance on how to make further
preparations :-)


Thank you.



Best regards,
Gang Chen
