Here is a copy of my GSoC proposal. I would love to get some feedback on
changes I could make.
Name: Sachin Shastri
E-mail address: [email protected]
Other information that may be useful to contact you:
Contact No: +917760118343
IRC nick: sachinsurfs in #apertium at irc.freenode.net
*Why is it that you are interested in machine translation?*
I am a computer science student from Bangalore, South India, which is
known for its rich cultural diversity (8+ languages are spoken in my
college alone). Given this background and my dual interest in computer
science and languages, computational linguistics is a subject I have
always wanted to pursue. Initially, it was the thought of a machine
being able to translate a very uncommon language like Tulu that
fascinated me. Since a young age I have been reading about machine
translation, and I have always felt a strong connection to the
subject. I got my first formal education in this area when our college
included a special course on Finite Automata and Formal Languages in
our syllabus (I am proud to say that PESIT, the college I am enrolled
in, is the only college in the entire state which offers this subject
to 2nd year students). After this I took a course on Natural Language
Processing on Coursera, and my interest in the subject has been
growing ever since.
*Why is it that you are interested in the Apertium project?*
I am interested in Apertium foremost because it is open-source (and
free), and secondly because I have always had a strong interest in
machine translation. I always wanted to do a project under an MT
organization, so Apertium was instantly one of my first choices in
the list of organizations for GSoC. Although I was immensely
interested in working for Apertium, I was initially worried that I
might not have the specific MT knowledge required for doing a
project here (not confusing interest with knowledge), but after
finding a project that is exactly right for my skill level, I knew
this was the right organization for me.
*Which of the published tasks are you interested in? What do you plan to do?*
Everything above being said, I would like to take up the task of
*Integrating Apertium in various chat clients.*
Telegram - I plan to integrate the Apertium web service (using ScaleMT,
based on the Apertium scalable service) into Telegram (using the source
provided on GitHub). This will probably include the usual tasks of
issuing the HTTP requests, parsing the JSON result strings, using
AsyncTask, etc.
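As a rough illustration of the request and parsing steps mentioned
above, here is a minimal Python sketch. It is not the Telegram code
itself (the Telegram client is an Android app, so the real work would
be done in Java); Python is used only to keep the example short. The
base URL is a placeholder, and the `/translate` path, the `langpair`
and `q` parameters and the `responseData` / `responseStatus` fields
are assumptions modelled on the JSON style of the Apertium web
service, to be checked against the server actually used.

```python
import json
from urllib.parse import urlencode

# Assumption: placeholder address of a translation server.
BASE_URL = "http://localhost:2737"


def build_translate_url(text, src, dst, base_url=BASE_URL):
    """Build the GET URL for one translation request.

    Assumed query parameters: langpair ("src|dst") and q (the text).
    """
    query = urlencode({"langpair": src + "|" + dst, "q": text})
    return base_url + "/translate?" + query


def parse_translation(body):
    """Extract the translated text from a JSON response body.

    Assumed response shape:
    {"responseData": {"translatedText": "..."}, "responseStatus": 200}
    """
    data = json.loads(body)
    if data.get("responseStatus") != 200:
        raise ValueError("translation failed: %r" % data.get("responseDetails"))
    return data["responseData"]["translatedText"]
```

In the Android client the same two steps would run off the UI thread
(for example inside an AsyncTask), with the parsed text handed back to
the chat view once the request completes.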
XChat & Pidgin - Make plugins that will interface with the
machine-translation system. (I will most probably make use of the
Python scripting interface provided in both these chat clients.)
(Suggestion - I might as well make plugins for other popular
chat clients like Adium and Finch, if I am going to be making use of
the libpurple libraries.)
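To give an idea of what such a plugin could look like, here is a
minimal sketch for XChat only, using its Python scripting interface
(`xchat.hook_command`, `xchat.command`, `xchat.prnt`). The command
name `/TRANSLATE`, the helper `parse_command` and the stub `translate`
are hypothetical names chosen for this example; the stub returns the
text unchanged and stands in for the call to the translation service.

```python
try:
    import xchat  # only importable when the script is loaded by XChat
except ImportError:
    xchat = None  # lets the helper below be exercised outside XChat

__module_name__ = "apertium-translate"
__module_version__ = "0.1"
__module_description__ = "Sketch: translate outgoing messages"


def parse_command(word, word_eol):
    """Split '/TRANSLATE en|es some text' into (langpair, text).

    word holds the individual words of the command; word_eol[i] holds
    everything from word i to the end of the line.
    """
    if len(word) < 3 or "|" not in word[1]:
        return None
    return word[1], word_eol[2]


def translate(text, langpair):
    """Hypothetical stub: a real plugin would query the web service."""
    return text


def translate_cb(word, word_eol, userdata):
    """Command callback: translate the text and say it in the channel."""
    parsed = parse_command(word, word_eol)
    if parsed is None:
        xchat.prnt("Usage: /TRANSLATE <src>|<dst> <text>")
        return xchat.EAT_ALL
    langpair, text = parsed
    xchat.command("say " + translate(text, langpair))
    return xchat.EAT_ALL


if xchat is not None:
    xchat.hook_command("TRANSLATE", translate_cb,
                       help="/TRANSLATE <src>|<dst> <text>")
```

Once loaded in XChat, typing `/TRANSLATE en|es hello world` would send
whatever `translate` returns to the current channel.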
*Reason for choosing the selected task over other tasks-*
The reason I have chosen this task over others (adopting a language
pair, rule-based finite-state disambiguation, etc.) is that, although I
have considerable knowledge of (and a lot of interest in) computational
linguistics and constraint grammar, I don't have as much experience
in them as some of my prospective peers. (I am not discrediting
myself, I am just appreciating the knowledge other people have w.r.t.
that.) In fact, I have learned a great deal from the IRC channel and
while waiting for the coding challenge. (I will take up the task of
adopting a language pair next year, when I am ready.) However, I have
had years of experience with Java, Python and Android, and so I felt
this task is right for me. (This way I also get to work with Apertium.)
*Proposal Title* - *Mathrubhasha*
(Hindi for mother tongue)
*Reasons why Google and Apertium should sponsor it-*
The number of Pidgin users was estimated to be over 3 million in 2007,
and has been growing at a steady rate since. The number of
XChat users is also increasing at a good rate. Hence, a large number
of people use these chat clients every day. Also,
Telegram logged 5 million downloads in one day following the WhatsApp
sale. So this is an opportune time to integrate Apertium into these
chat clients, thereby adding a powerful tool to these messengers
which will not only help popularize machine translation platforms but
will also help people communicate better and make these chat clients
more user-friendly.
(Since Pidgin is named after "pidgin language", having a machine
translation tool in order to break the language barrier between people
makes a lot of sense.)
*A description of how and who it will benefit in society-*
There are 10+ million users using at least one of these chat clients
*every day*. Though English is a universal language, a majority of
people are most comfortable in their own mother tongue. This tool
will help this majority express themselves better (which is a
highly desired quality in chat clients). It will also help break
the language barrier, so that users from different
parts of the world can communicate with each other effectively.
It will mainly help the rural and urban communities (esp. in India),
since many here know how to operate a computer and a phone but don't
know English.
Although the other projects (like developing language pairs) benefit
specific societies, the number of people in those societies who
benefit is small unless the end result of the project is put to use
in day-to-day situations. Since chat messengers have become quite
common and are now one of the prime means of communication,
this project will be beneficial to a majority of people in different
societies. This is especially true for mobile messaging apps, since
people carry their phones everywhere. Last but not least, for the same
reason, it will also help in the awareness and expansion of the
open-source communities, since Apertium and all the chat clients are
open-source software.
*Work plan*
*WEEK*
*TASK*
Pre-week 1-4
Getting to know my mentor better.
Analyzing the source code of the different chat clients and Apertium.
Forecasting some of the more common constraints that I will face and
deciding on a plan of action for these.
Collecting necessary information, reading documentation, doing in-depth
research, and preparing fully for starting the project.
Getting the source code of Apertium-caffeine and reading more on it (which
will give a much better idea for making the plugins).
Week 1
Start with Telegram. Use the available source code and make modifications in
the manifests for integration with the available Apertium web service.
Week 2
Work on the code, making use of already available APIs like the JSON
REST API (for issuing HTTP requests, parsing, etc.).
Week 3
Final UI work, including use of AsyncTask for the threads, and
finally debugging. (If everything goes well, I will have Apertium ready in
Telegram during this time.)
Week 4
More Debugging and making any changes required.
Deliverable #1
Integration of Apertium with Telegram
Week 5
Start making the plugin for XChat. Start writing scripts.
Week 6
Coding. Working on the interface.
Week 7
More coding and debugging. (If everything goes right, I will have the plugin
code ready by this time.)
Week 8
Debugging and creating makefiles and config files for easy compilation.
Start working on the plugin for Pidgin.
Deliverable #2
Apertium plugin for XChat Chat Client
Week 9
Continue working on plugin for Pidgin.
Week 10
Coding.
Week 11
Debugging.
Deliverable #3
Apertium plugin for Pidgin Chat Client
Week 12
First 5 days: Extra time, in case I come across some major issue.
Last 2 days: Final presentation.
Post-week
Tidying up.
*Important dates*:
April 22nd - Commencing preparatory work (pre-weeks)
May 19th - Commencing work on the project
June 16th - Deliverable #1
July 12th - Deliverable #2
~August 6th - Deliverable #3
August 10th - Project completion
August 18th-22nd - Project Evaluation
*Time commitments:*
Pre-week 1-3: 3-5 hours per day (I have my semester-end exams then, which
end by the 4th week)
Pre-week 4: at least 12 hours per day
Week 1-3: 7-9 hours per day
Week 5-7: 7-9 hours per day
Week 9-10: 12 hours per day (since I am allotting relatively less time for
this part)
Week 4, 8, 11: 10-12 hours per day (since debugging usually takes the most
time)
Post-week: 4-6 hours per day (since my summer holidays end at this time)
*List your skills and give evidence of your qualifications*.
I am currently doing my B.Tech (branch: Computer Science and
Engineering) at PES Institute of Technology, Bangalore.
Programming skills related to this project: C, Java, Python, XML, JavaScript
I have taken various courses on Java, including advanced data
structures and algorithm design. I have been working on Android app
development for a few years now and have taken up a few Android
projects (I can provide scanned copies of certificates). I have worked on
application integration and web services before. I have done my
research on the different chat clients and feel this project is
doable with knowledge of only Python (and maybe JavaScript) as the
scripting language. Also, from the coding challenge, I now have a good
idea of how to make plugins (esp. for Pidgin and XChat) and write
scripts for these clients. Hence, I think I can complete this project
in the allotted time.
(I am also fluent in 6+ natural languages, although I doubt that
will be of much use in the specific task I have chosen.)
Coding Challenge: In progress. Expected to be completed before the deadline.
Link: https://github.com/sachinsurfs/apertium-code-challenge-chat-client-plugins.git
*Previous experience in open-source projects*: No. :( However, I am
currently working on an open-source cloud-benchmarking tool, making use
of Apache Geronimo and DayTrader, which benchmarks the performance of
public, private and hybrid clouds and reports values while varying
parameters like the number of clients and WLAN speed.
*List any non-Summer-of-Code plans you have for the Summer-*
None. Therefore I am ready to devote the entire 12 weeks to this project.