On Tue, 01 Mar 2011 at 22:07 +0530, Aish Raj Dahal wrote:
> Hi everyone,
> For the past month or so, I have been studying Apertium, mainly how
> it works, and learning what may be considered the "baby steps" in the
> world of linguistics and machine translation.

Great!

> With this in mind, I have learned some basics and I wish to
> develop a Nepali-Hindi language pair, and maybe even make it a Google
> Summer of Code project. But to my great dismay, I went through a
> previous GSoC application
> (http://donchaknow.com/m/doc/gsoc_fin_sme_proposal.pdf) and found out
> that one needs to have some work already done on the language pair so
> as to build upon it.

No need for dismay!

> All that I have found on Nepali and Hindi are listed below:
> 1]
> http://ltrc.iiit.ac.in/showfile.php?filename=onlineServices/morph/index.htm

Converting an IIIT analyser to Unicode/our tagset should be a fairly
straightforward job for someone who knows the language. Strange that
no-one has managed it so far.

> 2] http://www.panl10n.net/english/outputs/Working%20Papers/Nepal/Microsoft%20Word%20-%206_OK_N_331.pdf

We weren't able to get this released under a suitable licence (GPL). See
below.

> 3]  http://www-users.cs.york.ac.uk/~santa/Nepali_Morpho_LSN.pdf

Kind of low on details.

> 4] http://nlp.ku.edu.np/cgi-bin/dobhase

We sent a letter to the Dobhase people a few years ago, asking if they
would free their stuff. We had some early success, but there was some
trouble when they started squabbling over licences. Jacob Nordfalk will
have more information on this.

> This much done, I feel that I still need to doubt my knowledge about
> the process and need to ask myself "Where to start from" (PS I have
> been through the Add new Language Pair HOW TO).

You've been through the New Language Pair HOWTO and didn't ask any
questions? Did you understand _all_ of it?

> I would be really very thankful if I would be given some
> review/feedback about the resources that I have collected, and also
> some advice regarding how to make my first steps into this area and if
> possible eventually into Google Summer of Code.
> Thank You  

The only existing resource that you found with a free licence is the
morphological analyser of Hindi. So I suppose the first thing to do
would be to convert it to Unicode and to a tagset closer to the
Apertium standard.

 http://wiki.apertium.org/wiki/Hindi
 http://wiki.apertium.org/wiki/WX_notation
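
To give a rough idea of what the Unicode half of that conversion
involves, here is a minimal sketch in Python. It assumes the analyser's
lemmas are written in WX notation, covers only a small subset of the
letters, and the function name wx_to_devanagari is made up for
illustration. Check the mapping against the WX_notation page above
before relying on it.

```python
# Partial WX -> Devanagari mapping (illustrative subset, not the full scheme).
CONSONANTS = {
    "k": "क", "K": "ख", "g": "ग", "G": "घ",
    "c": "च", "j": "ज", "w": "त", "x": "द", "n": "न",
    "p": "प", "b": "ब", "m": "म", "y": "य", "r": "र",
    "l": "ल", "v": "व", "s": "स", "h": "ह",
}
# Vowels at the start of a word or after another vowel.
INDEPENDENT_VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई",
    "u": "उ", "U": "ऊ", "e": "ए", "o": "ओ",
}
# Vowel signs after a consonant; "a" is the inherent vowel, so no sign.
MATRAS = {
    "a": "", "A": "ा", "i": "ि", "I": "ी",
    "u": "ु", "U": "ू", "e": "े", "o": "ो",
}
VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant


def wx_to_devanagari(word):
    """Convert one WX-notation word to Devanagari (hypothetical helper)."""
    out = []
    pending_consonant = False  # last symbol was a consonant with no vowel yet
    for ch in word:
        if ch in CONSONANTS:
            if pending_consonant:
                # Two consonants in a row: join them with a virama.
                out.append(VIRAMA)
            out.append(CONSONANTS[ch])
            pending_consonant = True
        elif ch in MATRAS:
            table = MATRAS if pending_consonant else INDEPENDENT_VOWELS
            out.append(table[ch])
            pending_consonant = False
        else:
            # Outside the subset handled here; fail loudly rather than guess.
            raise ValueError("unhandled WX symbol: %r" % ch)
    if pending_consonant:
        # Word ends in a bare consonant with no written vowel.
        out.append(VIRAMA)
    return "".join(out)


if __name__ == "__main__":
    for wx in ["rAma", "kamala", "hinxI", "nepAlI"]:
        print(wx, wx_to_devanagari(wx))
```

The tagset half is a separate job: it is mostly a lookup table from the
analyser's tag names to Apertium-style ones, and needs someone who
knows the language to check it.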

If you have any questions, we'll be here and on IRC.

Fran

PS. Kind of surprised you didn't find any of the resources for Hindi and
Nepali in the Apertium SVN.

