On Tue, 1 March 2011 at 22:07 +0530, Aish Raj Dahal wrote:
> Hi everyone,
> For the past month or so, I have been studying Apertium, basically
> how it works, and learning what may be considered the "baby steps" in
> the world of linguistics and machine translation.

Great!

> With this in mind, I have learned some basic stuff, and I wish to
> develop a Nepali-Hindi language pair, and maybe even make it a Google
> Summer of Code project. But to my great dismay, I went through a
> previous GSoC application
> (http://donchaknow.com/m/doc/gsoc_fin_sme_proposal.pdf) and found out
> that one needs to have some work already done on the language pair in
> order to build upon it.

No need for dismay!

> All that I have found on Nepali and Hindi is listed below:
> 1] http://ltrc.iiit.ac.in/showfile.php?filename=onlineServices/morph/index.htm

Converting an IIIT analyser to Unicode/our tagset should be a fairly
straightforward job for someone who knows the language. Strange that
no-one has managed it so far.

> 2] http://www.panl10n.net/english/outputs/Working%20Papers/Nepal/Microsoft%20Word%20-%206_OK_N_331.pdf

We weren't able to get this released under a suitable licence (GPL).
See below.

> 3] http://www-users.cs.york.ac.uk/~santa/Nepali_Morpho_LSN.pdf

Kind of low on details.

> 4] http://nlp.ku.edu.np/cgi-bin/dobhase

We sent a letter to the Dobhase people a few years ago, asking if they
would free their stuff. We had some early success, but there was some
trouble when they started squabbling over licences. Jacob Nordfalk will
have more information on this.

> Having done this much, I feel that I still doubt my knowledge of the
> process and need to ask myself "Where do I start?" (PS: I have been
> through the Add New Language Pair HOWTO.)

You've been through the New Language Pair HOWTO and didn't ask any
questions? Did you understand _all_ of it?

> I would be very thankful for some review/feedback on the resources
> that I have collected, and for some advice on how to take my first
> steps into this area and, if possible, eventually into Google Summer
> of Code.
> Thank you

The only existing resource that you found with a free licence is the
morphological analyser of Hindi.
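
For a rough idea of what the Unicode half of that conversion involves, here is a minimal Python sketch of a WX-to-Devanagari transliterator. It assumes the analyser's forms are written in WX notation (see the WX wiki page linked below). It covers only the basic vowels and consonants plus anusvara (M) and visarga (H); nukta and other extensions are left out. The function name wx_to_devanagari is hypothetical, not an existing Apertium tool.

```python
# Hypothetical sketch: WX -> Devanagari for a basic subset of WX only.
# WX writes the inherent vowel "a" explicitly, so a consonant that is not
# followed by a vowel letter is a dead consonant and gets a virama.

VOWELS = {  # WX letter -> (independent vowel, dependent sign / matra)
    "a": ("\u0905", ""),        # अ, inherent vowel: no matra
    "A": ("\u0906", "\u093E"),  # आ, ा
    "i": ("\u0907", "\u093F"),  # इ, ि
    "I": ("\u0908", "\u0940"),  # ई, ी
    "u": ("\u0909", "\u0941"),  # उ, ु
    "U": ("\u090A", "\u0942"),  # ऊ, ू
    "q": ("\u090B", "\u0943"),  # ऋ, ृ
    "e": ("\u090F", "\u0947"),  # ए, े
    "E": ("\u0910", "\u0948"),  # ऐ, ै
    "o": ("\u0913", "\u094B"),  # ओ, ो
    "O": ("\u0914", "\u094C"),  # औ, ौ
}

CONSONANTS = {
    "k": "\u0915", "K": "\u0916", "g": "\u0917", "G": "\u0918", "f": "\u0919",
    "c": "\u091A", "C": "\u091B", "j": "\u091C", "J": "\u091D", "F": "\u091E",
    "t": "\u091F", "T": "\u0920", "d": "\u0921", "D": "\u0922", "N": "\u0923",
    "w": "\u0924", "W": "\u0925", "x": "\u0926", "X": "\u0927", "n": "\u0928",
    "p": "\u092A", "P": "\u092B", "b": "\u092C", "B": "\u092D", "m": "\u092E",
    "y": "\u092F", "r": "\u0930", "l": "\u0932", "v": "\u0935",
    "S": "\u0936", "R": "\u0937", "s": "\u0938", "h": "\u0939",
}

SIGNS = {"M": "\u0902", "H": "\u0903"}  # anusvara, visarga
VIRAMA = "\u094D"


def wx_to_devanagari(text: str) -> str:
    """Transliterate a WX string (basic subset) into Devanagari."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in VOWELS:
                # Consonant + vowel: attach the matra ("" for inherent a).
                out.append(VOWELS[nxt][1])
                i += 2
                continue
            # No vowel follows: write the consonant as a dead consonant.
            out.append(VIRAMA)
        elif ch in VOWELS:
            out.append(VOWELS[ch][0])  # vowel not after a consonant
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)  # spaces, punctuation, unsupported symbols
        i += 1
    return "".join(out)
```

For example, wx_to_devanagari("hinxI") gives "हिन्दी" and wx_to_devanagari("nepAlI") gives "नेपाली". Converting the tags themselves to an Apertium-style tagset would be a separate mapping step.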
So I suppose the first thing to do would be to convert it to Unicode /
a more Apertium-standard tagset.

http://wiki.apertium.org/wiki/Hindi
http://wiki.apertium.org/wiki/WX_notation

If you have any questions, we'll be here and on IRC.

Fran

PS. Kind of surprised you didn't find any of the resources for Hindi and
Nepali in the Apertium SVN.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
