Sorry I was away. You can start by getting the code from
code.google.com/p/nepaliwikipediatranslator
On Fri, Apr 13, 2012 at 8:56 PM, Bishisht Bhatta <[email protected]> wrote:
> Can you give me some starting points? I don't know where to start, sorry.
> But I want to do it. Please let me know asap. :D I'm damn happy to start
> this, bro.
>
>
> On Fri, Apr 13, 2012 at 08:14, Rajesh Pandey <[email protected]> wrote:
>
>> This is great.
>> You can start right away. Just let me know if you need any help getting
>> started. I really need a few people to join the project so that we can work
>> together. Right now I want someone to start translating the web
>> version of the translator into PHP/Perl/Python/Java code.
>>
>>
>> On Fri, Apr 13, 2012 at 1:22 PM, Bishisht Bhatta <[email protected]> wrote:
>>
>>> Hello there, I am also interested in this project, and I am also taking
>>> the course through coursera.org. Please let me know how we can
>>> start this project. I am very eager and want to start asap.
>>>
>>>
>>> On Thu, Apr 12, 2012 at 03:48, Rajesh Pandey <[email protected]> wrote:
>>>
>>>> @pravin: Thanks, Pravin.
>>>> I am extracting them and creating files such as:
>>>>
>>>> http://code.google.com/p/nepaliwikipediatranslator/source/browse/trunk/NepaliWikiPediaTranslator/bin/Debug/nounlist.txt
>>>> from this. I will be extracting Nepali/Hindi and English texts.
>>>>
>>>> @Rakesh: You are welcome. Those NLP classes are great. I hope to get
>>>> some contributors for the translator, convert it to a web application, and
>>>> host it somewhere. Let's see.
>>>>
>>>> On Thu, Apr 12, 2012 at 1:55 PM, rakesh bachchan <[email protected]> wrote:
>>>>
>>>>> Well, I am interested in this and will be happy to find myself in the
>>>>> group. I am also taking the online NLP class currently being run by
>>>>> Stanford University (coursera.org).
>>>>>
>>>>> Thanking you,
>>>>> Rakesh Kumar Bachchan
>>>>>
>>>>> ------------------------------
>>>>> *From:* pravin joshi <[email protected]>
>>>>> *To:* FOSS Nepal <[email protected]>
>>>>> *Sent:* Thursday, 12 April 2012 6:53 AM
>>>>> *Subject:* [FOSS Nepal] Re: Natural language processing (Nepali)
>>>>>
>>>>> Just saw this mail thread. Anyway, below is Python code to extract all
>>>>> Nepali words from the example text you gave.
>>>>>
>>>>> # -*- coding: utf-8 -*-
>>>>>
>>>>> data = """
>>>>> <page>
>>>>> [[en:Apple]]
>>>>> [[ne:स्याउ]]
>>>>> [[new:स्याउ]]
>>>>> [[hi:सेव]]
>>>>> [[fr:????]]
>>>>> </page>
>>>>> """
>>>>>
>>>>> NE_PREFIX = '[[ne:'
>>>>>
>>>>> def get_next_target(data):
>>>>>     """Return (word, end_position) for the first [[ne:...]] link, or (None, 0)."""
>>>>>     start_link = data.find(NE_PREFIX)
>>>>>     if start_link == -1:
>>>>>         return None, 0
>>>>>     start_word = start_link + len(NE_PREFIX)
>>>>>     end_link = data.find(']]', start_word)
>>>>>     if end_link == -1:  # unterminated link: stop scanning
>>>>>         return None, 0
>>>>>     return data[start_word:end_link], end_link + 2
>>>>>
>>>>> def get_all_nepData(data):
>>>>>     """Collect every Nepali word linked as [[ne:...]] in data."""
>>>>>     words = []
>>>>>     while True:
>>>>>         word, endpos = get_next_target(data)
>>>>>         if word is None:
>>>>>             break
>>>>>         words.append(word)
>>>>>         data = data[endpos:]  # continue scanning after the link just found
>>>>>     return words
>>>>>
>>>>> if __name__ == "__main__":
>>>>>     for word in get_all_nepData(data):
>>>>>         print(word)
>>>>>
>>>>> Regarding autocomplete and word suggestion, you might want to look at
>>>>> Bayes' theorem applied to a large body of text. You might want to read
>>>>> this article thoroughly: http://norvig.com/spell-correct.html
>>>>>
>>>>> Pravin
>>>>>
>>>>> On Apr 11, 10:00 am, Rajesh Pandey <[email protected]> wrote:
>>>>> > Hi Folks,
>>>>> > *"If any of you are interested in this, please reply so that we can
>>>>> > work on it."*
>>>>> >
>>>>> > I would like to form a small group of people who are interested in
>>>>> > data mining. If you are already involved in nlp-class.org, that would
>>>>> > be great as well.
>>>>> >
>>>>> > Don't be misled by the term "data mining": the only thing we would do
>>>>> > is extract Nepali words from the Wiktionary database dump
>>>>> > <http://dumps.wikimedia.org/backup-index.html> and save them so that
>>>>> > they can be used for various purposes, for instance:
>>>>> > 1) Autocomplete
>>>>> > 2) Nepali corpus
>>>>> > 3) Nepali translator
>>>>> >
>>>>> > Autocomplete provides suggestions as the user starts typing: if we
>>>>> > have a list of words, we can suggest the ones that begin with what has
>>>>> > been typed so far.
>>>>> >
>>>>> > We can also create a Nepali corpus whose words are tagged as "Noun",
>>>>> > "Adjective", etc. I would like to use it in the "open source translator
>>>>> > for Nepali" <http://code.google.com/p/nepaliwikipediatranslator>, which
>>>>> > I am also involved in.
>>>>> >
>>>>> > The Wiktionary database dump has an XML file that contains a lot of
>>>>> > words along with their English equivalents and their equivalents in
>>>>> > other available languages.
>>>>> >
>>>>> > For instance, there would be:
>>>>> > <page>
>>>>> > [[en:Apple]]
>>>>> > [[ne:स्याउ]]
>>>>> > [[new:स्याउ]]
>>>>> > [[hi:सेव]]
>>>>> > [[fr:????]]
>>>>> > </page>
>>>>> >
>>>>> > and so on. So we need to extract स्याउ and Apple, or a list such as
>>>>> > स्याउ, केरा, सुन्तला, into a file, so that we can suggest स्याउ when a
>>>>> > user starts typing स, or suggest केरा when a user starts typing क.
>>>>> > This is autocomplete.
>>>>> >
>>>>> > Once we have स्याउ and Apple, we will have a Nepali translator as well.
>>>>> >
>>>>> > ==================
>>>>> > Sorry for the ambiguous subject, "Natural language processing": I could
>>>>> > have chosen a more specific title, and "Data mining" would have been
>>>>> > another option. Thanks for your patience in reading this email :)
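The autocomplete idea Rajesh describes (suggest स्याउ when the user types स) can be sketched in Python, the language Pravin used earlier in the thread. This is a minimal sketch under stated assumptions, not code from the nepaliwikipediatranslator project: it assumes the Nepali words have already been extracted into a plain list (for example with get_all_nepData), and the names build_index and suggest, the optional frequency counts, and the sample corpus are hypothetical. Ranking by word frequency is one simple reading of Pravin's "bulk text" suggestion, not Norvig's full spelling corrector.

```python
import bisect
from collections import Counter


def build_index(words):
    """Deduplicate and sort the word list so words sharing a prefix are adjacent."""
    return sorted(set(words))


def suggest(index, prefix, counts=None, limit=5):
    """Return up to `limit` words from the sorted `index` that start with `prefix`.

    `counts` is an optional (hypothetical) Counter of word frequencies taken
    from bulk Nepali text; when given, more frequent words come first.
    """
    # Binary search for the first word >= prefix in code-point order.
    start = bisect.bisect_left(index, prefix)
    matches = []
    for i in range(start, len(index)):
        word = index[i]
        if not word.startswith(prefix):
            break  # sorted order: no later word can match either
        matches.append(word)
    if counts is not None:
        # sort() is stable, so equally frequent words stay in sorted order.
        matches.sort(key=lambda w: -counts[w])
    return matches[:limit]


if __name__ == "__main__":
    index = build_index(["स्याउ", "केरा", "सुन्तला", "स्याउ"])
    print(suggest(index, "स"))
    print(suggest(index, "क"))
    corpus = "स्याउ स्याउ सुन्तला केरा"  # illustrative text, not real data
    print(suggest(index, "स", Counter(corpus.split())))
```

Sorting once and binary-searching keeps each lookup to a logarithmic search plus a walk over the matches, which matters once the list grows to the size of a full dump.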
>>>>> > ======================
>>>>> > Want to create a web-based PHP/Python/Java application [Nepali translator]
>>>>> > based on code.google.com/p/nepaliwikipediatranslator? You are welcome.
>>>>> > (Not .NET, because we already have a lot of stuff in .NET, and we are
>>>>> > looking for .NET alternatives so that we can use them on Linux easily.)
>>>>> > ======================
>>>>> > --
>>>>> > Rajesh Pandey
>>>>>
>>>>> --
>>>>> FOSS Nepal mailing list: [email protected]
>>>>> http://groups.google.com/group/foss-nepal
>>>>> To unsubscribe, e-mail: [email protected]
>>>>>
>>>>> Mailing List Guidelines:
>>>>> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>>> Community website: http://www.fossnepal.org/
>>>>
>>>>
>>>>
>>>> --
>>>> Rajesh Pandey
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Bishisht Bhatta
>>> Pepsicola Townplanning-35
>>> Kathmandu, Nepal
>>> +977-(980-641-6309)
>>> +977-(981-352-7344)
>>> +977-(984-984-9525)
>>>
>>>
>>> ******************************************************************************************
>>> Freelance Programmer
>>> Cheap Webhosting and Webdesign.
>>> Software Development and Maintenance.
>>>
>>> ******************************************************************************************
>>> Computer Engineering Student, Nepal College of Information Technology
>>> http://www.ncit.net.np/
>>> Balkumari, Lalitpur
>>>
>>>
>>> ******************************************************************************************
>>> Volunteer at Nepal Wireless Networking Project
>>> Please visit
>>> http://www.nepalwireless.net/
>>> http://himanchal.org/
>>
>>
>>
>> --
>> Rajesh Pandey
>
>
>
> --
> Regards,
> Bishisht Bhatta
>
--
Rajesh Pandey
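Rajesh's point that having स्याउ paired with Apple gives us a Nepali translator can be illustrated with a word-for-word lookup table built from the `<page>` blocks. This is a hypothetical sketch, not the nepaliwikipediatranslator's actual method: it assumes the simplified `<page>` layout with `[[lang:word]]` links shown in Rajesh's first email (a real Wiktionary dump's markup may differ), the names build_dictionary and translate are made up for this example, and the Banana/केरा page is illustrative sample data. The regular expression captures each language code as a whole, so `[[new:...]]` links stay separate from `[[ne:...]]`.

```python
import re

# One <page>...</page> block at a time, across line breaks.
PAGE_RE = re.compile(r"<page>(.*?)</page>", re.DOTALL)
# A [[lang:word]] link: group 1 is the language code, group 2 the word.
LINK_RE = re.compile(r"\[\[([a-z-]+):([^\]]+)\]\]")


def build_dictionary(dump_text, src="en", dst="ne"):
    """Map each `src` word to its `dst` equivalent, page by page."""
    table = {}
    for page in PAGE_RE.findall(dump_text):
        links = dict(LINK_RE.findall(page))  # e.g. {"en": "Apple", "ne": "स्याउ", ...}
        if src in links and dst in links:
            table[links[src]] = links[dst]
    return table


def translate(sentence, table):
    """Naive word-for-word translation; unknown words are left unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())


# Illustrative input in the simplified layout from the thread.
SAMPLE = """
<page>
[[en:Apple]]
[[ne:स्याउ]]
[[new:स्याउ]]
[[hi:सेव]]
</page>
<page>
[[en:Banana]]
[[ne:केरा]]
</page>
"""

if __name__ == "__main__":
    table = build_dictionary(SAMPLE)
    print(table)
    print(translate("Apple Banana Mango", table))
```

Word-for-word lookup ignores Nepali word order and inflection, so at best it is a starting point for the translator the thread wants to build.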
