Hello there, I am also interested in this project, and I am also taking the course through coursera.org. Please let me know how we can start this project. I am very eager to work on this and want to start ASAP.
On Thu, Apr 12, 2012 at 03:48, Rajesh Pandey <[email protected]> wrote:

> @pravin: Thanks Pravin.
> I am extracting them and creating files such as:
>
> http://code.google.com/p/nepaliwikipediatranslator/source/browse/trunk/NepaliWikiPediaTranslator/bin/Debug/nounlist.txt
> from this. I will be extracting Nepali/Hindi and English texts.
>
> @Rakesh: You are welcome. Those NLP classes are great. I wish to get some
> contributors for the translator, convert it to a web application, and host
> it somewhere. Let's see.
>
> On Thu, Apr 12, 2012 at 1:55 PM, rakesh bachchan <[email protected]> wrote:
>
>> Well, I am interested in this and will be happy to find myself in the
>> group. I am also taking the online NLP class currently being run by
>> Stanford University (coursera.org).
>>
>> Thanking you
>> Rakesh Kumar Bachchan
>>
>> ------------------------------
>> *From:* pravin joshi <[email protected]>
>> *To:* FOSS Nepal <[email protected]>
>> *Sent:* Thursday, 12 April 2012 6:53 AM
>> *Subject:* [FOSS Nepal] Re: Natural language processing (Nepali)
>>
>> Just saw this mail thread. Anyway, below is Python code to extract all
>> Nepali words from the example text you gave.
>> # -*- coding: utf-8 -*-
>>
>> data = """
>> <page>
>> [[en:Apple]]
>> [[ne:स्याउ]]
>> [[new:स्याउ]]
>> [[hi:सेव]]
>> [[fr:????]]
>> </page>
>> """
>>
>> def get_next_target(data):
>>     # Return the next Nepali word and the position where its link ends.
>>     start_link = data.find('[[ne:')
>>     if start_link == -1:
>>         return None, 0
>>     start_quote = data.find('[[ne:', start_link)
>>     end_quote = data.find(']]', start_quote + 1)
>>     nepWord = data[start_quote + 1:end_quote]
>>     nepWord = nepWord.split(":")[-1]
>>     return nepWord, end_quote
>>
>> def get_all_nepData(data):
>>     # Collect every [[ne:...]] word by scanning the text repeatedly.
>>     links = []
>>     while True:
>>         url, endpos = get_next_target(data)
>>         if url:
>>             links.append(url)
>>             data = data[endpos:]
>>         else:
>>             break
>>     return links
>>
>> if __name__ == "__main__":
>>     t = get_all_nepData(data)
>>     for i in t:
>>         print(i)
>>
>> Regarding autocomplete and word suggestion, you might want to look at
>> Bayes' Theorem and using bulk text. You might want to read this article
>> thoroughly: http://norvig.com/spell-correct.html
>>
>> Pravin
>>
>> On Apr 11, 10:00 am, Rajesh Pandey <[email protected]> wrote:
>> > Hi Folks,
>> > *"If any one of you is interested in this please reply, so that we could
>> > work on this."*
>> >
>> > I am interested in forming a group of a few people who would be interested in
>> > data mining. If you are already involved in nlp-class.org, that would be
>> > great as well.
>> > Not to be confused by the term "data mining": the only thing we would do
>> > is extract Nepali words from the Wiktionary database
>> > dump <http://dumps.wikimedia.org/backup-index.html> and save them
>> > so that they could be used for various purposes.
>> > For instance:
>> > 1) Autocomplete
>> > 2) Nepali corpus
>> > 3) Nepali translator
>> >
>> > "Autocomplete" works by providing suggestions while we type: if
>> > we have a list of words, we can provide suggestions for the users.
>> >
>> > The Nepali corpus, which contains words tagged as "Noun",
>> > "Adjective" etc., can be created.
>> > I wish to use them in one of the "open source translator for
>> > Nepali" <http://code.google.com/p/nepaliwikipediatranslator>
>> > projects in which I am also involved.
>> >
>> > The database dump of Wiktionary has an XML file which contains a lot of
>> > words and their English equivalents along with equivalents in other
>> > available languages.
>> >
>> > For instance, there would be:
>> > <page>
>> > [[en:Apple]]
>> > [[ne:स्याउ]]
>> > [[new:स्याउ]]
>> > [[hi:सेव]]
>> > [[fr:????]]
>> > </page>
>> >
>> > etc.
>> > So we need to extract स्याउ and Apple, or a list of स्याउ, केरा, सुन्तला, into
>> > a file, so that we could suggest स्याउ when a user starts typing स or
>> > suggest केरा when a user starts writing क. This is autocomplete.
>> >
>> > When we have स्याउ and Apple, we will have a Nepali translator as well.
>> >
>> > ==================
>> > Sorry for the ambiguous subject, "Natural language processing". I could have
>> > added a more specific title, or "Data mining" would have been another
>> > subject. Thanks for your patience in reading this email :).
>> > ======================
>> > Want to create a web based php/python/java application [Nepali translator]
>> > based on code.google.com/p/nepaliwikipediatranslator? You are welcome.
>> > (Not .NET, because we already have a lot of stuff in .NET, and we are
>> > looking for .NET alternatives so that we could use them in Linux easily)
>> > ======================
>> > --
>> > Rajesh Pandey
>>
>> --
>> FOSS Nepal mailing list: [email protected]
>> http://groups.google.com/group/foss-nepal
>> To unsubscribe, e-mail: [email protected]
>>
>> Mailing List Guidelines:
>> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>> Community website: http://www.fossnepal.org/
>
> --
> Rajesh Pandey

--
Regards,
Bishisht Bhatta
Pepsicola Townplanning-35
Kathmandu, Nepal
+977-(980-641-6309)
+977-(981-352-7344)
+977-(984-984-9525)
******************************************************************************************
Freelance Programmer
Cheap Webhosting and Webdesign.
Software Development and Maintenance.
******************************************************************************************
Computer Engineering Student,
Nepal College of Information Technology
http://www.ncit.net.np/
Balkumari, Lalitpur
******************************************************************************************
Volunteer at Nepal Wireless Networking Project
Please visit http://www.nepalwireless.net/ http://himanchal.org/
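
A possible starting point for the two ideas discussed in this thread (pairing each Nepali word with its English equivalent, then suggesting words by prefix) might look like the sketch below. It is only an illustration under assumptions: the sample text copies the simplified <page> shape quoted above rather than the real Wiktionary dump format, and the names SAMPLE, extract_pairs and suggest are made up for this example, not taken from the nepaliwikipediatranslator code.

```python
# -*- coding: utf-8 -*-
import re
from bisect import bisect_left

# Assumed input: the simplified <page> shape quoted in the thread,
# not the actual Wiktionary XML dump schema.
SAMPLE = """
<page>
[[en:Apple]]
[[ne:स्याउ]]
[[hi:सेव]]
</page>
<page>
[[en:Banana]]
[[ne:केरा]]
</page>
"""

# Matches [[lang:word]] and captures the language code and the word.
LINK = re.compile(r"\[\[([a-z-]+):([^\]]+)\]\]")


def extract_pairs(text):
    """Return (nepali, english) pairs for pages that contain both links."""
    pairs = []
    for page in re.findall(r"<page>(.*?)</page>", text, flags=re.DOTALL):
        words = dict(LINK.findall(page))
        if "ne" in words and "en" in words:
            pairs.append((words["ne"], words["en"]))
    return pairs


def suggest(sorted_words, prefix, limit=5):
    """Return up to `limit` words from a sorted list that start with prefix."""
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


pairs = extract_pairs(SAMPLE)
nepali_words = sorted(ne for ne, _ in pairs)  # word list for autocomplete
translations = dict(pairs)                    # Nepali -> English lookup

print(suggest(nepali_words, "स"))
print(translations.get("केरा"))
```

Keeping the word list sorted lets the prefix lookup use a binary search instead of scanning every word, which should matter once the list grows beyond a few sample entries. Ranking suggestions by frequency, as the Norvig article does for spelling correction, would be a separate step on top of this.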
