Can you give me some starting points? I don't know where to begin, sorry, but I want to do this. Please let me know as soon as possible. :D I'm really happy to get started on this, bro.

On Fri, Apr 13, 2012 at 08:14, Rajesh Pandey <[email protected]> wrote:

> This is great.
> You can start right away. Just let me know if you need any help getting
> started. I really need a few people to join the project so that we can work
> together. Right now I want someone to start translating the web
> version of the translator into PHP/Perl/Python/Java code.
>
> On Fri, Apr 13, 2012 at 1:22 PM, Bishisht Bhatta <[email protected]> wrote:
>
>> Hello there. I am also interested in this project, and I am also taking the
>> course through coursera.org. Please let me know how we can start
>> this project. I am very eager and want to start ASAP.
>>
>> On Thu, Apr 12, 2012 at 03:48, Rajesh Pandey <[email protected]> wrote:
>>
>>> @pravin: Thanks Pravin.
>>> I am extracting them and creating files such as:
>>>
>>> http://code.google.com/p/nepaliwikipediatranslator/source/browse/trunk/NepaliWikiPediaTranslator/bin/Debug/nounlist.txt
>>> from this. I will be extracting Nepali/Hindi and English texts.
>>>
>>> @Rakesh: You are welcome. Those NLP classes are great. I wish to get
>>> some contributors for the translator, convert it to a web application, and
>>> host it somewhere. Let's see.
>>>
>>> On Thu, Apr 12, 2012 at 1:55 PM, rakesh bachchan <[email protected]> wrote:
>>>
>>>> Well, I am interested in this and will be happy to be part of the
>>>> group. I am also taking the online NLP class currently being run by
>>>> Stanford University (coursera.org).
>>>>
>>>> Thanking you
>>>> Rakesh Kumar Bachchan
>>>>
>>>> ------------------------------
>>>> *From:* pravin joshi <[email protected]>
>>>> *To:* FOSS Nepal <[email protected]>
>>>> *Sent:* Thursday, 12 April 2012 6:53 AM
>>>> *Subject:* [FOSS Nepal] Re: Natural language processing (Nepali)
>>>>
>>>> Just saw this mail thread. Anyway, below is Python code to extract all
>>>> Nepali words from the example text you gave.
>>>> # -*- coding: utf-8 -*-
>>>>
>>>> data = """
>>>> <page>
>>>> [[en:Apple]]
>>>> [[ne:स्याउ]]
>>>> [[new:स्याउ]]
>>>> [[hi:सेव]]
>>>> [[fr:????]]
>>>> </page>
>>>> """
>>>>
>>>> def get_next_target(data):
>>>>     # Find the next [[ne:...]] link; return the word and the index where it ends.
>>>>     start_link = data.find('[[ne:')
>>>>     if start_link == -1:
>>>>         return None, 0
>>>>     start_quote = data.find('[[ne:', start_link)
>>>>     end_quote = data.find(']]', start_quote + 1)
>>>>     nepWord = data[start_quote + 1:end_quote]
>>>>     # Keep only the part after the language code, e.g. "[ne:word" -> "word".
>>>>     nepWord = nepWord.split(":")[-1]
>>>>     return nepWord, end_quote
>>>>
>>>> def get_all_nepData(data):
>>>>     # Collect every Nepali word by scanning the text one link at a time.
>>>>     links = []
>>>>     while True:
>>>>         url, endpos = get_next_target(data)
>>>>         if url:
>>>>             links.append(url)
>>>>             data = data[endpos:]
>>>>         else:
>>>>             break
>>>>     return links
>>>>
>>>> if __name__ == "__main__":
>>>>     t = get_all_nepData(data)
>>>>     for i in t:
>>>>         print(i)
>>>>
>>>> Regarding autocomplete and word suggestion, you might want to look at
>>>> Bayes' theorem applied to a large body of text. You might want to read this
>>>> paper thoroughly: http://norvig.com/spell-correct.html
>>>>
>>>> Pravin
>>>>
>>>> On Apr 11, 10:00 am, Rajesh Pandey <[email protected]> wrote:
>>>> > Hi Folks,
>>>> > *"If any one of you is interested in this, please reply so that we can
>>>> > work on it."*
>>>> >
>>>> > I would like to form a group of a few people who are interested in
>>>> > data mining. If you are already involved in nlp-class.org, that would be
>>>> > great as well.
>>>> > Not to be confused by the term "data mining": the only thing we would do
>>>> > is extract Nepali words from the Wiktionary database
>>>> > dump <http://dumps.wikimedia.org/backup-index.html> and save them so that
>>>> > they can be used for various purposes.
>>>> > For instance:
>>>> > 1) Autocomplete
>>>> > 2) Nepali corpus
>>>> > 3) Nepali translator
>>>> >
>>>> > "Autocomplete" works by providing suggestions as we start typing. If
>>>> > we have a list of words, we can provide suggestions to users.
>>>> >
>>>> > A Nepali corpus, containing words tagged as "Noun",
>>>> > "Adjective", etc., can be created. I wish to use it in one of the open
>>>> > source translators for
>>>> > Nepali <http://code.google.com/p/nepaliwikipediatranslator>,
>>>> > in which I am also involved.
>>>> >
>>>> > The database dump of Wiktionary has an XML file which contains a lot of
>>>> > words and their English equivalents, along with equivalents in other
>>>> > available languages.
>>>> >
>>>> > For instance, there would be:
>>>> > <page>
>>>> > [[en:Apple]]
>>>> > [[ne:स्याउ]]
>>>> > [[new:स्याउ]]
>>>> > [[hi:सेव]]
>>>> > [[fr:????]]
>>>> > </page>
>>>> >
>>>> > etc.
>>>> > So we need to extract स्याउ and Apple, or a list such as स्याउ, केरा, सुन्तला,
>>>> > into a file, so that we can suggest स्याउ when a user starts typing स or
>>>> > suggest केरा when a user starts typing क. This is autocomplete.
>>>> >
>>>> > When we have स्याउ and Apple, we will have a Nepali translator as well.
>>>> >
>>>> > ==================
>>>> > Sorry for the ambiguous subject, "Natural language processing". I could have
>>>> > used a more specific title; "Data mining" would have been another
>>>> > option. Thanks for your patience in reading this email :).
>>>> > ======================
>>>> > Want to create a web-based PHP/Python/Java application [Nepali translator]
>>>> > based on code.google.com/p/nepaliwikipediatranslator? You are welcome.
>>>> > (Not .NET, because we already have a lot of stuff in .NET, and we are
>>>> > looking for .NET alternatives so that we can use them on Linux easily.)
>>>> > ======================
>>>> > --
>>>> > Rajesh Pandey
>>>>
>>>> --
>>>> FOSS Nepal mailing list: [email protected]
>>>> http://groups.google.com/group/foss-nepal
>>>> To unsubscribe, e-mail: [email protected]
>>>>
>>>> Mailing List Guidelines:
>>>> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>>>> Community website: http://www.fossnepal.org/
>>>
>>> --
>>> Rajesh Pandey
>>
>> --
>> Regards,
>> Bishisht Bhatta
>> Pepsicola Townplanning-35
>> Kathmandu, Nepal
>> +977-(980-641-6309)
>> +977-(981-352-7344)
>> +977-(984-984-9525)
>>
>> ******************************************************************************************
>> Freelance Programmer
>> Cheap Webhosting and Webdesign.
>> Software Development and Maintenance.
>> ******************************************************************************************
>> Computer Engineering Student, Nepal College of Information Technology
>> http://www.ncit.net.np/
>> Balkumari, Lalitpur
>>
>> ******************************************************************************************
>> Volunteer at Nepal Wireless Networking Project
>> Please visit
>> http://www.nepalwireless.net/
>> http://himanchal.org/
>
> --
> Rajesh Pandey

--
Regards,
Bishisht Bhatta
Pepsicola Townplanning-35
Kathmandu, Nepal
+977-(980-641-6309)
+977-(981-352-7344)
+977-(984-984-9525)

******************************************************************************************
Freelance Programmer
Cheap Webhosting and Webdesign.
Software Development and Maintenance.
******************************************************************************************
Computer Engineering Student, Nepal College of Information Technology
http://www.ncit.net.np/
Balkumari, Lalitpur

******************************************************************************************
Volunteer at Nepal Wireless Networking Project
Please visit
http://www.nepalwireless.net/
http://himanchal.org/
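
As a possible starting point for the two ideas described in the thread (pairing स्याउ with Apple for the translator, and suggesting स्याउ when a user types स), here is a minimal Python 3 sketch. It assumes the simplified <page> layout used in the example above, not the real Wiktionary dump schema, which is more involved. The second sample page and the names SAMPLE, LINK, extract_pairs and suggest are made up for illustration; they do not come from the project's code.

```python
# -*- coding: utf-8 -*-
import bisect
import re

# Sample text in the same shape as the <page> example from the thread.
# The second page is invented here purely for illustration.
SAMPLE = """
<page>
[[en:Apple]]
[[ne:स्याउ]]
[[new:स्याउ]]
[[hi:सेव]]
</page>
<page>
[[en:Banana]]
[[ne:केरा]]
</page>
"""

# Matches links of the form [[lang:word]] and captures both parts.
LINK = re.compile(r"\[\[([a-z-]+):([^\]]+)\]\]")


def extract_pairs(dump_text):
    """Return (English, Nepali) pairs for pages that contain both links."""
    pairs = []
    for page in re.findall(r"<page>(.*?)</page>", dump_text, re.DOTALL):
        words = dict(LINK.findall(page))  # e.g. {"en": "Apple", "ne": ...}
        if "en" in words and "ne" in words:
            pairs.append((words["en"], words["ne"]))
    return pairs


def suggest(sorted_words, prefix, limit=5):
    """Return up to `limit` words from a sorted list that start with `prefix`."""
    # In a sorted list, all words sharing a prefix sit next to each other,
    # so a binary search finds where that run begins.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


pairs = extract_pairs(SAMPLE)
nepali_words = sorted(ne for _, ne in pairs)

if __name__ == "__main__":
    for en, ne in pairs:
        print(en, ne)
    print(suggest(nepali_words, "स"))
```

The pairs list is the raw material for a word-for-word translator, and the sorted word list is enough for simple prefix suggestions. Ranking suggestions by how often a word occurs, as in the Norvig article linked above, would need word counts from a larger body of text.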
