@pravin: Thanks Pravin. I am extracting the words and creating files from them, such as http://code.google.com/p/nepaliwikipediatranslator/source/browse/trunk/NepaliWikiPediaTranslator/bin/Debug/nounlist.txt. I will be extracting Nepali/Hindi and English texts.
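A rough sketch of how the Nepali/English extraction could go, assuming the dump really looks like the <page> example quoted further down in this thread. The function name `extract_pairs`, the two regexes, and the tab-separated output are illustrative placeholders only; the actual layout of the Wiktionary dump and of nounlist.txt may well differ.

```python
# -*- coding: utf-8 -*-
import re

# Assumption: each <page>...</page> block holds [[lang:word]] links, as in
# the example from this thread. Real dump pages may be laid out differently.
PAGE_RE = re.compile(r"<page>(.*?)</page>", re.DOTALL)
LINK_RE = re.compile(r"\[\[([a-z-]+):([^\]]+)\]\]")


def extract_pairs(dump_text, source="en", target="ne"):
    """Return (source_word, target_word) tuples, one per page that has both."""
    pairs = []
    for page in PAGE_RE.findall(dump_text):
        # Map language code -> word for this page, e.g. {"en": "Apple", ...}
        words = {lang: word.strip() for lang, word in LINK_RE.findall(page)}
        if source in words and target in words:
            pairs.append((words[source], words[target]))
    return pairs


if __name__ == "__main__":
    sample = """
    <page>
    [[en:Apple]]
    [[ne:स्याउ]]
    [[new:स्याउ]]
    [[hi:सेव]]
    </page>
    """
    # One tab-separated line per pair (hypothetical output format).
    for en_word, ne_word in extract_pairs(sample):
        print(en_word + "\t" + ne_word)
```

Passing target="hi" would give the Hindi/English pairs in the same way. Pages that lack either language are simply skipped.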
@Rakesh: You are welcome. Those NLP classes are great. I wish to get some contributors for the translator, convert it to a web application, and host it somewhere. Let's see.

On Thu, Apr 12, 2012 at 1:55 PM, rakesh bachchan <[email protected]> wrote:

> Well, I am interested in this and will be happy to find myself in the
> group. I am also taking the online NLP class currently being run by
> Stanford University (coursera.org).
>
> Thanking you
> Rakesh Kumar Bachchan
>
> ------------------------------
> *From:* pravin joshi <[email protected]>
> *To:* FOSS Nepal <[email protected]>
> *Sent:* Thursday, 12 April 2012 6:53 AM
> *Subject:* [FOSS Nepal] Re: Natural language processing (Nepali)
>
> Just saw this mail thread. Anyway, below is Python code to extract all
> Nepali words from the example text you gave.
>
> # -*- coding: utf-8 -*-
>
> data = """
> <page>
> [[en:Apple]]
> [[ne:स्याउ]]
> [[new:स्याउ]]
> [[hi:सेव]]
> [[fr:????]]
> </page>
> """
>
> def get_next_target(data):
>     # Locate the next [[ne:...]] link; return (word, index of its closing ']]').
>     start_link = data.find('[[ne:')
>     if start_link == -1:
>         return None, 0
>     start_quote = data.find('[[ne:', start_link)
>     end_quote = data.find(']]', start_quote + 1)
>     nepWord = data[start_quote + 1:end_quote]
>     # Keep only the part after the "ne:" prefix.
>     nepWord = nepWord.split(":")[-1]
>     return nepWord, end_quote
>
> def get_all_nepData(data):
>     # Collect every Nepali word by scanning past each link in turn.
>     links = []
>     while True:
>         url, endpos = get_next_target(data)
>         if url:
>             links.append(url)
>             data = data[endpos:]
>         else:
>             break
>     return links
>
> if __name__ == "__main__":
>     t = get_all_nepData(data)
>     for i in t:
>         print(i)
>
> Regarding autocomplete and word suggestion, you might want to look at
> Bayes' theorem and using bulk text. You might want to read this paper
> thoroughly: http://norvig.com/spell-correct.html
>
> Pravin
>
> On Apr 11, 10:00 am, Rajesh Pandey <[email protected]> wrote:
> > Hi Folks,
> > *"If any one of you is interested in this, please reply, so that we could
> > work on this."*
> >
> > I am interested in forming a group of a few people who would be interested
> > in data mining. If you are already involved in nlp-class.org, that would be
> > great as well.
> > Not to be confused by the words "data mining": the only thing we would do
> > is extract Nepali words from the Wiktionary database dump
> > <http://dumps.wikimedia.org/backup-index.html> and save them so that they
> > could be used for various purposes.
> > For instance:
> > 1) Autocomplete
> > 2) Nepali corpus
> > 3) Nepali translator
> >
> > "Autocomplete" works by providing suggestions as the user starts typing:
> > if we have a list of words, we can suggest the ones that match.
> >
> > A Nepali corpus, which contains words tagged as "Noun", "Adjective", etc.,
> > can be created. I wish to use it in one of the open source translators for
> > Nepali <http://code.google.com/p/nepaliwikipediatranslator>, in which I am
> > also involved.
> >
> > The database dump of Wiktionary has an XML file which contains a lot of
> > words and their English equivalents, along with equivalents in other
> > available languages.
> >
> > For instance, there would be:
> > <page>
> > [[en:Apple]]
> > [[ne:स्याउ]]
> > [[new:स्याउ]]
> > [[hi:सेव]]
> > [[fr:????]]
> > </page>
> >
> > etc.
> > So we need to extract स्याउ and Apple, or a list of स्याउ, केरा, सुन्तला,
> > into a file, so that we could suggest स्याउ when a user starts typing स, or
> > suggest केरा when a user starts writing क. This is autocomplete.
> >
> > When we have स्याउ and Apple, we will have a Nepali translator as well.
> >
> > ==================
> > Sorry for the ambiguous subject, "Natural language processing": I could
> > have added a more specific title, or "Data mining" would have been another
> > subject. Thanks for your patience in reading this email :).
> > ======================
> > Want to create a web-based php/python/java application [Nepali translator]
> > based on code.google.com/p/nepaliwikipediatranslator? You are welcome.
> > (Not .NET, because we already have a lot of stuff in .NET, and we are
> > looking for .NET alternatives so that we could use them in Linux easily)
> > ======================
> > --
> > Rajesh Pandey
>
> --
> FOSS Nepal mailing list: [email protected]
> http://groups.google.com/group/foss-nepal
> To unsubscribe, e-mail: [email protected]
>
> Mailing List Guidelines:
> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
> Community website: http://www.fossnepal.org/

--
Rajesh Pandey
