Hi folks, *"If any of you are interested in this, please reply so that we can work on it."*
I would like to form a small group of people who are interested in data mining. If you are already taking part in nlp-class.org, that would be great as well.

Do not be put off by the term "data mining". The only thing we would do is extract Nepali words from the Wiktionary database dump <http://dumps.wikimedia.org/backup-index.html> and save them so that they can be used for various purposes. For instance:

1) Autocomplete
2) Nepali corpus
3) Nepali translator

Autocomplete works by offering suggestions as soon as the user starts typing; if we have a list of words, we can suggest the matching ones. A Nepali corpus, in which words are tagged as "Noun", "Adjective", etc., can also be created. I wish to use the words in an open source translator for Nepali <http://code.google.com/p/nepaliwikipediatranslator>, a project in which I am also involved.

The Wiktionary database dump has an XML file that contains a lot of words and their English equivalents, along with equivalents in other available languages. For instance, there would be:

<page> [[en:Apple]] [[ne:स्याउ]] [[new:स्याउ]] [[hi:सेव]] [[fr:????]] </page>

etc.

So we need to extract स्याउ and Apple, or a list such as स्याउ, केरा, सुन्तला, into a file. Then we could suggest स्याउ when a user starts typing स, or केरा when a user starts typing क. This is autocomplete. Once we have pairs such as स्याउ and Apple, we have the basis for a Nepali translator as well.

==================
Sorry for the ambiguous subject, "Natural language processing". I could have used a more specific title; "Data mining" would have been another option.

Thanks for your patience in reading this email :).

======================
Want to create a web-based PHP/Python/Java application [Nepali translator] based on code.google.com/p/nepaliwikipediatranslator? You are welcome.
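To make the extraction and autocomplete idea described above a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions: it takes the `[[en:Apple]] [[ne:स्याउ]]` link format from the example in this email at face value (I have not checked it against a real dump), it works on a few hand-written sample pages instead of the actual XML file, and the names `extract_pair` and `suggest` are made up for this sketch.

```python
import re
from bisect import bisect_left

# Matches interlanguage links such as [[ne:स्याउ]]. The link format is assumed
# from the example in this email, not verified against a real Wiktionary dump.
LINK_RE = re.compile(r"\[\[([a-z\-]+):([^\]|]+)\]\]")


def extract_pair(page_text, source="en", target="ne"):
    """Return (source_word, target_word) for one page, or None if either link is missing."""
    links = {}
    for lang, word in LINK_RE.findall(page_text):
        # Keep the first link seen for each language code.
        links.setdefault(lang, word.strip())
    if source in links and target in links:
        return links[source], links[target]
    return None


def suggest(sorted_words, prefix, limit=5):
    """Return up to `limit` words from a sorted list that start with `prefix`."""
    # In a sorted list, all words sharing a prefix sit next to each other,
    # so a binary search finds where they begin.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


# Hand-written sample pages (illustrative only, not real dump content).
pages = [
    "[[en:Apple]] [[ne:स्याउ]] [[new:स्याउ]] [[hi:सेव]]",
    "[[en:Banana]] [[ne:केरा]]",
    "[[en:Orange]] [[ne:सुन्तला]]",
    "[[en:Pear]]",  # hypothetical page with no Nepali link; it is skipped
]

pairs = [pair for pair in map(extract_pair, pages) if pair is not None]
nepali_to_english = {ne: en for en, ne in pairs}
nepali_words = sorted(nepali_to_english)

print(suggest(nepali_words, "स"))   # suggestions for a user typing स
print(nepali_to_english["स्याउ"])    # word-for-word lookup
```

With these sample pages, typing स suggests both सुन्तला and स्याउ, and typing क suggests केरा. The same Nepali-to-English dictionary is the starting point for a word-for-word translator. For the real dump, which is very large, the pages would have to be read as a stream and not loaded into memory all at once.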
Note: the web application should not be in .NET, because we already have a lot of code in .NET and we are looking for alternatives that can be used easily on Linux.
======================
--
Rajesh Pandey

--
FOSS Nepal mailing list: [email protected]
http://groups.google.com/group/foss-nepal
To unsubscribe, e-mail: [email protected]
Mailing List Guidelines: http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
Community website: http://www.fossnepal.org/
