Hello there, I am also interested in this project, and I am also taking the
course through coursera.org. Please let me know how we can start this
project. I am very eager to work on this and want to start ASAP.

On Thu, Apr 12, 2012 at 03:48, Rajesh Pandey <[email protected]> wrote:

> @pravin: Thanks Pravin.
> I am extracting them and creating files such as:
>
> http://code.google.com/p/nepaliwikipediatranslator/source/browse/trunk/NepaliWikiPediaTranslator/bin/Debug/nounlist.txt
> from this. I will be extracting Nepali/Hindi and English texts.
>
> @Rakesh: You are welcome. Those NLP classes are great. I hope to get some
> contributors for the translator, convert it to a web application, and host
> it somewhere. Let's see.
>
> On Thu, Apr 12, 2012 at 1:55 PM, rakesh bachchan <
> [email protected]> wrote:
>
>> Well, I am interested in this and would be happy to join the
>> group. I am also taking the online NLP class currently being run by
>> Stanford University (coursera.org).
>>
>> Thanking you
>> Rakesh Kumar Bachchan
>>
>>   ------------------------------
>> *From:* pravin joshi <[email protected]>
>> *To:* FOSS Nepal <[email protected]>
>> *Sent:* Thursday, 12 April 2012 6:53 AM
>> *Subject:* [FOSS Nepal] Re: Natural language processing (Nepali)
>>
>> Just saw this mail thread. Anyway, below is Python code to extract all
>> Nepali words from the example text you gave.
>> # -*- coding: utf-8 -*-
>>
>> data = """
>> <page>
>> [[en:Apple]]
>> [[ne:स्याउ]]
>> [[new:स्याउ]]
>> [[hi:सेव]]
>> [[fr:????]]
>> </page>
>> """
>> def get_next_target(data):
>>     # Return (word, end position) for the next [[ne:...]] link,
>>     # or (None, 0) when no further link is found.
>>     prefix = '[[ne:'
>>     start_link = data.find(prefix)
>>     if start_link == -1:
>>         return None, 0
>>     end_link = data.find(']]', start_link)
>>     if end_link == -1:
>>         return None, 0
>>     nepWord = data[start_link + len(prefix):end_link]
>>     return nepWord, end_link
>>
>> def get_all_nepData(data):
>>     links = []
>>     while True:
>>         url, endpos = get_next_target(data)
>>         if url:
>>             links.append(url)
>>             data = data[endpos:]
>>         else:
>>             break
>>     return links
>>
>> if __name__ == "__main__":
>>     t = get_all_nepData(data)
>>     for i in t:
>>         print(i)
>>
>> Regarding autocomplete and word suggestion, you might want to look at
>> Bayes' theorem applied to a large body of text. You might want to read
>> this essay thoroughly: http://norvig.com/spell-correct.html
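A rough sketch of the word-frequency part of that suggestion, in Python 3. It covers only the idea of estimating how likely a word is from bulk text and ranking candidates by that estimate; it is not the method from the linked essay. The function names `train`, `probability`, and `rank` and the tiny sample corpus are made up for illustration.

```python
from collections import Counter

def train(text):
    """Count how often each whitespace-separated word occurs in the text."""
    return Counter(text.split())

def probability(counts, word):
    """Estimate P(word) as its share of all words seen in training."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

def rank(counts, candidates):
    """Order candidate words from most to least frequent in the training text."""
    return sorted(candidates, key=lambda w: counts[w], reverse=True)

# Hypothetical sample corpus; a real one would be much larger.
corpus = "केरा स्याउ केरा सुन्तला केरा स्याउ"
counts = train(corpus)
ranked = rank(counts, ["स्याउ", "सुन्तला", "केरा"])
print(ranked)
```

With a real corpus, the same counts could be used to decide which suggestion to show first when several words match what the user typed.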
>>
>> Pravin
>>
>> On Apr 11, 10:00 am, Rajesh Pandey <[email protected]> wrote:
>> > Hi Folks,
>> > *"If any of you are interested in this, please reply so that we can
>> > work on it."*
>> >
>> > I would like to form a small group of people interested in data
>> > mining. If you are already enrolled in nlp-class.org, that would be
>> > great as well.
>> > Do not be confused by the term "data mining": the only thing we would
>> > do is extract Nepali words from the Wiktionary database
>> > dump <http://dumps.wikimedia.org/backup-index.html> and save them so
>> > that they can be used for various purposes.
>> > For instance:
>> > 1) Autocomplete
>> > 2) Nepali corpus
>> > 3) Nepali translator
>> >
>> > "Autocomplete" works by providing suggestions as the user starts
>> > typing: if we have a list of words, we can suggest the matching ones.
>> >
>> > A Nepali corpus, containing words tagged as "Noun", "Adjective", etc.,
>> > can also be created. I wish to use it in an "open source translator
>> > for Nepali <http://code.google.com/p/nepaliwikipediatranslator>", in
>> > which I am also involved.
>> >
>> > The database dump of Wiktionary has an XML file which contains a lot of
>> > words and their English equivalents along with equivalents in other
>> > available languages.
>> >
>> > For instance, there would be:
>> > <page>
>> > [[en:Apple]]
>> > [[ne:स्याउ]]
>> > [[new:स्याउ]]
>> > [[hi:सेव]]
>> > [[fr:????]]
>> > </page>
>> >
>> > etc.
>> > So we need to extract स्याउ and Apple, or a list such as स्याउ, केरा,
>> > सुन्तला, into a file, so that we can suggest स्याउ when a user starts
>> > typing स, or suggest केरा when a user starts typing क. This is
>> > autocomplete.
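The prefix lookup described here could be sketched roughly as follows in Python 3, assuming the extracted words fit in memory as a plain list. The names `build_index` and `suggest` are hypothetical, and the fourth sample word is added only to make the example more interesting.

```python
import bisect

def build_index(words):
    """Return the unique words in sorted order, ready for prefix lookups."""
    return sorted(set(words))

def suggest(index, prefix, limit=5):
    """Return up to `limit` words from the sorted index that start with prefix."""
    if not prefix:
        return []
    # All words sharing a prefix sit next to each other in sorted order,
    # starting at the position where the prefix itself would be inserted.
    start = bisect.bisect_left(index, prefix)
    matches = []
    for word in index[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
        if len(matches) == limit:
            break
    return matches

# Hypothetical word list standing in for the extracted file.
words = ["स्याउ", "केरा", "सुन्तला", "कागती"]
index = build_index(words)
print(suggest(index, "स"))
print(suggest(index, "क"))
```

Sorting once and using binary search keeps each lookup cheap even for a large word list; a trie would be another reasonable choice.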
>> >
>> > Once we have pairs such as स्याउ and Apple, we also have the basis for a Nepali translator.
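A minimal sketch of collecting such word pairs in Python 3, assuming each page looks like the simplified `<page>` example above (real dump pages carry far more markup and would need a proper XML parser). The names `LINK_RE`, `PAGE_RE`, and `extract_pairs` are made up for illustration.

```python
import re

# Matches links such as [[ne:स्याउ]] and captures (language code, word).
LINK_RE = re.compile(r"\[\[([a-z-]+):([^\]]+)\]\]")
# Matches the body of one simplified <page>...</page> block.
PAGE_RE = re.compile(r"<page>(.*?)</page>", re.DOTALL)

def extract_pairs(dump_text, source="en", target="ne"):
    """Map each source-language word to its target-language equivalent."""
    pairs = {}
    for page in PAGE_RE.findall(dump_text):
        links = dict(LINK_RE.findall(page))
        # Keep only pages that list a word in both languages.
        if source in links and target in links:
            pairs[links[source]] = links[target]
    return pairs

sample = """
<page>
[[en:Apple]]
[[ne:स्याउ]]
[[new:स्याउ]]
[[hi:सेव]]
</page>
"""
pairs = extract_pairs(sample)
print(pairs)
```

The resulting dictionary gives a word-for-word lookup only; passing `target="hi"` would collect the Hindi equivalents in the same way.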
>> >
>> > ==================
>> > Sorry for the ambiguous subject "Natural language processing". I could
>> > have used a more specific title; "Data mining" would have been another
>> > option. Thanks for your patience in reading this email :).
>> > ======================
>> > Want to create a web-based PHP/Python/Java application [Nepali
>> > translator] based on code.google.com/p/nepaliwikipediatranslator? You
>> > are welcome.
>> > (Not .NET, because we already have a lot of code in .NET and we are
>> > looking for .NET alternatives that we can use easily on Linux.)
>> > ======================
>> > --
>> > Rajesh Pandey
>>
>> --
>> FOSS Nepal mailing list: [email protected]
>> http://groups.google.com/group/foss-nepal
>> To unsubscribe, e-mail: [email protected]
>>
>> Mailing List Guidelines:
>> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
>> Community website: http://www.fossnepal.org/
>>
>
>
>
> --
> Rajesh Pandey
>



-- 
Regards,
Bishisht Bhatta
Pepsicola Townplanning-35
Kathmandu,Nepal
+977-(980-641-6309)
+977-(981-352-7344)
+977-(984-984-9525)

******************************************************************************************
Freelance Programmer
Cheap Webhosting and Webdesign.
Software Development and Maintenance.
******************************************************************************************
Computer Engineering Student, Nepal College of Information Technology
http://www.ncit.net.np/
Balkumari, Lalitpur

******************************************************************************************
Volunteer at Nepal Wireless Networking Project
Please visit
http://www.nepalwireless.net/
http://himanchal.org/

