@pravin: Thanks Pravin.
I am extracting the words and creating files such as:
http://code.google.com/p/nepaliwikipediatranslator/source/browse/trunk/NepaliWikiPediaTranslator/bin/Debug/nounlist.txt
from the dump. I will be extracting Nepali/Hindi and English text.
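
For what it's worth, here is a minimal sketch of how the per-language extraction could look, assuming each page block carries interlanguage links in the `[[code:word]]` form from the example in this thread. The name `extract_links`, the regex, and the sample data are only my illustration (Python 3), not code from the translator project.

```python
import re

# Matches interlanguage links such as [[ne:स्याउ]] and captures
# the language code and the word. The pattern is an assumption
# based on the sample page quoted in this thread.
LINK_RE = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")

def extract_links(page_text, wanted=("en", "ne", "hi")):
    """Return {language_code: word} for the wanted languages in one page block."""
    found = {}
    for code, word in LINK_RE.findall(page_text):
        # Keep only the first link seen for each wanted language.
        if code in wanted and code not in found:
            found[code] = word.strip()
    return found

sample = """
<page>
[[en:Apple]]
[[ne:स्याउ]]
[[new:स्याउ]]
[[hi:सेव]]
</page>
"""

links = extract_links(sample)
for code in sorted(links):
    print(code, links[code])
```

Matching the whole language code means `new` (Newari) is not mistaken for `ne`.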

@Rakesh: You are welcome. Those NLP classes are great. I hope to find some
contributors for the translator, convert it to a web application, and host
it somewhere. Let's see.

On Thu, Apr 12, 2012 at 1:55 PM, rakesh bachchan <[email protected]> wrote:

> Well, I am interested in this and would be happy to join the
> group. I am also taking the online NLP class currently being run by
> Stanford University (coursera.org).
>
> Thanking you
> Rakesh Kumar Bachchan
>
>   ------------------------------
> *From:* pravin joshi <[email protected]>
> *To:* FOSS Nepal <[email protected]>
> *Sent:* Thursday, 12 April 2012 6:53 AM
> *Subject:* [FOSS Nepal] Re: Natural language processing (Nepali)
>
> Just saw this mail thread. Anyway, below is Python code to extract all
> Nepali words from the example text you gave.
> # -*- coding: utf-8 -*-
>
> data = """
> <page>
> [[en:Apple]]
> [[ne:स्याउ]]
> [[new:स्याउ]]
> [[hi:सेव]]
> [[fr:????]]
> </page>
> """
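> def get_next_target(data):
>     # Find the next Nepali interlanguage link, e.g. [[ne:स्याउ]].
>     marker = '[[ne:'
>     start_link = data.find(marker)
>     if start_link == -1:
>         return None, 0
>     end_link = data.find(']]', start_link)
>     if end_link == -1:
>         # Unterminated link: stop rather than return a garbled word.
>         return None, 0
>     # The word is everything between the marker and the closing brackets.
>     nepWord = data[start_link + len(marker):end_link]
>     return nepWord, end_link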
>
> def get_all_nepData(data):
>     links = []
>     while True:
>         url, endpos = get_next_target(data)
>         if url:
>             links.append(url)
>             data = data[endpos:]
>         else:
>             break
>     return links
>
> if __name__ == "__main__":
>     t = get_all_nepData(data)
>     for i in t:
>         print(i)
>
> Regarding autocomplete and word suggestion, you might want to look at
> Bayes' theorem applied to word counts from a large body of text. You might
> want to read this essay thoroughly: http://norvig.com/spell-correct.html
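
A small inline sketch of the frequency idea: the essay linked above ranks candidate words by how often they occur in a large text, and the same counts could rank autocomplete suggestions. This is only an illustration (Python 3); `build_counts`, `suggest`, and the tiny corpus are made up for the example, and a real corpus would need proper tokenisation.

```python
from collections import Counter

def build_counts(text):
    """Count how often each whitespace-separated word occurs in the text."""
    return Counter(text.split())

def suggest(prefix, counts, limit=3):
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    matches = [word for word in counts if word.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are deterministic.
    matches.sort(key=lambda word: (-counts[word], word))
    return matches[:limit]

# Made-up stand-in for bulk text.
corpus = "केरा स्याउ केरा सुन्तला स्याउ केरा कागती"
counts = build_counts(corpus)

for word in suggest("स", counts):
    print(word)
```

Scanning every word for each keystroke is fine for a short list; a large vocabulary would want an index.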
>
> Pravin
>
> On Apr 11, 10:00 am, Rajesh Pandey <[email protected]> wrote:
> > Hi Folks,
> > *"If any of you are interested in this, please reply so that we can
> > work on this."*
> >
> > I would like to form a small group of people interested in
> > data mining. If you are already taking nlp-class.org, that would be
> > great as well.
> > Do not be put off by the phrase "data mining": the only thing we would do
> > is extract Nepali words from the Wiktionary database
> > dump <http://dumps.wikimedia.org/backup-index.html> and save them so that
> > they can be used for various purposes.
> > For instance:
> > 1) Autocomplete
> > 2) Nepali corpus
> > 3) Nepali translator
> >
> > "Autocomplete" works by providing suggestions as we start typing: if
> > we have a list of words, we can offer suggestions to the user.
> >
> > A Nepali corpus containing words tagged as "Noun",
> > "Adjective", etc. can be created. I wish to use it in an open
> > source translator for Nepali
> > <http://code.google.com/p/nepaliwikipediatranslator>
> > in which I am also involved.
> >
> > The database dump of Wiktionary has an XML file which contains a lot of
> > words and their English equivalents along with equivalents in other
> > available languages.
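
An inline sketch of how such a dump could be read one page at a time without loading the whole XML file into memory. It assumes (Python 3) that each `<page>` holds a `<title>` and a `<text>` element somewhere beneath it; `iter_pages` and `local_name` are names made up for this example, and the sample XML is a tiny stand-in, not real dump content.

```python
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace such as '{http://...}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]

def iter_pages(source):
    """Yield (title, text) for each <page> element, one page at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop the finished page's contents to limit memory use

# Tiny stand-in for a dump file; a real dump would be opened from disk.
sample = io.BytesIO("""<mediawiki>
  <page>
    <title>Apple</title>
    <revision><text>[[ne:स्याउ]] [[hi:सेव]]</text></revision>
  </page>
</mediawiki>""".encode("utf-8"))

pages = list(iter_pages(sample))
for title, text in pages:
    print(title, text)
```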
> >
> > For instance, there would be:
> > <page>
> > [[en:Apple]]
> > [[ne:स्याउ]]
> > [[new:स्याउ]]
> > [[hi:सेव]]
> > [[fr:????]]
> > </page>
> >
> > etc
> > So we need to extract स्याउ and Apple, or a list such as स्याउ, केरा, सुन्तला,
> > into a file, so that we can suggest स्याउ when a user starts typing स, or
> > suggest केरा when a user starts typing क. This is autocomplete.
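
The prefix lookup described above could be sketched as a binary search over a sorted word list, so that each keystroke does not scan the whole file. This is a minimal illustration (Python 3); `complete` and the four-word list are assumptions for the example, and sorting here is by Unicode code point rather than Nepali dictionary order.

```python
import bisect

def complete(prefix, sorted_words, limit=5):
    """Return up to `limit` words from a sorted list that start with `prefix`."""
    # All words sharing a prefix sit next to each other in sorted order,
    # so binary search finds where that run begins.
    i = bisect.bisect_left(sorted_words, prefix)
    results = []
    while (i < len(sorted_words)
           and sorted_words[i].startswith(prefix)
           and len(results) < limit):
        results.append(sorted_words[i])
        i += 1
    return results

# Made-up word list standing in for the extracted file.
words = sorted(["स्याउ", "केरा", "सुन्तला", "कागती"])

for word in complete("क", words):
    print(word)
```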
> >
> > When we have स्याउ and Apple, we will have a Nepali translator as well.
> >
> > ==================
> > Sorry for the ambiguous subject, "Natural language processing". I could have
> > used a more specific title; "Data mining" would have been another
> > option. Thanks for your patience in reading this email :).
> > ======================
> > Want to create a web-based PHP/Python/Java application [Nepali translator]
> > based on code.google.com/p/nepaliwikipediatranslator? You are welcome.
> > (Not .NET, because we already have a lot of stuff in .NET, and we are
> > looking for .NET alternatives so that we can use them on Linux easily.)
> > ======================
> > --
> > Rajesh Pandey
>
> --
> FOSS Nepal mailing list: [email protected]
> http://groups.google.com/group/foss-nepal
> To unsubscribe, e-mail: [email protected]
>
> Mailing List Guidelines:
> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
> Community website: http://www.fossnepal.org/
>
>



-- 
Rajesh Pandey

