Salam Ahmed,

If you are trying to parse the Quranic Corpus XML file, I think NLTK is not the library you need.
What you need is an XML parser such as ElementTree (http://effbot.org/downloads/) and a text parser such as pyparsing. You also need to read about the Quranic Corpus morphology syntax. I am working on a Python API for the Quranic Corpus, but it is not ready to publish yet. You need an

2010/5/24, Kais Dukes <k...@kaisdukes.com>:
> Salam Ahmed,
>
> I don't know much about Python, but I have forwarded your e-mail to the
> comp-quran mailing list. Inshallah someone will be able to help you!
>
> w/salam,
>
> -- Kais
>
> -------------------------------------------
> From: Ahmed Salem [SMTP:ahmed.elsayed.sa...@gmail.com]
> Sent: Sunday, May 23, 2010 11:51:43 PM
> To: Kais Dukes; k...@kaisdukes.com
> Subject: Python with Quranic Arabic Corpus Help
> Auto forwarded by a Rule
>
> Salam alaikum,
> Hi Kais,
>
> I'm a student, and I am now trying to build a Quranic Arabic search program
> with Python and the Quran Corpus, but I'm a beginner at Python and NLTK.
>
> I work on the Quran corpus from the link <http://corpus.quran.com/download/>,
> which is built in this format:
> # Format: <chapter> | <verse> | <word> | <token> | <part-of-speech>
>
> Now I need your help finding all verses in a selected chapter, then all
> words in a selected (chapter, verse).
>
> I started to separate them with code like this:
>
> path = nltk.data.find('D:\\quran\quranic-corpus-text-0.1.txt')
>
> ar = {}
> arabic =codecs.open(path, encoding='utf-8')
>
> line = arabic.readline()
> while line!='':
>     tmp = line.splitlines('|')
>     ch= tmp[0]
>     v = tmp[1]
>     txt=tmp[2]
>     kkk=ch.strib()+":"+v.strib()
>     ar[kkk]=txt.strib()
>
>     line = arabic.readline()
> arabic.close()
>
> But that way doesn't work yet. Also, I think there is another, easier way to
> do it, so if you can, please help or advise.
>
> Thanks,
> --
> Ahmed Salem Resume <http://www.scribd.com/doc/14256056/Ahmed-LSayed-Salem-CV>
> MUFIX Community <http://www.mufix.org>
> Mobile : +2 (018) 23 79 073
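For the plain-text file in the question, the quoted loop has a few concrete problems. splitlines() splits on line breaks, not on a separator, so splitting the fields needs line.split('|'). strib() should be strip(). And ar[kkk] = txt overwrites the earlier value for the same chapter:verse key, so only the last line of each verse survives. nltk.data.find is also unnecessary, because open() accepts an ordinary file path. Below is a minimal Python 3 sketch that builds both lookups in one pass. It assumes every data line has the five pipe-separated fields from the quoted Format line, that chapter and verse are integers, and that lines starting with # are comments. load_corpus and load_corpus_file are hypothetical helper names, not part of NLTK or the corpus download.

```python
from collections import defaultdict


def load_corpus(lines):
    """Index pipe-delimited corpus lines by chapter and by (chapter, verse).

    Assumed line layout (the Format line quoted in the question):
        <chapter> | <verse> | <word> | <token> | <part-of-speech>
    """
    verses_by_chapter = defaultdict(list)  # chapter -> verse numbers, in file order
    words_by_verse = defaultdict(list)     # (chapter, verse) -> [(word, token, pos), ...]
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: '#' marks comment/header lines
        fields = [field.strip() for field in line.split("|")]  # split, not splitlines
        if len(fields) < 5:
            continue  # skip malformed lines instead of failing with IndexError
        chapter, verse = int(fields[0]), int(fields[1])  # assumption: numeric ids
        key = (chapter, verse)
        if key not in words_by_verse:
            verses_by_chapter[chapter].append(verse)  # record each verse only once
        words_by_verse[key].append((fields[2], fields[3], fields[4]))
    return dict(verses_by_chapter), dict(words_by_verse)


def load_corpus_file(path):
    """Read the corpus text file and index it with load_corpus()."""
    # Python 3's open() decodes the text itself; "utf-8-sig" also drops a
    # leading byte-order mark if the file happens to have one.
    with open(path, encoding="utf-8-sig") as corpus_file:
        return load_corpus(corpus_file)
```

With the path from the question (written as a raw string so the backslashes stay literal), verses, words = load_corpus_file(r'D:\quran\quranic-corpus-text-0.1.txt') gives the verse numbers of chapter 2 as verses.get(2, []) and the entries of verse 2:255 as words.get((2, 255), []). Building both dictionaries once makes every later lookup a dictionary access instead of another pass over the file.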
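On the XML side, ElementTree has been part of the Python standard library as xml.etree.ElementTree since Python 2.5, so no separate download is needed on current versions. The sketch below only shows the API pattern. The element and attribute names in SAMPLE_XML (corpus, chapter, verse, word, number, token, pos) are hypothetical placeholders, not the actual schema of the Quranic Corpus XML file, and index_xml is an illustrative helper to adapt once the real structure is known.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for illustration only; the real corpus XML may differ.
SAMPLE_XML = """
<corpus>
  <chapter number="1">
    <verse number="1">
      <word number="1" token="a" pos="N"/>
      <word number="2" token="b" pos="V"/>
    </verse>
    <verse number="2">
      <word number="1" token="c" pos="N"/>
    </verse>
  </chapter>
</corpus>
"""


def index_xml(root):
    """Map chapter -> verse numbers and (chapter, verse) -> word tokens."""
    verses_by_chapter = {}
    words_by_verse = {}
    for chapter_el in root.iter("chapter"):  # every <chapter> below root
        chapter = int(chapter_el.get("number"))
        verses_by_chapter[chapter] = []
        for verse_el in chapter_el.findall("verse"):  # direct <verse> children
            verse = int(verse_el.get("number"))
            verses_by_chapter[chapter].append(verse)
            words_by_verse[(chapter, verse)] = [
                word_el.get("token") for word_el in verse_el.findall("word")
            ]
    return verses_by_chapter, words_by_verse


root = ET.fromstring(SAMPLE_XML.strip())
```

For the real file, ET.parse(path).getroot() returns the root element to pass to index_xml; for a large file, ET.iterparse can process elements incrementally instead of building the whole tree in memory first.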