Indexing XML document

Liaqat Ali Tue, 04 Dec 2007 10:05:45 -0800

Hi all,

I want to index an XML file containing 200 Urdu language (variant of Arabic and Persian) documents. The corpus is in CES format and includes information about the author and much more. I just want to extract the textual data of each document, together with its Doc number and title, using SAX.

The problem I am facing is what the output of this whole processing should be so that it is acceptable to the Lucene indexer. I just want to store the Doc number and title with each document. The example given below is Doc 2 from that XML file. I want to build a complete index of the 200 documents with Doc number and title. Kindly guide me.



<h.title>Doc 2</h.title>

<title>حکمت یار کو ایران بدر کرنے پر غور</title>
</p>

<p>اور خبریں ہیں کہ انھیں ایران بدر کرنے پر بھی غور کیا جا رہا ہے۔ حکمتیار جو سابق سوویت یونین کی مداخلت کے خلاف امریکی حمایت سے چلے والیمزاحمت میں سامنے آۓ تھے اب مخالف خیالات کے لۓ جانے جاتے ہیں اور اب وہکرزئی انتظامیہ کی بھی مخالفت کررہے تھے۔ گذشتہ ہفتے ایران نے حکمت یار پرالزام لگایا تھا کہ وہ ایران کی سرزمین کو افغان انتظامیہ کے خلافکاروائیاں کرنے کے لۓ استعمال کررہے ہیں جب کہ ایران کا کہنا ہے کہ وہطالبان کے خلاف مزاحم دھڑوں کو جو حمایت فراحم کر رہا تھا وہ طالبان کاکنٹرول ختم ہونے کے بعد بند کر دی گئی ہے۔ تاہم بعض ذرائع کا خیال ہے کہایران نے حکمت یار کے خلاف اقدام امریکہ کے اعتراضات کے بعد کیے ہیں۔</p>
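For reference, here is a minimal sketch of the SAX side of this task, assuming the element names shown in the excerpt above (`h.title` for the Doc number, `title` for the title, `p` for body paragraphs) and assuming that each `h.title` marks the start of a new document. The names `CesSaxSketch`, `DocRecord` and `parse`, and the `<corpus>` root element in the sample input, are hypothetical and only serve the illustration; the sample strings are ASCII placeholders rather than real corpus text. The Lucene calls are left out on purpose because they depend on the Lucene version in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects one record per document from a CES-like file.
class CesSaxSketch {

    // The three values wanted per document: Doc number, title, body text.
    static class DocRecord {
        String docNumber = "";
        String title = "";
        final StringBuilder body = new StringBuilder();
    }

    static class Handler extends DefaultHandler {
        final List<DocRecord> records = new ArrayList<>();
        private DocRecord current;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            // Simplification: text seen before a nested element is discarded.
            text.setLength(0);
            // Assumption: every <h.title> opens a new document.
            if (qName.equals("h.title")) {
                current = new DocRecord();
                records.add(current);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            String value = text.toString().trim();
            text.setLength(0);
            if (current == null) {
                return; // header material before the first document
            }
            if (qName.equals("h.title")) {
                current.docNumber = value;
            } else if (qName.equals("title")) {
                current.title = value;
            } else if (qName.equals("p") && !value.isEmpty()) {
                if (current.body.length() > 0) {
                    current.body.append('\n');
                }
                current.body.append(value);
            }
        }
    }

    // Parses the XML string and returns the extracted records in file order.
    static List<DocRecord> parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<corpus><h.title>Doc 2</h.title>"
                + "<p><title>Sample title</title></p>"
                + "<p>Sample body text.</p></corpus>";
        for (DocRecord d : parse(xml)) {
            System.out.println(d.docNumber + " | " + d.title + " | " + d.body);
        }
    }
}
```

Each `DocRecord` could then be turned into one Lucene document, with the Doc number and title as stored fields and the body as an indexed text field. For a real corpus file, the `InputSource` would wrap a file stream instead of a string.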




Thanks, Liaqat

