Hi,
I'm facing a problem while indexing a small .txt file with Lucene. The
file I want to index is in the Urdu language (written in a variant of
the Arabic and Persian script). But the index I get shows the text in
Unicode form rather than as the original Urdu text. The same program
works well for a file in English. This is the code I use for indexing:
// Read the first line of the Urdu text file.
// Note: FileReader decodes bytes with the JVM's default charset.
FileReader file = new FileReader("urdoc.txt");
BufferedReader buff = new BufferedReader(file);
String line = buff.readLine();
buff.close();

// Create a new index in D:\index using the StandardAnalyzer.
String indexDir = "D:\\index";
Analyzer analyzer = new StandardAnalyzer();
boolean createFlag = true;
IndexWriter writer = new IndexWriter(indexDir, analyzer, createFlag);

// Store and tokenize the line in a single field, then add it to the index.
Document document = new Document();
document.add(new Field("fieldname", line, Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(document);
writer.close();
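One possible cause worth checking (an assumption, not something confirmed for this file): FileReader does not let the caller choose an encoding, so if urdoc.txt is saved as UTF-8 and the JVM's default charset is something else, the Urdu characters may already be decoded incorrectly before Lucene sees them. Below is a minimal sketch of reading the line with an explicit charset via InputStreamReader. The class name ExplicitCharsetRead and the helper readFirstLine are hypothetical names used only for illustration, and UTF-8 is assumed as the file's encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ExplicitCharsetRead {

    // Hypothetical helper: returns the first line of the file,
    // decoding its bytes with the charset passed in by the caller.
    public static String readFirstLine(Path path, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default to the file name used in the post; a different path may be given as an argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "urdoc.txt");
        if (!Files.exists(path)) {
            System.err.println("File not found: " + path);
            return;
        }
        // Assumption: the file is saved as UTF-8.
        String line = readFirstLine(path, StandardCharsets.UTF_8);
        System.out.println(line);
    }
}
```

The string returned by readFirstLine could then be passed to new Field(...) in place of the line read through FileReader. If the text is decoded correctly but still looks wrong when printed, the console or the tool used to view the index may simply not be displaying Urdu characters, which would be a display matter rather than an indexing one.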
Kindly guide me on what I should do. Do I have to change this code, or
is there something else you would suggest?
Liaqat