Hi,
I am facing a problem while indexing a small .txt file with Lucene. The file I want to index is in Urdu (written in a variant of the Perso-Arabic script). But the index I get is in Unicode form, not in the real form (the original Urdu text). The same program works well for a file in English. This is the code I use for indexing:

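       import java.io.BufferedReader;
       import java.io.FileReader;
       import org.apache.lucene.analysis.Analyzer;
       import org.apache.lucene.analysis.standard.StandardAnalyzer;
       import org.apache.lucene.document.Document;
       import org.apache.lucene.document.Field;
       import org.apache.lucene.index.IndexWriter;

       // Read the first line of the file (FileReader uses the default charset).
       FileReader file = new FileReader("urdoc.txt");
       BufferedReader buff = new BufferedReader(file);
       String line = buff.readLine();
       buff.close();

       // Create a new index and add the line as one stored, tokenized field.
       String indexDir = "D:\\index";
       Analyzer analyzer = new StandardAnalyzer();
       boolean createFlag = true;
       IndexWriter writer = new IndexWriter(indexDir, analyzer, createFlag);
       Document document = new Document();
       document.add(new Field("fieldname", line, Field.Store.YES, Field.Index.TOKENIZED));
       writer.addDocument(document);
       writer.close();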

Kindly guide me on what I should do. Do I have to change this code, or is there something else you would suggest?

Liaqat
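
One thing that may be worth checking, offered as a sketch and not a confirmed fix: FileReader without an explicit charset decodes the file with the JVM's default charset, which on many Windows setups is not UTF-8, so Urdu text can be garbled before it ever reaches Lucene. The code above also reads only the first line of the file. The example below assumes urdoc.txt is saved as UTF-8 (an assumption, the post does not say) and reads the whole file with that charset; the class name Utf8TextReader and the method readUtf8 are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8TextReader {
    // Hypothetical helper: reads the whole file as text, decoding the bytes
    // explicitly as UTF-8 instead of relying on the JVM default charset.
    // Assumption: the file on disk really is UTF-8 encoded.
    public static String readUtf8(String path) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            // Read every line, not only the first one.
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

The returned string could then be passed to new Field("fieldname", ...) in place of line. If the stored text still looks wrong afterwards, the tool used to view the index may be displaying it with a different encoding or font, which would be a separate issue from the indexing code.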
