Hi Erik,
I don't remove the stop words, because I index parallel corpora that are used
for learning the translations between pairs of languages, so every word is
important. I even developed my own analyzer for Arabic, which just removes
punctuation and special symbols and returns only Arabic text.
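The filtering step described above could look roughly like the following stdlib-only sketch. It is not the poster's actual analyzer and not a Lucene `Analyzer` subclass; the class name `ArabicOnlyTokenizer`, the use of `Character.UnicodeScript.ARABIC` as the definition of "Arabic text", and the choice to keep diacritics attached to words are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the poster's analyzer, not a Lucene Analyzer):
// Arabic-script letters form tokens, non-spacing marks (e.g. Arabic diacritics)
// stay attached to the current token, and every other character
// (punctuation, Latin letters, digits, symbols, spaces) ends a token.
public class ArabicOnlyTokenizer {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            boolean arabicLetter = Character.isLetter(cp)
                    && Character.UnicodeScript.of(cp) == Character.UnicodeScript.ARABIC;
            boolean mark = Character.getType(cp) == Character.NON_SPACING_MARK;
            if (arabicLetter || (mark && current.length() > 0)) {
                current.appendCodePoint(cp);      // extend the current Arabic word
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // any other character ends the word
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two Arabic words separated by an Arabic comma, Latin text and digits,
        // written with Unicode escapes so the source file stays ASCII.
        String sample = "\u0643\u062A\u0627\u0628\u060C hello 123 \u0642\u0644\u0645!";
        List<String> tokens = tokenize(sample);
        System.out.println(tokens.size() + " Arabic tokens");
    }
}
```

Treating everything outside the Arabic script as a separator is the simplest rule that matches "returns only Arabic text"; a real analyzer might instead normalize or strip diacritics, which changes how words match across the corpus.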
I guess the whole text is already stored by FileDocument.java:
doc.add(Field.Text("contents", IN));
where IN is
IN = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
If this is not the case, could you please tell me how to store the whole text
inside the index?
I am new to Lucene and I don't know how to use "Field.Store.YES" to
store the whole text.
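A Reader-valued field such as `Field.Text("contents", IN)` is tokenized and indexed but not stored verbatim, so the text to be stored has to be supplied as a String. A minimal stdlib sketch of that first step, assuming the corpus files are UTF-8 (an assumption; use the files' real encoding), reads a whole file into a String with an explicit charset instead of the platform default that `new InputStreamReader(...)` falls back to. The class name `WholeTextReader` is hypothetical, and the Lucene line in the comment assumes the Lucene 2.x `Field(String, String, Field.Store, Field.Index)` constructor.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WholeTextReader {

    // Reads the entire file into one String using an explicit charset,
    // rather than the platform default used by new InputStreamReader(in).
    public static String readWholeText(Path file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample (one Arabic word plus "book") to a temp file.
        Path tmp = Files.createTempFile("corpus", ".txt");
        Files.write(tmp, "\u0643\u062A\u0627\u0628 book".getBytes(StandardCharsets.UTF_8));

        String text = readWholeText(tmp, StandardCharsets.UTF_8);
        // With the Lucene 2.x Field API (assumed), the text could then be stored
        // and indexed in one field:
        // doc.add(new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED));
        System.out.println("read " + text.length() + " characters");

        Files.delete(tmp);
    }
}
```

Reading with an explicit charset at indexing time also matters for the encoding problem in the quoted question: whatever String goes into a stored field is what comes back out of the index.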
Best regards
Farag
starz10de wrote:
>
> Could anyone please tell me how to print the content of a document
> after reading the index?
> For example, if I want to print the index terms, I do:
>
> IndexReader ir = IndexReader.open(index);
> TermEnum termEnum = ir.terms();
> while (termEnum.next()) {
>     TermDocs dok = ir.termDocs();
>     dok.seek(termEnum);
>     while (dok.next()) {
>         System.out.println(termEnum.term().text().trim());
>     }
> }
>
> I can print the text files before indexing them, but because of encoding
> issues I would like to print them from the index.
> As I understand it, the content of the document (the whole text) is also
> stored in the index; my question is how to print this content.
>
> So in the end I will print the path of the current document, the index
> terms and the content of the document.
>
>
> Thanks in advance.
>
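The output the question asks for (path, index terms and stored content per document) can be modelled without Lucene. The following is a toy in-memory sketch, not Lucene's API: `ToyIndex`, `addDocument`, `termsOf` and `printAll` are hypothetical names, and the paths in `main` are made-up sample data. It keeps an inverted index (term to document ids), which is the structure the `ir.terms()`/`termDocs()` loop walks, next to a per-document store of the original path and text, which is what a stored field provides. In Lucene itself (assuming the 2.x API), the stored text would be read back with `ir.document(docId).get("contents")`, which returns null unless the field was added with `Field.Store.YES`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model (not Lucene): an inverted index plus a per-document stored-field store.
public class ToyIndex {
    // term -> sorted set of doc ids (the "postings" a TermEnum/TermDocs loop walks)
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    // doc id -> stored values, analogous to fields added with Field.Store.YES
    private final List<String> storedPaths = new ArrayList<>();
    private final List<String> storedContents = new ArrayList<>();

    // Indexes whitespace-separated terms and stores the path and the full text.
    public int addDocument(String path, String contents) {
        int docId = storedPaths.size();
        storedPaths.add(path);
        storedContents.add(contents);
        for (String term : contents.split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Terms of one document, found by walking every term's postings in term order.
    public List<String> termsOf(int docId) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            if (e.getValue().contains(docId)) {
                terms.add(e.getKey());
            }
        }
        return terms;
    }

    public String storedPath(int docId) { return storedPaths.get(docId); }

    public String storedContent(int docId) { return storedContents.get(docId); }

    // Prints path, index terms and stored content for every document.
    public void printAll() {
        for (int docId = 0; docId < storedPaths.size(); docId++) {
            System.out.println("path:     " + storedPath(docId));
            System.out.println("terms:    " + termsOf(docId));
            System.out.println("contents: " + storedContent(docId));
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("corpus/en/1.txt", "the cat sat");
        index.addDocument("corpus/en/2.txt", "the dog ran");
        index.printAll();
    }
}
```

The point of the model is that the terms and the stored text live in separate structures: walking the terms never reconstructs the original text, so the content can only be printed from the index if it was stored.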
--
View this message in context:
http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.