Re: storing the contents of a document in the lucene index

starz10de Wed, 23 Jul 2008 00:43:23 -0700

Hi Erik,

 I don't remove the stop words, as I index parallel corpora which is used
for learning the translations between pair of languages. so every word is
important. I even develop my own analyzer for Arabic which is just remove
punctuations and special symbols and it return only Arabic text.


I guess in the   FileDocument.java   the whole text is already stored

doc.add(Field.Text("contents", IN)); 

where IN is 

IN = new BufferedReader(new InputStreamReader(new FileInputStream(f))

if this is not the case yould you please how to store the whole text inside
the index ? 

I am new to lucene and I don't know how to use this "Field.Store.YES" to
store whole text.

 

Best regards
Farag



starz10de wrote:
> 
>   Could any one tell me please how to print the content of the document
> after reading the index.
> for example if i like to print the  index terms then i do :
> 
> IndexReader ir = IndexReader.open(index);
> TermEnum termEnum = ir.terms(); 
> while (termEnum.next()) {
>                       TermDocs dok = ir.termDocs();
>                       dok.seek(termEnum);
>                       while (dok.next()) {
> System.out.println(termEnum.term().text().trim());
>                               }
> 
> I can print the text files before indexing them, but because of encoding
> issues i like to print them from the index.
> As i know the content of the document(whole text) is also stored in the
> index, my question how to print this content.
> 
> so at the end i will print the path of the current document , index terms
> and the content of the document
> 
> 
> thanks in advance
> 

-- 
View this message in context: 
http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: storing the contents of a document in the lucene index

Reply via email to