Hi,
OpenOffice documents are getting indexed, but when I search for words that
appear in those documents, I do not get the correct results.

Regards,
Ganesh

Uwe Schindler wrote:
> 
> For converting documents to plain text for indexing, look at Apache Tika,
> which has a converter for OpenDocument: http://lucene.apache.org/tika/
> 
> This mailing list is *about* the development of Lucene, not about questions
> on *how* to develop your own code that uses Lucene.
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [EMAIL PROTECTED]
> 
>> -----Original Message-----
>> From: ganesh H D [mailto:[EMAIL PROTECTED]
>> Sent: Friday, November 21, 2008 1:50 PM
>> To: [email protected]
>> Subject: Indexing Open office documents
>> 
>> 
>> Hi,
>> 
>> I have been working with Apache Lucene for the past 3 days. I tried to deploy
>> the sample application that comes with the Lucene distribution, and it works
>> absolutely fine. It indexes all types of files, such as .pdf, .xml, .java,
>> .txt, etc. It also indexes OpenOffice documents, but when I search for words
>> in those OpenOffice documents, it does not show the exact results.
>> 
>> Later I learned that OpenOffice documents are ZIP archives that contain XML
>> files. We need to uncompress the file using Java's ZIP support, then parse
>> meta.xml to get the title etc. and content.xml to get the document's content.
>> But I couldn't find much information about this issue. Please help me
>> solve it.
>> 
>> regards,
>> ganesh
>> 
>> --
>> View this message in context: http://www.nabble.com/Indexing-Open-office-documents-tp20620421p20620421.html
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
> 

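The approach described in the original message (open the OpenDocument file as a ZIP archive, then parse content.xml for the body text and meta.xml for the title) can be sketched with only the JDK. This is a minimal sketch, not part of Lucene or Tika: the class OdfTextExtractor and its methods are hypothetical names. It assumes ODF 1.x files (.odt and similar) that use the OASIS namespaces, not the older OpenOffice 1.x .sxw format, and it only approximates layout: paragraphs and headings become line breaks, and text:s, text:tab and text:line-break each become a single space.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: extracts plain text and the title from an ODF (ZIP) file. */
public class OdfTextExtractor {

    // Assumed OASIS ODF 1.x namespace for text elements (text:p, text:h, text:s, ...)
    private static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    // Dublin Core namespace used by dc:title in meta.xml
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Collects all character data from content.xml. */
    private static final class BodyHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // text:s, text:tab and text:line-break stand for whitespace; approximate each with one space
            if (TEXT_NS.equals(uri) && (localName.equals("s") || localName.equals("tab")
                    || localName.equals("line-break"))) {
                text.append(' ');
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // End of a paragraph or heading: add a newline so words do not run together
            if (TEXT_NS.equals(uri) && (localName.equals("p") || localName.equals("h"))) {
                text.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Collects the character data inside dc:title in meta.xml. */
    private static final class TitleHandler extends DefaultHandler {
        final StringBuilder title = new StringBuilder();
        private boolean inTitle;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (DC_NS.equals(uri) && localName.equals("title")) inTitle = true;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (DC_NS.equals(uri) && localName.equals("title")) inTitle = false;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inTitle) title.append(ch, start, length);
        }
    }

    /** Parses one XML entry of the ZIP archive with a namespace-aware SAX parser. */
    private static void parseEntry(ZipFile zip, String name, DefaultHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) return; // entry missing: leave the handler empty
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required so uri and localName are filled in
        try (InputStream in = zip.getInputStream(entry)) {
            factory.newSAXParser().parse(in, handler);
        }
    }

    /** Returns the document body text from content.xml, or "" if there is none. */
    public static String extractText(File odf)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(odf)) {
            BodyHandler handler = new BodyHandler();
            parseEntry(zip, "content.xml", handler);
            return handler.text.toString().trim();
        }
    }

    /** Returns the dc:title from meta.xml, or "" if there is none. */
    public static String extractTitle(File odf)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(odf)) {
            TitleHandler handler = new TitleHandler();
            parseEntry(zip, "meta.xml", handler);
            return handler.title.toString().trim();
        }
    }
}
```

The string returned by extractText is what would be indexed as the document's contents, instead of the raw file, whose body is compressed inside the ZIP and therefore likely not searchable as words. Using Tika, as suggested above, avoids maintaining this parsing code yourself and covers many more formats.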
-- 
View this message in context: 
http://www.nabble.com/Indexing-Open-office-documents-tp20620421p20658947.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

