You may want to try the latest build of Apache POI.

For .ppt, POI provides a PPT2PNG converter.

For .doc we have:

WordToFoConverter - produces XSL-FO, which Apache FOP can render to PDF
WordToHtmlConverter

And for .xls:

ExcelToHtmlConverter

All of these converters are in poi-scratchpad.jar.
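
As a rough sketch of how a preview feature might pick among the converters listed above, the helper below maps a file extension to the converter name mentioned in this mail. The class and method names (PreviewConverters, converterFor) are hypothetical and are not part of POI. It only returns the converter's simple class name as a string and does not call POI itself, so it makes no claims about package names or method signatures. For .doc it assumes an HTML preview is wanted; WordToFoConverter would be the choice for PDF output.

```java
// Hypothetical helper, not part of Apache POI: it only looks up which of the
// converters named in this thread applies to a given file name.
final class PreviewConverters {
    private PreviewConverters() {}

    /**
     * Returns the simple class name of the converter to try for an HTML or
     * image preview, or null if this thread names no converter for the type.
     */
    static String converterFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Compare extensions case-insensitively; no dot means no extension.
        String ext = dot < 0
                ? ""
                : fileName.substring(dot + 1).toLowerCase(java.util.Locale.ROOT);
        switch (ext) {
            case "ppt":
                return "PPT2PNG";              // slides rendered as PNG images
            case "doc":
                return "WordToHtmlConverter";  // assumption: HTML preview wanted
            case "xls":
                return "ExcelToHtmlConverter";
            default:
                return null;                   // e.g. PDF is not covered here
        }
    }

    public static void main(String[] args) {
        String[] names = {"slides.ppt", "Report.DOC", "sheet.xls", "paper.pdf"};
        for (String name : names) {
            System.out.println(name + " -> " + converterFor(name));
        }
    }
}
```

A null result means another tool is needed for that file type, which is the case for the PDF files mentioned in the original question.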

Regards,
Yegor

On Sat, Aug 20, 2011 at 5:39 PM, nirnaydewan <[email protected]> wrote:
> Currently I am using Solr 3.3.0 to index rich documents such as MS Word
> files, as well as PDFs.
>
> After a search finds matches in specific documents, I want to show the
> whole indexed text of each document as a preview.
>
> For example, if I search for the word "marketing" and it is found in
> documents A and B, the results are shown with highlight snippets as:
>
> Doc A
> [highlight..]
>
> Doc B
> [highlight..]
>
>
> The highlight portion shows part of the matching text with <em> tags embedded.
>
> Now, when the Doc A node is expanded further, I want to show a preview of
> the whole text as it appeared in the document, formatting included.
>
> Is this really possible? As far as I know, only the text is extracted and
> stored, with all formatting discarded.
>
>
> If not, how can I show such a preview? Please suggest some approaches.
>
>
>
> Thanks in advance.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Preview-of-Rich-Documents-tp3270554p3270554.html
> Sent from the Apache Tika - Development mailing list archive at Nabble.com.
>
