You may want to try the latest build of Apache POI.

For .ppt, POI provides a PPT2PNG converter.

For .doc we have:
 - WordToFoConverter, to convert to PDF via Apache FOP
 - WordToHtmlConverter

And for .xls:
 - ExcelToHtmlConverter

All these features are in poi-scratchpad.jar.

Regards,
Yegor

On Sat, Aug 20, 2011 at 5:39 PM, nirnaydewan <[email protected]> wrote:
> Currently i am using Solr 3.3.0 to index Rich Documents like MS Word. This
> also includes PDF as well.
>
> I want to show the whole indexed text as a preview after a search is made
> and found in the specific documents.
>
> For e.g, if i make a search of the word "marketing" and this is found in
> documents A,B and shown with highlight snippets as:
>
> Doc A
> [highlight..]
>
> Doc B
> [highlight..]
>
>
> Highlight portion shows a part of the searched text with <em> embedded.
>
> Now, further expanding Doc A node, i want to show a preview of the whole
> text like it was in document with formatting and all.
>
> Is it really possible? Because to what i have known till now is that, only
> the text is extracted and stored with all formatting discarded.
>
>
> If not, how will i be able to show? Please suggest some ways?
>
>
>
> Thanks in advance.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Preview-of-Rich-Documents-tp3270554p3270554.html
> Sent from the Apache Tika - Development mailing list archive at Nabble.com.
>
