Hi Gautham,

Thanks. There are a few examples of using the XHTMLContentHandler, including the 5-minute parser guide on tika.apache.org. Check it out. The HTML parser extracts text and metadata from the upstream content stream.
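
On storing or displaying the XHTML: a content handler only receives SAX events, so to end up with XHTML text those events have to be serialized somewhere. Here is a minimal sketch of one way to do that using only the JDK's TransformerHandler. The class name XhtmlSketch and the method toXhtml are made up for illustration, and the SAX events are fired by hand so the sketch runs without Tika on the classpath. With Tika, the assumption is that you would instead pass the same handler to parser.parse(stream, handler, metadata, context) and let the parser fire the events.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper, not part of Tika: serializes SAX events to an XHTML string.
class XhtmlSketch {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Fires one text-only element at the handler, the way a parser would.
    private static void element(ContentHandler h, String name, String text)
            throws SAXException {
        h.startElement(XHTML, name, name, new AttributesImpl());
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement(XHTML, name, name);
    }

    static String toXhtml(String title, String paragraph) {
        try {
            StringWriter out = new StringWriter();

            // A TransformerHandler is a ContentHandler that writes what it receives.
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            handler.setResult(new StreamResult(out));

            // Hand-fired events standing in for what a parser would emit.
            handler.startDocument();
            handler.startPrefixMapping("", XHTML);
            handler.startElement(XHTML, "html", "html", new AttributesImpl());
            handler.startElement(XHTML, "head", "head", new AttributesImpl());
            element(handler, "title", title);
            handler.endElement(XHTML, "head", "head");
            handler.startElement(XHTML, "body", "body", new AttributesImpl());
            element(handler, "p", paragraph);
            handler.endElement(XHTML, "body", "body");
            handler.endElement(XHTML, "html", "html");
            handler.endPrefixMapping("");
            handler.endDocument();

            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not serialize SAX events", e);
        }
    }

    public static void main(String[] args) {
        // Print the XHTML; it could just as well be written to a file.
        System.out.println(toXhtml("Hello", "Some text"));
    }
}
```

The returned string can be saved to a file and opened in a browser, or the StringWriter can be swapped for a file-backed writer if the content should go straight to disk.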

HTH!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Gautham Gowrishankar <gowri...@usc.edu>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Sunday, September 21, 2014 4:24 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Cc: "smrit...@usc.edu" <smrit...@usc.edu>, "madh...@usc.edu" <madh...@usc.edu>, shashank shiralikar <mailittoshash...@gmail.com>
Subject: XHTML Content Handler

>Hi,
>
>I have a few questions.
>
>1. How can I use an XHTML content handler to generate XHTML? How should I
>store that content? Or should I use another content handler along with
>the XHTML content handler for this task? Also, how can I display the
>XHTML content generated by the XHTML content handler?
>
>2. What exactly is the role of the HTML parser in its support of the XHTML
>media type? Does it just extract text?
>
>Regards
>Gautham