Actually, http://localhost:49309/rmeta/body
works. Interesting! I need to read up on the difference between these two
and see if I can just switch to using this one always.

On Wed, Jun 23, 2021 at 1:48 PM Tim Allison <[email protected]> wrote:

> > is possible in tika-server
>
> Currently, but this has been on my wishlist forever…
>
> On Wed, Jun 23, 2021 at 2:35 PM Tim Allison <[email protected]> wrote:
>
> > I don’t think handler customization generally is possible in Tika-server.
> >
> > What happens with /rmeta/body?
> >
> > On Wed, Jun 23, 2021 at 2:27 PM Nicholas DiPiazza <
> > [email protected]> wrote:
> >
> >> When we are using the Tika-Server and parsing an HTML document such as
> >>
> >> <html><title>hi there</title><body>woah</body></html>
> >>
> >> the parser, when called through the endpoint
> >>
> >> http://localhost:49309/rmeta/text
> >>
> >> will give you a basic result like this:
> >>
> >> [
> >>   {
> >>     "Content-Encoding": "ISO-8859-1",
> >>     "Content-Type": "text/html; charset=ISO-8859-1",
> >>     "X-Parsed-By": [
> >>       "org.apache.tika.parser.DefaultParser",
> >>       "org.apache.tika.parser.html.HtmlParser"
> >>     ],
> >>     "X-TIKA:content": "\n\n\n\n\n\n\nhi there\n\nwoah",
> >>     "X-TIKA:content_handler": "ToTextContentHandler",
> >>     "X-TIKA:embedded_depth": "0",
> >>     "X-TIKA:parse_time_millis": "284",
> >>     "dc:title": "hi there",
> >>     "title": "hi there"
> >>   }
> >> ]
> >>
> >> Notice how the title is in the body content.
> >>
> >> When using Tika embedded in a Java app, I know that if you extend Tika's
> >> default handler you can customize how XHTML elements such as <title> are
> >> handled, so that you could, for example, make it so that the content
> >> field does not have the title in it.
> >>
> >> Does anyone know whether something similar is possible when using Tika
> >> Server?
> >>
> >
>
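
For anyone who wants to compare the two endpoints from this thread side by side, here is a minimal sketch in Python. It assumes a tika-server listening on http://localhost:49309 (the port used above) and that /rmeta accepts the document via PUT. The helper names (`rmeta_url`, `fetch_rmeta`, `container_content`) are made up for illustration and are not part of Tika.

```python
import json
import urllib.request

# Assumption: tika-server is reachable here, as in the thread above.
BASE_URL = "http://localhost:49309"


def rmeta_url(handler, base_url=BASE_URL):
    """Build the /rmeta URL for a handler type such as 'text' or 'body'."""
    return f"{base_url}/rmeta/{handler}"


def fetch_rmeta(html_bytes, handler, base_url=BASE_URL):
    """PUT the document to tika-server and return the parsed JSON list."""
    req = urllib.request.Request(
        rmeta_url(handler, base_url),
        data=html_bytes,
        method="PUT",
        headers={"Content-Type": "text/html", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def container_content(rmeta_result):
    """Return X-TIKA:content from the first (container) metadata object."""
    return rmeta_result[0].get("X-TIKA:content", "")


if __name__ == "__main__":
    doc = b"<html><title>hi there</title><body>woah</body></html>"
    # Print the extracted content for both handler types to see the difference.
    for handler in ("text", "body"):
        content = container_content(fetch_rmeta(doc, handler))
        print(handler, repr(content))
```

If /rmeta/body behaves as reported above, the second line printed should no longer contain the title text, but that is worth checking against your own server version.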
