Actually,

http://localhost:49309/rmeta/body

works.

Interesting! I need to read up on the difference between /rmeta/text and
/rmeta/body, and see whether I can just always use /rmeta/body instead.
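To compare the two endpoints side by side, here is a minimal sketch. It assumes Java 11's built-in java.net.http client and a tika-server listening on port 49309, as in the URLs in this thread. It PUTs the same HTML to /rmeta/text and /rmeta/body and prints both JSON responses. The class and method names (RmetaCompare, buildRmetaRequest) are hypothetical. As far as I understand tika-server, the path segment after /rmeta selects the content handler type (text, html, xml, body, ignore). "body" uses a body-only text handler along the lines of Tika's BodyContentHandler, which drops anything outside <body> (such as <title>). That would explain why the title disappears from X-TIKA:content.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RmetaCompare {

    // Builds a PUT of the raw document to /rmeta/{handler}, asking for JSON back.
    // "text" and "body" are the two handler names discussed in this thread.
    static HttpRequest buildRmetaRequest(String baseUrl, String handler, String html) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/rmeta/" + handler))
                .header("Accept", "application/json")
                .header("Content-Type", "text/html")
                .PUT(HttpRequest.BodyPublishers.ofString(html, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><title>hi there</title><body>woah</body></html>";
        HttpClient client = HttpClient.newHttpClient();
        for (String handler : new String[] {"text", "body"}) {
            HttpResponse<String> response = client.send(
                    buildRmetaRequest("http://localhost:49309", handler, html),
                    HttpResponse.BodyHandlers.ofString());
            // Print status and raw JSON so the "X-TIKA:content" values can be compared.
            System.out.println("/rmeta/" + handler + " -> HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }
}
```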

On Wed, Jun 23, 2021 at 1:48 PM Tim Allison <[email protected]> wrote:

> > is possible in tika-server
>
> Currently, but this has been on my wishlist forever…
>
> On Wed, Jun 23, 2021 at 2:35 PM Tim Allison <[email protected]> wrote:
>
> > I don’t think handler customization generally is possible in Tika-server.
> >
> > What happens with /rmeta/body?
> >
> > On Wed, Jun 23, 2021 at 2:27 PM Nicholas DiPiazza <
> > [email protected]> wrote:
> >
> > >> When we use Tika Server to parse an HTML document such as
> > >>
> > >> <html><title>hi there</title><body>woah</body></html>
> > >>
> > >> through the endpoint
> > >>
> > >> http://localhost:49309/rmeta/text
> > >>
> > >> we get a basic result like this:
> >>
> >> [
> >> {
> >> "Content-Encoding": "ISO-8859-1",
> >> "Content-Type": "text/html; charset=ISO-8859-1",
> >> "X-Parsed-By": [
> >> "org.apache.tika.parser.DefaultParser",
> >> "org.apache.tika.parser.html.HtmlParser"
> >> ],
> >> "X-TIKA:content": "\n\n\n\n\n\n\nhi there\n\nwoah",
> >> "X-TIKA:content_handler": "ToTextContentHandler",
> >> "X-TIKA:embedded_depth": "0",
> >> "X-TIKA:parse_time_millis": "284",
> >> "dc:title": "hi there",
> >> "title": "hi there"
> >> }
> >> ]
> >>
> >> Notice how the title is in the body content.
> >>
> > >> When using Tika embedded in a Java app, I know that if you extend
> > >> Tika's default handler you can customize how XHTML elements such as
> > >> <title> are handled, so that, for example, the content field does not
> > >> include the title.
> >>
> > >> Does anyone know whether something similar is possible when using
> > >> Tika Server?
> >>
> >
>
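For the embedded-Java case raised in the original question, the usual pattern is to hand your own SAX ContentHandler to the parser, e.g. via parser.parse(stream, handler, metadata, context). Tika's own org.apache.tika.sax.BodyContentHandler already keeps only the body text. To stay self-contained, the sketch below does not use Tika. It is a stand-in for the idea, built on the JDK's SAX parser, and it assumes the input is well-formed markup like the sample above. Real-world HTML usually is not, which is why Tika's HtmlParser does the parsing in practice. The class BodyOnlyTextHandler and its helper extractBodyText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BodyOnlyTextHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    // Number of <body> elements currently open (normally 0 or 1).
    private int bodyDepth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("body".equals(qName)) bodyDepth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(qName)) bodyDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only text inside <body> is kept; <title> and other <head> text is dropped.
        if (bodyDepth > 0) text.append(ch, start, length);
    }

    public String getText() {
        return text.toString();
    }

    // Parses well-formed (X)HTML with the JDK SAX parser and returns only the body text.
    static String extractBodyText(String xhtml) throws Exception {
        BodyOnlyTextHandler handler = new BodyOnlyTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractBodyText("<html><title>hi there</title><body>woah</body></html>"));
    }
}
```

With the sample document from the question, this prints only "woah", which mirrors what /rmeta/body returns instead of the title-plus-body text from /rmeta/text.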
