[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3494:
------------------------------
Description:
The pipes module is built around the RecursiveParserWrapper, and I'm happy with
that being the default. I suspect, though, that a number of users will want
the legacy behavior of getting a single metadata object + text for a document,
regardless of how many attachments it contains. We recently did this on
TIKA-3352. Let's add a "parseMode" or similar to FetchEmitTuple to allow for
the legacy behavior.
This will be a breaking change in the Solr and OpenSearch emitters. The
"concatenate" and "skip" options for those emitters can now be handled via
the FetchEmitTuple/PipesIterator instead.
was:
The pipes module is built around the RecursiveParserWrapper, and I'm happy with
that being the default. I suspect that there are a number of users who will
want the legacy behavior of getting a single metadata object + text for a
document, no matter how many attachments. We recently did this on TIKA-3352.
Let's add a "handlerMode" or similar to FetchEmitTuple to allow for the legacy
behavior.
> Allow legacy combined doc extract in pipes module
> -------------------------------------------------
>
> Key: TIKA-3494
> URL: https://issues.apache.org/jira/browse/TIKA-3494
> Project: Tika
> Issue Type: New Feature
> Components: tika-pipes
> Reporter: Tim Allison
> Priority: Minor
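A minimal Java sketch of what a per-tuple switch between the default and the legacy combined extract might look like. It assumes the RecursiveParserWrapper output is modeled as a list of plain `Map<String, String>` metadata objects, with the container document first and each document's extracted text stored under `X-TIKA:content`. `ParseMode`, `apply` and `concatenate` are hypothetical names for illustration, not Tika's actual API, and concatenating after the fact only approximates the legacy handler, which writes embedded text inline as it parses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: collapse a RecursiveParserWrapper-style list of
 * per-document metadata maps into one "legacy" metadata map.
 * Names and the Map-based metadata model are assumptions, not Tika's API.
 */
public class LegacyConcat {

    // Hypothetical per-tuple switch; the issue only proposes "parseMode" or similar.
    enum ParseMode { RMETA, CONCATENATE }

    // Assumed key holding each document's extracted text.
    static final String CONTENT_KEY = "X-TIKA:content";

    static List<Map<String, String>> apply(ParseMode mode, List<Map<String, String>> rmeta) {
        if (mode == ParseMode.RMETA || rmeta.isEmpty()) {
            // Default: one metadata object per document and per attachment.
            return rmeta;
        }
        return List.of(concatenate(rmeta));
    }

    static Map<String, String> concatenate(List<Map<String, String>> rmeta) {
        // Keep the container document's metadata (assumed to be first in the list).
        Map<String, String> combined = new LinkedHashMap<>(rmeta.get(0));
        StringBuilder text = new StringBuilder();
        for (Map<String, String> doc : rmeta) {
            String content = doc.get(CONTENT_KEY);
            if (content == null || content.isEmpty()) {
                continue; // skip documents with no extracted text
            }
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(content);
        }
        // Replace the container's text with the text of all documents, in order.
        combined.put(CONTENT_KEY, text.toString());
        return combined;
    }
}
```

Keeping the switch on the tuple rather than in each emitter means the emitters only ever receive a list of metadata objects, whether it holds one entry or many.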
--
This message was sent by Atlassian Jira
(v8.3.4#803005)