[ https://issues.apache.org/jira/browse/TIKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-3494:
------------------------------
    Description: 
The pipes module is built around the RecursiveParserWrapper, and I'm happy with 
that being the default.  I suspect that there are a number of users who will 
want the legacy behavior of getting a single metadata object + text for a 
document, no matter how many attachments.  We recently did this on TIKA-3352.

Let's add a "parseMode" or similar to FetchEmitTuple to allow for the legacy 
behavior.  

This will be a breaking change in the Solr and OpenSearch emitters.  The 
"concatenate" and "skip" options for those emitters can now be handled via the 
FetchEmitTuple/PipesIterator.

  was:
The pipes module is built around the RecursiveParserWrapper, and I'm happy with 
that being the default.  I suspect that there are a number of users who will 
want the legacy behavior of getting a single metadata object + text for a 
document, no matter how many attachments.  We recently did this on TIKA-3352.

Let's add a "handlerMode" or similar to FetchEmitTuple to allow for the legacy 
behavior.  


> Allow legacy combined doc extract in pipes module
> -------------------------------------------------
>
>                 Key: TIKA-3494
>                 URL: https://issues.apache.org/jira/browse/TIKA-3494
>             Project: Tika
>          Issue Type: New Feature
>          Components: tika-pipes
>            Reporter: Tim Allison
>            Priority: Minor
>
> The pipes module is built around the RecursiveParserWrapper, and I'm happy 
> with that being the default.  I suspect that there are a number of users who 
> will want the legacy behavior of getting a single metadata object + text for 
> a document, no matter how many attachments.  We recently did this on 
> TIKA-3352.
> Let's add a "parseMode" or similar to FetchEmitTuple to allow for the legacy 
> behavior.  
> This will be a breaking change in the Solr and OpenSearch emitters.  The 
> "concatenate" and "skip" options for those emitters can now be handled via 
> the FetchEmitTuple/PipesIterator.
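
To make the proposal concrete, here is a rough, self-contained sketch of how a per-tuple "parseMode" might switch between the two behaviors. It is an illustration only and not code from the Tika codebase: ParseModeSketch, ParseMode, its two values, ParsedDoc, and apply() are all hypothetical names. It also assumes that the first item in the list is the container document and that the legacy output keeps only the container's metadata while joining all extracted text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from the Tika codebase.
public class ParseModeSketch {

    // Hypothetical values for the proposed "parseMode" setting.
    public enum ParseMode {
        RMETA,       // one metadata object + text per container/attachment (default)
        CONCATENATE  // legacy: a single metadata object + all text
    }

    // Simplified stand-in for the metadata and text of one parsed document.
    public static final class ParsedDoc {
        public final Map<String, String> metadata;
        public final String text;

        public ParsedDoc(Map<String, String> metadata, String text) {
            this.metadata = metadata;
            this.text = text;
        }
    }

    // Returns the list unchanged for RMETA; for CONCATENATE, collapses it
    // into a single entry. Assumption: docs.get(0) is the container.
    public static List<ParsedDoc> apply(ParseMode mode, List<ParsedDoc> docs) {
        if (mode == ParseMode.RMETA || docs.isEmpty()) {
            return docs;
        }
        // Assumption: the legacy result carries only the container's metadata.
        Map<String, String> metadata = new LinkedHashMap<>(docs.get(0).metadata);
        StringBuilder text = new StringBuilder();
        for (ParsedDoc doc : docs) {
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(doc.text);
        }
        List<ParsedDoc> combined = new ArrayList<>();
        combined.add(new ParsedDoc(metadata, text.toString()));
        return combined;
    }
}
```

With something like this in place, an emitter would always receive a list of documents and would not need its own "concatenate" or "skip" handling; the choice would travel with each tuple instead.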



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
