[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902591#comment-14902591 ]
Tim Allison commented on TIKA-1740:
-----------------------------------
How about we store a list of <Metadata, Handler> pairs instead of Metadata
objects. The existing {{getMetadata()}} would behave as it does now.
We'd add {{getMetadataAndHandlers()}}, which would return the list of
<Metadata, Handler> pairs. These would not include TIKA_CONTENT.
The existing {{getMetadata}} would call {{getMetadataAndHandlers}} under the
hood and add TIKA_CONTENT. My initial concern was that this would double
memory use at the moment {{getMetadata}} is called, but given how the
recursion works, we're effectively doing that already.
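A minimal sketch of the proposed shape, in plain Java so it compiles without Tika on the classpath. Assumptions: {{Map<String, String>}} stands in for Tika's {{Metadata}}, a SAX {{ContentHandler}} stands in for whatever the {{ContentHandlerFactory}} produces, the class name, {{MetadataAndHandler}}, {{add}} and {{TextHandler}} are hypothetical, and the "X-TIKA:content" key string is assumed. It shows {{getMetadata()}} being built on top of {{getMetadataAndHandlers()}} by copying each metadata map and adding the handler's {{toString()}}; that copy is where the temporary memory doubling comes from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

class MetadataHandlerPairs {
    // Assumed key for the extracted text; stands in for Tika's TIKA_CONTENT.
    static final String TIKA_CONTENT = "X-TIKA:content";

    /** Hypothetical pair: one document's metadata plus the handler that received its content. */
    static final class MetadataAndHandler {
        final Map<String, String> metadata;
        final ContentHandler handler;

        MetadataAndHandler(Map<String, String> metadata, ContentHandler handler) {
            this.metadata = metadata;
            this.handler = handler;
        }
    }

    private final List<MetadataAndHandler> results = new ArrayList<>();

    /** Called once per (embedded) document as the recursion finishes it. */
    void add(Map<String, String> metadata, ContentHandler handler) {
        results.add(new MetadataAndHandler(metadata, handler));
    }

    /** Raw pairs; TIKA_CONTENT is NOT added, so callers get the handler objects themselves. */
    List<MetadataAndHandler> getMetadataAndHandlers() {
        return Collections.unmodifiableList(results);
    }

    /** Old behavior: copy each metadata map and add handler.toString() under TIKA_CONTENT. */
    List<Map<String, String>> getMetadata() {
        List<Map<String, String>> out = new ArrayList<>();
        for (MetadataAndHandler pair : getMetadataAndHandlers()) {
            Map<String, String> copy = new HashMap<>(pair.metadata); // the temporary "double memory"
            copy.put(TIKA_CONTENT, pair.handler.toString());
            out.add(copy);
        }
        return out;
    }

    /** Minimal text-accumulating handler, standing in for a factory-built handler. */
    static final class TextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }
}
```

Because {{getMetadata()}} copies rather than mutates, the pairs returned by {{getMetadataAndHandlers()}} stay free of TIKA_CONTENT, which also covers the requested way to skip producing it.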
How does this sound?
> RecursiveParserWrapper returning ContentHandler-s
> -------------------------------------------------
>
> Key: TIKA-1740
> URL: https://issues.apache.org/jira/browse/TIKA-1740
> Project: Tika
> Issue Type: Wish
> Components: core, parser
> Reporter: Andrea
>
> I would like to build a mechanism that allows a custom object to be built
> from a parsing result. This can be done easily by working with a
> custom ContentHandler "transformer", but how can I achieve this result using
> a RecursiveParserWrapper? In this case I can only set a ContentHandlerFactory,
> and the parser just calls the toString method and sets the result as metadata.
> Can you think of a way to get the entire ContentHandler object for each
> subfile instead of the result of its toString method? Of course, a flag to
> disable TIKA_CONTENT metadata production would also be needed.
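The reporter's "transformer" use case can be sketched with a plain SAX handler that builds a custom object (here, a list of paragraph strings) instead of a flat string. The class name {{ParagraphCollector}} and {{getParagraphs()}} are hypothetical, and the sketch assumes the parser emits XHTML "p" elements, as Tika's XHTML output typically does. With the pair-based API proposed in the comment, a caller could retrieve one such handler per subfile rather than only its {{toString()}} result.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that turns SAX events into a custom object: a list of paragraphs. */
class ParagraphCollector extends DefaultHandler {
    private final List<String> paragraphs = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <p>

    private static boolean isParagraph(String localName, String qName) {
        return "p".equals(localName) || "p".equals(qName);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isParagraph(localName, qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length); // text outside <p> is ignored
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && isParagraph(localName, qName)) {
            paragraphs.add(current.toString());
            current = null;
        }
    }

    /** The custom result; this is what toString()-only access loses. */
    List<String> getParagraphs() {
        return paragraphs;
    }
}
```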