[ https://issues.apache.org/jira/browse/TIKA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich closed TIKA-675.
--------------------------------
Resolution: Fixed
Marking as fixed. Please see the RecursiveParserWrapper. Thanks, Nick.
> PackageExtractor should track names of recursively nested resources
> -------------------------------------------------------------------
>
> Key: TIKA-675
> URL: https://issues.apache.org/jira/browse/TIKA-675
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 0.10
> Reporter: Andrzej Bialecki
>
> When parsing archive formats, the hierarchy of names is not tracked; only the
> current embedded component's name is preserved under
> Metadata.RESOURCE_NAME_KEY. It would be nice to build pseudo-URLs for nested
> resources, similar to the VFS model. For the stream-based Tika API, this
> could look like
> {code}tar:gz:stream://example.tar.gz!/example.tar!/example.html{code}
> Alternatively, the parent-child relationship could be tracked in some other
> way. Some applications need this information, for example to determine which
> composite documents to delete from the index after a container archive has
> been deleted.
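
For illustration only, the pseudo-URL format proposed in the description could be assembled roughly as follows. This is a minimal sketch in plain Java, not Tika code: the class NestedResourcePath and its build method are hypothetical names, and the "stream" base scheme and "!/" separator are simply copied from the example in the issue.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Tika): builds a VFS-style pseudo-URL
// from the chain of container formats and resource names seen while
// descending into nested archives.
final class NestedResourcePath {
    private NestedResourcePath() {}

    /**
     * @param containerSchemes format of each container, outermost first,
     *                         e.g. ["gz", "tar"] for example.tar.gz
     * @param names            resource name at each level, outermost first;
     *                         one more entry than containerSchemes because
     *                         the innermost resource is not a container
     */
    static String build(List<String> containerSchemes, List<String> names) {
        if (names.size() != containerSchemes.size() + 1) {
            throw new IllegalArgumentException(
                "expected one name per container plus one leaf name");
        }
        StringBuilder url = new StringBuilder();
        // Schemes are written innermost container first, as in "tar:gz:".
        for (int i = containerSchemes.size() - 1; i >= 0; i--) {
            url.append(containerSchemes.get(i)).append(':');
        }
        // "stream" stands in for the (unknown) origin of the input stream.
        url.append("stream://");
        // Each nesting level is separated by "!/".
        url.append(String.join("!/", names));
        return url.toString();
    }

    public static void main(String[] args) {
        String url = build(
            Arrays.asList("gz", "tar"),
            Arrays.asList("example.tar.gz", "example.tar", "example.html"));
        // prints tar:gz:stream://example.tar.gz!/example.tar!/example.html
        System.out.println(url);
    }
}
```

An indexing application could store such a string with each embedded document and later find everything that came from a deleted container by matching on the container's prefix.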