[
https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920720#action_12920720
]
Jack Krupansky commented on CONNECTORS-118:
-------------------------------------------
Just to be clear, this subcrawling proposal does not depend on Apache VFS. Like
Aperture, it simply borrows the VFS naming convention of representing the id of
each file as a pseudo-URL, not a real URL.
So, if somebody wants to dereference one of these pseudo-URLs, they must:
1) Separate the prefix, parent-object-uri, and path from the pseudo-URL.
2) Fetch the file from the parent-object-uri.
3) Use an access library based on the prefix to extract the file at the path
from within the fetched archive.
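As a rough illustration of the three steps above, here is a minimal Python
sketch. It assumes the VFS-style layout "prefix:parent-object-uri!/path" (for
example zip:http://example.com/archive.zip!/docs/readme.txt). The names
split_pseudo_url and extract_member are hypothetical and not part of
ManifoldCF, only the "zip" prefix is handled, and step 2 (fetching the archive
bytes) is left to the caller.

```python
import io
import zipfile
from typing import NamedTuple


class PseudoUrl(NamedTuple):
    prefix: str      # archive type, e.g. "zip"
    parent_uri: str  # URI of the archive itself
    path: str        # path of the member inside the archive


def split_pseudo_url(pseudo_url: str) -> PseudoUrl:
    """Step 1: split 'prefix:parent-uri!/path' into its three parts.

    The '!' separator is an assumption borrowed from the VFS convention.
    """
    prefix, colon, rest = pseudo_url.partition(":")
    parent_uri, bang, path = rest.rpartition("!")
    if not colon or not bang or not prefix or not parent_uri:
        raise ValueError(f"not a pseudo-URL: {pseudo_url!r}")
    return PseudoUrl(prefix, parent_uri, path.lstrip("/"))


def extract_member(prefix: str, archive_bytes: bytes, path: str) -> bytes:
    """Step 3: pick an access library from the prefix and read the member.

    Step 2 (fetching archive_bytes from the parent URI) is the caller's job.
    """
    if prefix == "zip":
        with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
            return archive.read(path)
    raise NotImplementedError(f"no access library for prefix {prefix!r}")
```

A caller would split the pseudo-URL, fetch parent_uri with whatever client
suits the repository, and then pass the fetched bytes to extract_member
together with the prefix and path.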
> Crawled archive files should be expanded into their constituent files
> ---------------------------------------------------------------------
>
> Key: CONNECTORS-118
> URL: https://issues.apache.org/jira/browse/CONNECTORS-118
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Framework crawler agent
> Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their
> constituent files during crawling of repositories so that any output
> connector would output the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement
> a "copy" connector that maintains crawled files as-is.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.