[jira] Commented: (CONNECTORS-118) Crawled archive files should be expanded into their constituent files

Jack Krupansky (JIRA) Wed, 13 Oct 2010 11:40:02 -0700

    [ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920711#action_12920711 ]


Jack Krupansky commented on CONNECTORS-118:
-------------------------------------------

Subcrawling is based on the file type (zip, tar, gzip, bzip2, mbox, jar, etc.),
not on the type of repository that contains the file. I can't speak for all
repository types, but subcrawling would apply to web and SharePoint crawling in
addition to file system and share crawling. Basically, it applies to any
repository type that returns files, as opposed to, say, the JDBC connector,
which returns a row of data values rather than a file.
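To illustrate that the subcrawl decision depends on the file and not on the repository, here is a minimal Python sketch. It assumes the connector hands the crawler the raw bytes of each fetched document; the names `detect_archive_type` and `should_subcrawl` are hypothetical and not part of ManifoldCF. The checks use well-known magic numbers (zip/jar start with `PK\x03\x04`, gzip with `\x1f\x8b`, bzip2 with `BZh`, and tar carries `ustar` at offset 257), plus a rough `From ` heuristic for mbox.

```python
from typing import Optional


def detect_archive_type(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess an archive format from leading bytes only.

    Nothing here depends on which repository connector fetched the file.
    """
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # jar files are zip containers, so they land here too
    if data.startswith(b"\x1f\x8b"):
        return "gzip"  # a .tar.gz is also reported as gzip at this stage
    if data.startswith(b"BZh"):
        return "bzip2"
    if len(data) >= 262 and data[257:262] == b"ustar":
        return "tar"  # POSIX/GNU tar header magic at offset 257
    if data.startswith(b"From "):
        return "mbox"  # heuristic: mbox messages begin with a "From " line
    return None


def should_subcrawl(data: bytes) -> bool:
    """Subcrawl whenever the fetched bytes look like a supported archive."""
    return detect_archive_type(data) is not None
```

Because the check only looks at bytes, the same function would serve a file system, share, web, or SharePoint fetch, while a JDBC row never reaches it. A compressed tarball is reported as gzip or bzip2 here; a real subcrawler would decompress and check again.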


> Crawled archive files should be expanded into their constituent files
> ---------------------------------------------------------------------
>
>                 Key: CONNECTORS-118
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-118
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>            Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their 
> constituent files during crawling of repositories so that any output 
> connector receives the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement 
> a "copy" connector that keeps crawled files as-is.
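One way to picture the proposed option is the Python sketch below, which uses only the standard `zipfile` and `tarfile` modules. The function `expand_document` and its `expand_archives` flag (defaulting to `True`, i.e. ON) are hypothetical names for illustration. With the flag off, the document passes through unchanged, as a "copy" connector would want. Only zip/jar and tar (plain or gzip/bzip2/xz-compressed) are handled; mbox and bare gzip files are passed through as-is.

```python
import io
import tarfile
import zipfile
from typing import Iterator, Tuple


def expand_document(
    name: str, data: bytes, expand_archives: bool = True
) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical sketch: flatten an archive into (entry_name, bytes) pairs.

    Non-archives, or any document when expand_archives is False, are
    yielded once, unchanged, under their original name.
    """
    if expand_archives:
        buf = io.BytesIO(data)
        if zipfile.is_zipfile(buf):
            with zipfile.ZipFile(buf) as zf:
                for info in zf.infolist():
                    if not info.is_dir():
                        # Name each entry relative to its parent archive.
                        yield f"{name}/{info.filename}", zf.read(info)
            return
        buf.seek(0)
        try:
            # "r:*" lets tarfile detect plain, gzip, bzip2, or xz tarballs.
            tf = tarfile.open(fileobj=buf, mode="r:*")
        except tarfile.TarError:
            tf = None
        if tf is not None:
            with tf:
                for member in tf.getmembers():
                    if member.isfile():
                        f = tf.extractfile(member)
                        if f is not None:
                            yield f"{name}/{member.name}", f.read()
            return
    # Not an archive (or expansion disabled): hand the file on as-is.
    yield name, data
```

Naming entries as `archive/entry` keeps each constituent file traceable to the archive it came from, which an output connector could use as the document identifier.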

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
