[ https://issues.apache.org/jira/browse/CONNECTORS-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054304#comment-13054304 ]

Erlend Garåsen commented on CONNECTORS-214:
-------------------------------------------

I agree. I suggest placing the post-extraction fields below the 
pre-extraction fields. The include and exclude tabs will then each 
have two fields.

Maybe the post-extraction fields should support more advanced filtering rules, 
for instance filtering based on MIME types? This would make it easier to filter 
out video files without having to list every video file extension. 
What do you think?
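
To make the MIME type idea concrete, here is a minimal sketch in Java. 
Everything in it is an assumption for illustration: the class 
MimeTypeFilter, its methods, and the "type/*" wildcard syntax are 
hypothetical and are not existing ManifoldCF code. It also assumes the 
Content-Type of a document is known once it has been fetched, so the 
check can run after link extraction.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of ManifoldCF): decides whether a fetched
// document should be kept, based on its MIME type.
class MimeTypeFilter {
    private final List<String> includes; // empty list means "include everything"
    private final List<String> excludes; // excludes take precedence over includes

    MimeTypeFilter(List<String> includes, List<String> excludes) {
        this.includes = includes;
        this.excludes = excludes;
    }

    // Strips parameters such as "; charset=UTF-8" and lower-cases the value.
    static String normalize(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // A pattern is either an exact type ("application/pdf") or a
    // top-level wildcard ("video/*"). This syntax is an assumption.
    static boolean matches(String pattern, String contentType) {
        String p = pattern.trim().toLowerCase(Locale.ROOT);
        String m = normalize(contentType);
        if (p.endsWith("/*")) {
            // Keep the trailing slash so "video/*" does not match "videos/x".
            return m.startsWith(p.substring(0, p.length() - 1));
        }
        return m.equals(p);
    }

    // Called after links have been extracted, so an excluded HTML page
    // can still contribute its links to the crawl.
    boolean shouldIndex(String contentType) {
        for (String ex : excludes) {
            if (matches(ex, contentType)) {
                return false;
            }
        }
        if (includes.isEmpty()) {
            return true;
        }
        for (String in : includes) {
            if (matches(in, contentType)) {
                return true;
            }
        }
        return false;
    }
}
```

With a filter like this, a single exclude rule of "video/*" would cover 
all video formats, and a single include rule of "application/pdf" would 
restrict a job to PDFs while HTML pages are still fetched for their links.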



> Add post-extraction inclusions and exclusions into the web connector
> --------------------------------------------------------------------
>
>                 Key: CONNECTORS-214
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-214
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
>            Reporter: Erlend Garåsen
>            Assignee: Erlend Garåsen
>             Fix For: ManifoldCF next
>
>
> If HTML files are excluded for a job, links in these files will not be 
> followed. If we add post-extraction inclusion and exclusion filters, 
> it will be possible to fetch only certain types of documents, such as PDFs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira