[
https://issues.apache.org/jira/browse/CONNECTORS-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068554#comment-13068554
]
Karl Wright commented on CONNECTORS-214:
----------------------------------------
It wasn't added to the Solr connector because it wasn't clear whether the MIME
type filter would be adequate for people's needs, and the Solr connector had
already grown an uncomfortable number of tabs.
So where things stand is that the infrastructure was written to support
filtering by URL, but only MIME type and length filtering were actually added
to the Solr connector. Having said that, if you have a need, I would be
willing to finish the job. It would help to understand your actual use case
so I can be sure to cover it.
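As an illustration of the kind of filtering discussed here (URL inclusion/exclusion plus MIME type and length checks, applied after a document has been fetched and its links extracted), here is a minimal sketch. It is not ManifoldCF code: the class PostExtractionFilter, its fields, the accepts() method, and the process_document() helper are all hypothetical names chosen for this example. The key point it shows is that links are queued from every fetched document, including HTML pages, and the filter only decides whether that document itself gets indexed.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class PostExtractionFilter:
    """Hypothetical post-extraction filter: it decides only whether a fetched
    document is indexed, never whether its links are followed."""
    include_urls: List[str] = field(default_factory=lambda: [r".*"])
    exclude_urls: List[str] = field(default_factory=list)
    allowed_mime_types: Set[str] = field(default_factory=set)  # empty = allow all
    max_length: Optional[int] = None  # None = no size limit

    def accepts(self, url: str, mime_type: str, length: int) -> bool:
        # The URL must match at least one inclusion pattern...
        if not any(re.search(p, url) for p in self.include_urls):
            return False
        # ...and must not match any exclusion pattern.
        if any(re.search(p, url) for p in self.exclude_urls):
            return False
        # MIME type check, e.g. index only application/pdf.
        if self.allowed_mime_types and mime_type not in self.allowed_mime_types:
            return False
        # Length check on the fetched content.
        if self.max_length is not None and length > self.max_length:
            return False
        return True


def process_document(url: str, mime_type: str, body: bytes,
                     flt: PostExtractionFilter,
                     extract_links: Callable[[bytes], List[str]],
                     crawl_queue: List[str], indexed: List[str]) -> None:
    """Hypothetical crawl step: links are queued first, unconditionally,
    then the filter decides whether this document is indexed."""
    crawl_queue.extend(extract_links(body))
    if flt.accepts(url, mime_type, len(body)):
        indexed.append(url)
```

With a filter such as `PostExtractionFilter(allowed_mime_types={"application/pdf"})`, an HTML page would still contribute its links to the crawl queue but would not itself be indexed, which is the behaviour the issue below asks for.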
> Add post-extraction inclusions and exclusions into the web connector
> --------------------------------------------------------------------
>
> Key: CONNECTORS-214
> URL: https://issues.apache.org/jira/browse/CONNECTORS-214
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Affects Versions: ManifoldCF 0.1, ManifoldCF 0.2
> Reporter: Erlend Garåsen
> Assignee: Karl Wright
> Fix For: ManifoldCF next
>
>
> If HTML files are excluded for a job, links in these files will not be
> followed. If we add inclusion and exclusion filters that are applied after
> extraction, it will be possible to fetch only certain types of documents, such as PDFs.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira