[
https://issues.apache.org/jira/browse/CONNECTORS-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arcadius Ahouansou updated CONNECTORS-1193:
-------------------------------------------
Attachment: CONNECTORS-1193.patch
Hello [[email protected]].
This is an updated version of the patch. It includes the following changes:
- Tika is no longer used
- The existing pattern-matching mechanism is used
- The routine is called only if the user has provided a regex for content exclusion
- Only text documents are matched
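
For readers without the patch at hand, here is a rough sketch of the behaviour listed above. It is not the code from CONNECTORS-1193.patch. The class and method names (ContentExclusionSketch, isTextContent, shouldSkip) are made up for illustration, plain java.util.regex stands in for the connector's existing pattern-matching mechanism, and case-insensitive matching is my assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not the class used in the patch. */
class ContentExclusionSketch {
    private final List<Pattern> exclusionPatterns = new ArrayList<>();

    /** Compiles the user-supplied exclusion regexes (the list may be empty). */
    ContentExclusionSketch(List<String> regexes) {
        for (String regex : regexes) {
            // Case-insensitive matching is an assumption made for this sketch.
            exclusionPatterns.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        }
    }

    /** Returns true only for textual MIME types such as text/html or text/plain. */
    static boolean isTextContent(String contentType) {
        if (contentType == null) {
            return false;
        }
        return contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }

    /** Decides whether a fetched page should be skipped. */
    boolean shouldSkip(String contentType, String body) {
        // No user-supplied regex: the matching routine is never run.
        if (exclusionPatterns.isEmpty()) {
            return false;
        }
        // Only text documents are matched; anything else is never excluded here.
        if (!isTextContent(contentType) || body == null) {
            return false;
        }
        for (Pattern pattern : exclusionPatterns) {
            if (pattern.matcher(body).find()) {
                return true;
            }
        }
        return false;
    }
}
```

With a single exclusion regex such as "page not found", a 200 response of type text/html whose body contains "Page Not Found" would be skipped, while a PDF or a page fetched with no exclusion regex configured would go through unchanged.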
Thanks.
> Consider adding feature to web connector to skip pages that match specified
> criteria
> ------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1193
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1193
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Affects Versions: ManifoldCF 1.10, ManifoldCF 2.2
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.10, ManifoldCF 2.2
>
> Attachments: CONNECTORS-1193.patch
>
>
> The user wants to skip content that matches specified criteria, because some
> sites don't return a 404 code (for instance) but instead return 200 with a
> textual error message.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)