[ https://issues.apache.org/jira/browse/CONNECTORS-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320660#comment-16320660 ]
Karl Wright commented on CONNECTORS-1482:
-----------------------------------------
[~schuch], the *only* mime type that the Tika Extractor sets for a document is
"text/plain". If you want to filter documents based on their *original* mime
type, you must do it *before* the Tika Extractor in your pipeline.
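
To illustrate why the ordering matters, here is a toy sketch in Python. It is
not ManifoldCF code: Doc, tika_extract, and exclude_mime are hypothetical names
that only model the behavior described above, assuming the extractor rewrites
every document's mime type to "text/plain".

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    mime_type: str


def tika_extract(docs):
    # Hypothetical stand-in for the Tika Extractor stage:
    # every document comes out with mime type "text/plain".
    return [Doc(d.name, "text/plain") for d in docs]


def exclude_mime(docs, excluded):
    # Hypothetical mime type exclusion filter: drop documents
    # whose current mime type is in the excluded set.
    return [d for d in docs if d.mime_type not in excluded]


docs = [Doc("a.pdf", "application/pdf"), Doc("b.html", "text/html")]
excluded = {"application/pdf"}

# Filter placed AFTER extraction: it only ever sees "text/plain",
# so the PDF is not excluded.
filter_after = exclude_mime(tika_extract(docs), excluded)

# Filter placed BEFORE extraction: it sees the original mime types,
# so the PDF is dropped.
filter_before = tika_extract(exclude_mime(docs, excluded))

print([d.name for d in filter_after])   # ['a.pdf', 'b.html']
print([d.name for d in filter_before])  # ['b.html']
```

In this model, an exclusion applied downstream of the extractor can never match
"application/pdf", which would be consistent with the behavior reported in the
issue.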
> Mime type exclusion and document length exclusion in Solr output connector
> don't apparently work
> ------------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1482
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1482
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 2.9
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.10
>
> Attachments: problem_documents_connector.png,
> problem_documents_connector_solr.png,
> problem_documents_connector_solr_stream_size.png
>
>
> See attached images. Setting exclusions apparently does not prevent
> documents with that mime type from being included. This may be because of
> regexp characters, etc., but it at least needs to be researched and
> documented. The length limitation doesn't seem to be working either.