[ https://issues.apache.org/jira/browse/CONNECTORS-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318497#comment-16318497 ]
Karl Wright commented on CONNECTORS-1482:
-----------------------------------------
The length exclusion code is trivial and hard to bypass, unless the HttpPoster
object is created incorrectly:
{code}
  /**
   * Post the input stream to ingest
   *
   * @param documentURI is the document's uri.
   * @param document is the document structure to ingest.
   * @param arguments are the configuration arguments to pass in the post. Key is argument name, value is a list of the argument values.
   * @param authorityNameString is the name of the governing authority for this document's acls, or null if none.
   * @param activities is the activities object, so we can report what's happening.
   * @return true if the ingestion was successful, or false if the ingestion is illegal.
   * @throws ManifoldCFException, ServiceInterruption
   */
  public boolean indexPost(String documentURI,
    RepositoryDocument document, Map<String,List<String>> arguments,
    String authorityNameString, IOutputAddActivity activities)
    throws ManifoldCFException, ServiceInterruption
  {
    if (Logging.ingest.isDebugEnabled())
      Logging.ingest.debug("indexPost(): '" + documentURI + "'");

    // If the document is too long, reject it.
    if (maxDocumentLength != null && document.getBinaryLength() > maxDocumentLength.longValue()){
      activities.recordActivity(null,SolrConnector.INGEST_ACTIVITY,null,documentURI,activities.EXCLUDED_LENGTH,"Solr connector rejected document due to its big size: ('"+document.getBinaryLength()+"')");
      return false;
    }
{code}
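To make the guard above easier to reason about, here is a minimal standalone sketch of the same check. The Poster class, its constructor argument, and the log string are hypothetical stand-ins, not the real HttpPoster API. The sketch only models the assumption that a null maxDocumentLength means no limit was configured, which is one way the check could be skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthExclusionSketch {
  // Hypothetical stand-in for HttpPoster; only the length guard is modeled.
  static final class Poster {
    // Assumption: null means "no maximum length configured".
    private final Long maxDocumentLength;
    // Stand-in for activities.recordActivity(...).
    final List<String> activityLog = new ArrayList<>();

    Poster(Long maxDocumentLength) {
      this.maxDocumentLength = maxDocumentLength;
    }

    // Same condition as the quoted snippet: reject only when a limit
    // is set and the document is strictly longer than that limit.
    boolean indexPost(String documentURI, long binaryLength) {
      if (maxDocumentLength != null && binaryLength > maxDocumentLength.longValue()) {
        activityLog.add("excluded for length: " + documentURI + " (" + binaryLength + ")");
        return false;
      }
      return true;
    }
  }

  public static void main(String[] args) {
    Poster limited = new Poster(1024L);
    Poster unlimited = new Poster(null);

    System.out.println(limited.indexPost("doc-a", 2048L));   // false: over the limit
    System.out.println(limited.indexPost("doc-b", 1024L));   // true: equal to the limit is accepted
    System.out.println(unlimited.indexPost("doc-c", 2048L)); // true: no limit, guard is skipped
  }
}
```

If the poster were built with a null limit, as in the last call, every document would pass regardless of size, which would look like the length exclusion not working.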
> Mime type exclusion and document length exclusion in Solr output connector
> don't apparently work
> ------------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1482
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1482
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 2.9
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 2.10
>
> Attachments: problem_documents_connector.png,
> problem_documents_connector_solr.png,
> problem_documents_connector_solr_stream_size.png
>
>
> See attached images. Setting exclusions apparently does not prevent
> documents with that mime type from being included. This may be because of
> regexp characters, etc., but it needs to be researched and at least documented.
> Also, the length limitation doesn't seem to be working either.
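As an illustration of the "regexp characters" hypothesis in the description, the sketch below shows how a mime type containing regex metacharacters behaves when it is used as a pattern rather than as a literal string. Whether the Solr connector actually treats configured mime types as patterns is not established here; the class and method names are hypothetical and only java.util.regex is used.

```java
import java.util.regex.Pattern;

public class MimeTypeMatchSketch {
  // Treats the configured value as a regular expression (assumed behavior, for illustration).
  static boolean matchesAsRegex(String configured, String actual) {
    return Pattern.matches(configured, actual);
  }

  // Treats the configured value as a literal string by quoting it first.
  static boolean matchesLiterally(String configured, String actual) {
    return Pattern.matches(Pattern.quote(configured), actual);
  }

  public static void main(String[] args) {
    // '+' is a quantifier, so the pattern no longer matches its own text.
    System.out.println(matchesAsRegex("image/svg+xml", "image/svg+xml"));   // false
    System.out.println(matchesLiterally("image/svg+xml", "image/svg+xml")); // true

    // '.' matches any character, so the pattern matches more than intended.
    System.out.println(matchesAsRegex("application/vnd.ms-excel", "application/vndXms-excel"));   // true
    System.out.println(matchesLiterally("application/vnd.ms-excel", "application/vndXms-excel")); // false
  }
}
```

If pattern matching were involved, quoting the configured value (or documenting the need to escape it) would be one way to get literal behavior.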