[ 
https://issues.apache.org/jira/browse/CONNECTORS-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318497#comment-16318497
 ] 

Karl Wright commented on CONNECTORS-1482:
-----------------------------------------

The length exclusion code is trivial and hard to bypass, unless the HttpPoster 
object is created incorrectly:

{code}
  /**
   * Post the input stream to ingest
   *
   * @param documentURI is the document's uri.
   * @param document is the document structure to ingest.
   * @param arguments are the configuration arguments to pass in the post.  Key is argument name,
   *   value is a list of the argument values.
   * @param authorityNameString is the name of the governing authority for this document's acls,
   *   or null if none.
   * @param activities is the activities object, so we can report what's happening.
   * @return true if the ingestion was successful, or false if the ingestion is illegal.
   * @throws ManifoldCFException, ServiceInterruption
   */
  public boolean indexPost(String documentURI,
    RepositoryDocument document, Map<String,List<String>> arguments,
    String authorityNameString, IOutputAddActivity activities)
    throws ManifoldCFException, ServiceInterruption
  {
    if (Logging.ingest.isDebugEnabled())
      Logging.ingest.debug("indexPost(): '" + documentURI + "'");

    // If the document is too long, reject it.
    if (maxDocumentLength != null && document.getBinaryLength() > maxDocumentLength.longValue()){
      activities.recordActivity(null,SolrConnector.INGEST_ACTIVITY,null,documentURI,activities.EXCLUDED_LENGTH,
        "Solr connector rejected document due to its big size: ('"+document.getBinaryLength()+"')");
      return false;
    }
{code}
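
To make the "created incorrectly" case concrete, here is a minimal standalone sketch of the same check. It is not ManifoldCF code: the class LengthExclusionSketch, its accept() method, and the in-memory list standing in for activities.recordActivity(...) are all hypothetical names introduced for illustration. It assumes, as the excerpt suggests, that a null maxDocumentLength means no limit was handed to the poster, in which case the check is skipped entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the length check in indexPost(); not ManifoldCF code. */
class LengthExclusionSketch {
  // Nullable limit, mirroring the excerpt: null means no limit was configured.
  private final Long maxDocumentLength;
  // Stand-in for activities.recordActivity(...): collects rejection messages.
  private final List<String> recorded = new ArrayList<>();

  LengthExclusionSketch(Long maxDocumentLength) {
    this.maxDocumentLength = maxDocumentLength;
  }

  /** Returns false when the document is rejected for length, true otherwise. */
  boolean accept(String documentURI, long binaryLength) {
    // Same condition as the excerpt: only documents strictly longer than the limit are rejected.
    if (maxDocumentLength != null && binaryLength > maxDocumentLength.longValue()) {
      recorded.add("EXCLUDED_LENGTH " + documentURI + " ('" + binaryLength + "')");
      return false;
    }
    return true;
  }

  List<String> getRecorded() {
    return recorded;
  }

  public static void main(String[] args) {
    LengthExclusionSketch limited = new LengthExclusionSketch(1000L);
    LengthExclusionSketch unlimited = new LengthExclusionSketch(null);

    System.out.println(limited.accept("doc:at-limit", 1000L));     // true: not strictly greater
    System.out.println(limited.accept("doc:over-limit", 1001L));   // false: rejected and recorded
    System.out.println(unlimited.accept("doc:over-limit", 1001L)); // true: null limit skips the check
  }
}
```

If the sketch reflects the real behaviour, an oversized document getting through would point at the limit never reaching the poster (a null maxDocumentLength) rather than at the comparison itself.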


> Mime type exclusion and document length exclusion in Solr output connector 
> don't apparently work
> ------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1482
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1482
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 2.9
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.10
>
>         Attachments: problem_documents_connector.png, 
> problem_documents_connector_solr.png, 
> problem_documents_connector_solr_stream_size.png
>
>
> See attached images.  Setting exclusions apparently does not prevent 
> documents with that mime type from being included.  This may be because of 
> regexp characters etc but it needs to be researched and documented at least.  
> Also, the length limitation doesn't seem to be working either.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
