[
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615520#comment-14615520
]
Karl Wright commented on CONNECTORS-1219:
-----------------------------------------
bq. This OOM could be resolved by tika write limit.
I don't think so, because the OOM occurs after the LuceneDocument structure has
already been built. It happens on the client.addOrReplace() line:
{code}
LuceneDocument inputDoc = buildDocument(documentURI, document);
// OutOfMemoryError is thrown on the following call:
client.addOrReplace(documentURI, inputDoc);
{code}
This is likely because Lucene needs heap proportional to some multiple of the
maximum document size in order to compress stored field values. But as long as
overall memory consumption is bounded by some user-controllable means, that's
still acceptable, and a file-size limit in the connector should accomplish that.
> Lucene Output Connector
> -----------------------
>
> Key: CONNECTORS-1219
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Attachments: CONNECTORS-1219-v0.1patch.patch,
> CONNECTORS-1219-v0.2.patch
>
>
> An output connector that writes to a local Lucene index directly, rather than
> via a remote search engine. It would be nice if we could use Lucene's various
> APIs on the index directly, even though we could do the same thing against a
> Solr or Elasticsearch index. I assume we could do something with classification,
> categorization, and tagging, using e.g. the lucene-classification package.