[
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615520#comment-14615520
]
Karl Wright commented on CONNECTORS-1219:
-----------------------------------------
bq. This OOM could be resolved by tika write limit.
I don't think so, because the OOM occurs after the LuceneDocument structure has
already been built. It happens on the client.addOrReplace() line:
{code}
LuceneDocument inputDoc = buildDocument(documentURI, document);
// OutOfMemoryError is thrown on the following call:
client.addOrReplace(documentURI, inputDoc);
{code}
This is likely because Lucene needs heap proportional to some multiple of the
maximum document size in order to compress stored field values. But as long as
overall memory consumption is bounded by some user-controllable means, that's
still acceptable, and a file-size limit in the connector should accomplish that.
> Lucene Output Connector
> -----------------------
>
> Key: CONNECTORS-1219
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Attachments: CONNECTORS-1219-v0.1patch.patch,
> CONNECTORS-1219-v0.2.patch
>
>
> An output connector that writes to a local Lucene index directly, rather than
> via a remote search engine. It would be nice if we could use Lucene's various
> APIs on the index directly, even though we could do the same thing against a
> Solr or Elasticsearch index. I assume we could do something with classification,
> categorization, and tagging, using e.g. the lucene-classification package.