[
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shinichiro Abe updated CONNECTORS-1219:
---------------------------------------
Attachment: CONNECTORS-1219-v0.1patch.patch
Here is a strawman patch; it still needs more work.
I think this connector will need a lot of heap memory to work well.
Where are the memory problems you mentioned? Is it multiple threads writing to
one index? If so, I have taken that into account, as described below.
In the Tika connector, on the other hand, BodyContentHandler should be replaced
with WriteOutContentHandler, because any connector might have to handle a very
large string object. WriteOutContentHandler has a writeLimit parameter and is
used by the Tika facade and by Jackrabbit Oak's Solr integration to avoid
consuming too much memory.
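
To illustrate the write-limit idea, here is a minimal sketch using only the JDK. It is not Tika's implementation: `LimitedWriter`, `isTruncated()` and `text()` are hypothetical names, and the only assumption is that extracted text arrives through a `java.io.Writer`. The point is that memory use stays bounded by the limit no matter how large the document body is.

```java
import java.io.Writer;

/** Hypothetical writer that keeps at most `limit` characters in memory. */
final class LimitedWriter extends Writer {
    private final StringBuilder buffer = new StringBuilder();
    private final int limit;
    private boolean truncated = false;

    LimitedWriter(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    @Override
    public void write(char[] cbuf, int off, int len) {
        // Copy only as many characters as still fit under the limit.
        int room = limit - buffer.length();
        if (len > room) {
            truncated = true;
            len = Math.max(room, 0);
        }
        buffer.append(cbuf, off, len);
    }

    @Override
    public void flush() {
        // Nothing to flush: everything is held in the in-memory buffer.
    }

    @Override
    public void close() {
        // Nothing to release.
    }

    /** True if any input was dropped because the limit was reached. */
    boolean isTruncated() {
        return truncated;
    }

    /** The text kept so far, never longer than the limit. */
    String text() {
        return buffer.toString();
    }
}

final class LimitedWriterDemo {
    public static void main(String[] args) throws Exception {
        LimitedWriter w = new LimitedWriter(10);
        w.write("a very large extracted document body");
        System.out.println(w.text() + " truncated=" + w.isTruncated());
    }
}
```

This sketch drops the overflow silently and records a flag; a caller could instead stop parsing as soon as the limit is reached.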
Also, I plan to introduce mcf-search-api-service.war based on this connector,
since MCF would then be able to have a search engine with a pull agent; this is
just an idea of mine, though. As for Lucene memory, multiple connections of
this connector share one client instance per local path for that reason, and I
also have an idea to use that client from the search-api side.
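
The "one client instance per local path" idea can be sketched as follows. This is not the code in the patch: `IndexClient` and `IndexClientRegistry` are hypothetical names, and `IndexClient` is only a stand-in for whatever object wraps the Lucene index for a directory. The sketch assumes the index location is given as a local path string and that equivalent spellings of a path should map to the same client.

```java
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the object that wraps one local Lucene index. */
final class IndexClient {
    final String path;

    IndexClient(String path) {
        this.path = path;
    }
}

/** Hypothetical registry that hands out one shared client per local path. */
final class IndexClientRegistry {
    private static final Map<String, IndexClient> CLIENTS = new ConcurrentHashMap<>();

    private IndexClientRegistry() {
    }

    static IndexClient forPath(String localPath) {
        // Normalize so that different spellings of one directory share a client.
        String key = Paths.get(localPath).toAbsolutePath().normalize().toString();
        // computeIfAbsent creates the client at most once per key, even when
        // several connections ask for the same path concurrently.
        return CLIENTS.computeIfAbsent(key, IndexClient::new);
    }
}

final class IndexClientRegistryDemo {
    public static void main(String[] args) {
        IndexClient a = IndexClientRegistry.forPath("/tmp/index-a");
        IndexClient b = IndexClientRegistry.forPath("/tmp/./index-a");
        System.out.println("same client: " + (a == b));
    }
}
```

Sharing the client this way also fits Lucene's rule that only one IndexWriter may be open on an index directory at a time; a real registry would additionally need a way to close and remove clients when the last connection goes away.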
> Lucene Output Connector
> -----------------------
>
> Key: CONNECTORS-1219
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Attachments: CONNECTORS-1219-v0.1patch.patch
>
>
> An output connector that writes to a local Lucene index directly, not via a
> remote search engine. It would be nice if we could use Lucene's various APIs
> on the index directly, even though we could do the same thing with a Solr or
> Elasticsearch index. I assume we could do things like classification,
> categorization, and tagging, using e.g. the lucene-classification package.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)