[
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613588#comment-14613588
]
Shinichiro Abe commented on CONNECTORS-1219:
--------------------------------------------
I created a branch (r1689110) and added some fixes (r1689113).
The branch has the following known issues and limitations:
* Indexing large content with parallel output connections may cause an
OutOfMemoryError (OOM). To avoid this:
** reduce the throttling size
** make Tika truncate content at a size limit (not implemented)
** turn term vectors off (not implemented)
* Online schema changes are not picked up until IConnector.disconnect() is
called or the connection expires via poll().
* The analyzer resource path is not implemented.
* Field types other than string and text are not implemented.
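As a rough illustration of the "cut content at a size limit" mitigation listed
above, the following is a minimal sketch using only the JDK. The class name
ContentLimiter and the method readAtMost are hypothetical and are not part of
the branch, Tika, or ManifoldCF; the sketch only shows the idea of bounding how
many characters are buffered per document before indexing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of the branch: caps the amount of text
// that is held in memory for a single document.
public class ContentLimiter {

    /** Reads at most maxChars characters from the reader and returns them. */
    public static String readAtMost(Reader reader, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int remaining = maxChars;
        while (remaining > 0) {
            // Never request more than the remaining budget.
            int n = reader.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                break; // end of stream reached before the limit
            }
            sb.append(buf, 0, n);
            remaining -= n;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAtMost(new StringReader("abcdefghij"), 4);
        System.out.println(text); // prints "abcd"
    }
}
```

Anything beyond the limit is simply left unread, so memory use per document
stays bounded regardless of how many output connections run in parallel.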
Please review the branch. Running LuceneClientTest.java with Maven and
inspecting the index with the Luke index browser may be helpful for testing.
Thank you.
> Lucene Output Connector
> -----------------------
>
> Key: CONNECTORS-1219
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Attachments: CONNECTORS-1219-v0.1patch.patch
>
>
> An output connector that writes directly to a local Lucene index, not via a
> remote search engine. It would be nice if we could use the various Lucene
> APIs on the index directly, even though we could do the same thing with a
> Solr or Elasticsearch index. I assume we could do things like classification,
> categorization, and tagging, using e.g. the lucene-classification package.