[
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629096#comment-14629096
]
Shinichiro Abe commented on CONNECTORS-1219:
--------------------------------------------
Yes, it does for separate processes and RMI. But there is still a serialization
problem.
I'm not sure about RMI (although I read MCF in Action yesterday), but when an
MCF connection invokes the method that adds or replaces a document via RMI,
the class containing that method has to implement Serializable. This class
may hold a LuceneClient, which has an IndexWriter. Is this correct? If so, it
probably will not work. In that case, it would work if the class did not hold
the LuceneClient and the method just put the document onto something like a
queue, from which LuceneClient then picks it up. But that approach is not
good enough for me in terms of indexing latency.
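The queue hand-off described above could look roughly like the following. This is only a minimal sketch, not code from the patch: DocumentRequest, IndexSink and QueueingIndexer are hypothetical names, and IndexSink stands in for whatever side actually owns LuceneClient and its IndexWriter. The polling interval also shows where the extra latency comes from.

```java
import java.io.Serializable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical request object: plain data only, so it can be serialized safely.
final class DocumentRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final String body;

    DocumentRequest(String id, String body) {
        this.id = id;
        this.body = body;
    }
}

// Hypothetical stand-in for the component that owns LuceneClient / IndexWriter.
interface IndexSink {
    void addOrReplace(String id, String body);
}

// The method called by the connection only enqueues; a single consumer
// thread drains the queue and talks to the index.
final class QueueingIndexer implements AutoCloseable {
    private final BlockingQueue<DocumentRequest> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    QueueingIndexer(IndexSink sink) {
        worker = new Thread(() -> {
            try {
                // Keep draining until asked to stop and the queue is empty.
                while (running || !queue.isEmpty()) {
                    DocumentRequest r = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (r != null) {
                        sink.addOrReplace(r.id, r.body);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "index-consumer");
        worker.start();
    }

    // No index handle is touched here, only the queue.
    void submit(DocumentRequest request) {
        queue.add(request);
    }

    // Stops the consumer after it has processed everything already queued.
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A document only becomes visible after the consumer thread has picked it up, which is the latency cost mentioned above.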
A few months ago I was looking for the lowest-latency indexing implementation
using a pull crawler model. At that time, I used Apache Spark and Ignite running
on distributed nodes, which require serializable classes. I used Lucene
indexes on local disk and on HDFS, but everything I tried ended in
failure because of IndexWriter serialization. After that I thought MCF could
become the lowest-latency indexing application if we set up a single MCF
process on each node, with each node having its own index. But this idea
does not fit the MCF multi-process model.
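For illustration, the kind of failure described above can be reproduced with plain Java serialization. This is a sketch under assumptions: FakeIndexWriter is a hypothetical stand-in for an open index handle (Lucene's IndexWriter does not implement Serializable), and the two task classes are invented for the example. It is not the Spark or Ignite code that was actually tried.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an open index handle; deliberately not Serializable.
final class FakeIndexWriter { }

// A task that holds the handle as a normal field cannot be shipped to another node.
final class TaskHoldingWriter implements Serializable {
    private static final long serialVersionUID = 1L;
    final FakeIndexWriter writer = new FakeIndexWriter();
}

// Marking the field transient lets the task serialize, but the handle is
// skipped and would have to be reopened on the receiving side.
final class TaskWithTransientWriter implements Serializable {
    private static final long serialVersionUID = 1L;
    transient FakeIndexWriter writer = new FakeIndexWriter();
}

final class SerializationCheck {
    // Returns false when Java serialization rejects the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here SerializationCheck.canSerialize(new TaskHoldingWriter()) returns false, while the transient variant serializes but arrives without a usable writer, so each node still needs its own locally opened index.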
> Lucene Output Connector
> -----------------------
>
> Key: CONNECTORS-1219
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
> Project: ManifoldCF
> Issue Type: New Feature
> Reporter: Shinichiro Abe
> Assignee: Shinichiro Abe
> Attachments: CONNECTORS-1219-v0.1patch.patch,
> CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> An output connector that writes to a local Lucene index directly, not via a
> remote search engine. It would be nice if we could use the various Lucene APIs
> on the index directly, even though we could do the same thing with a Solr or
> Elasticsearch index. I assume we could do things like classification,
> categorization, and tagging, using e.g. the lucene-classification package.