[jira] [Commented] (CONNECTORS-1219) Lucene Output Connector

Shinichiro Abe (JIRA) Wed, 15 Jul 2015 19:43:37 -0700

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629096#comment-14629096
 ]


Shinichiro Abe commented on CONNECTORS-1219:
--------------------------------------------

Yes, it does for a separate process and RMI. But there is still a serialization
problem.
I'm not sure about RMI (I read ManifoldCF in Action yesterday, though), but
when an MCF connection invokes the method that adds or replaces a document via
RMI, the class containing that method has to be serializable. This class may
hold a LuceneClient, which has an IndexWriter. Is this correct? If so, it
probably will not work. It would work if the method were implemented without
holding a LuceneClient in that class: the method would just put the document
onto some object such as a queue, and the LuceneClient would pick it up from
the queue. But that approach does not give me low enough indexing latency.
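A minimal sketch of the queue-based approach, assuming plain Java serialization stands in for the RMI transport. AddDocumentRequest, LocalWriter, and QueueHandoffSketch are hypothetical names, and LocalWriter only stands in for a LuceneClient wrapping an IndexWriter; none of this is ManifoldCF or Lucene API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: only plain document data crosses the (simulated) RMI
// boundary; the non-serializable writer stays inside a local consumer thread.
class QueueHandoffSketch {

  // Hypothetical request object holding only serializable fields.
  static final class AddDocumentRequest implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final String body;
    AddDocumentRequest(String id, String body) { this.id = id; this.body = body; }
  }

  // Hypothetical stand-in for a LuceneClient/IndexWriter; deliberately NOT Serializable.
  static final class LocalWriter {
    final List<String> indexedIds = new ArrayList<>();
    void addDocument(AddDocumentRequest r) { indexedIds.add(r.id); }
  }

  // Round-trips an object through Java serialization, as an RMI argument would be.
  static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(obj); }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      @SuppressWarnings("unchecked") T copy = (T) in.readObject();
      return copy;
    }
  }

  // Producer side puts requests on a queue; a consumer thread that owns the writer drains it.
  static List<String> run(String... ids) throws Exception {
    BlockingQueue<AddDocumentRequest> queue = new LinkedBlockingQueue<>();
    LocalWriter writer = new LocalWriter();
    AddDocumentRequest poison = new AddDocumentRequest("", "");  // end-of-stream marker
    Thread consumer = new Thread(() -> {
      try {
        for (AddDocumentRequest r = queue.take(); r != poison; r = queue.take()) {
          writer.addDocument(r);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();
    for (String id : ids) {
      // The request survives serialization because it holds no writer reference.
      queue.put(roundTrip(new AddDocumentRequest(id, "text of " + id)));
    }
    queue.put(poison);
    consumer.join();  // join() makes the consumer's writes visible here
    return writer.indexedIds;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run("doc1", "doc2"));  // [doc1, doc2]
  }
}
```

The extra handoff through the queue, and the thread that drains it, is where the additional indexing latency in this design comes from.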
A few months ago I was looking for the lowest-latency indexing implementation
for a pull-crawler model. At that time I used Apache Spark and Apache Ignite
running on distributed nodes, both of which require classes to be serializable.
I tried Lucene indexes on local disk and on HDFS, but every attempt ended in
failure because the IndexWriter could not be serialized. After that, I thought
MCF could become the lowest-latency indexing application if we set up a single
MCF process on each node, with each node holding its own index. But this idea
does not fit the MCF multi-process model.
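The IndexWriter serialization failure can be reproduced without Lucene. In this sketch, WriterStub is a hypothetical stand-in for Lucene's IndexWriter (which does not implement java.io.Serializable), and the task classes are illustrative only, not Spark, Ignite, or ManifoldCF API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch of why shipping a writer to a remote worker fails.
class WriterSerializationSketch {

  // Hypothetical stand-in for IndexWriter: holds local resources, not Serializable.
  static final class WriterStub {
    void addDocument(String text) { /* would write to a local index */ }
  }

  // A task that captures the writer directly: serialization throws.
  static final class CapturingTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final WriterStub writer = new WriterStub();
  }

  // A task that marks the writer transient: serializes, but arrives without it.
  static final class TransientTask implements Serializable {
    private static final long serialVersionUID = 1L;
    transient WriterStub writer = new WriterStub();
  }

  // Serializes and deserializes, roughly what shipping a task to another node does.
  static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(obj); }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return in.readObject();
    }
  }

  // Returns true if serializing the task throws NotSerializableException.
  static boolean failsToSerialize(Serializable task) throws ClassNotFoundException {
    try {
      roundTrip(task);
      return false;
    } catch (NotSerializableException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(failsToSerialize(new CapturingTask()));  // true
    TransientTask copy = (TransientTask) roundTrip(new TransientTask());
    // Field initializers do not run on deserialization, so the writer is null.
    System.out.println(copy.writer == null);  // true
  }
}
```

Marking the field transient only moves the failure: the task serializes, but the receiving node gets it with a null writer, so each node still has to open its own writer against its own local index.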

> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, 
> CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> An output connector that writes to a local Lucene index directly, not via a
> remote search engine. It would be nice if we could apply Lucene's various APIs
> to the index directly, even though we could do the same thing with a Solr or
> Elasticsearch index. I assume we could do something for classification,
> categorization, and tagging using, e.g., the lucene-classification package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)