[jira] [Commented] (CONNECTORS-1219) Lucene Output Connector

Karl Wright (JIRA) Thu, 16 Jul 2015 22:40:09 -0700

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630800#comment-14630800
 ]


Karl Wright commented on CONNECTORS-1219:
-----------------------------------------

Hi Abe-san,
This sounds like a workable solution to the cluster problem. Can you
also write your lucene searcher to use the same technology?

From: Shinichiro Abe (JIRA)
Sent: 7/17/2015 1:18 AM
To: [email protected]
Subject: [jira] [Commented] (CONNECTORS-1219) Lucene Output Connector

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630787#comment-14630787
]

Shinichiro Abe commented on CONNECTORS-1219:
--------------------------------------------

Thanks [~apillaiz], I'd like to collect not only web content but also
content from ManifoldCF repositories.

 [~DaddyWri], I discovered
[OakDirectory|https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndexEditorContext.java#L89],
which extends the Lucene Directory class. From the comment there, they
also had a multi-process (cluster) problem with the Lucene index, and
they store the index in Blob objects, which means MongoDB or RDB
storage. That led me to the idea of switching the Directory
implementation: for instance, we use FSDirectory when MCF runs as a
single process, and
[HdfsDirectory|http://lucene.apache.org/solr/5_2_1/solr-core/org/apache/solr/store/hdfs/HdfsDirectory.html]
when MCF runs as multiple processes. Writes to HDFS were
[slow|https://github.com/ouava/lclient/blob/master/lclient-hdfs/src/main/java/org/apache/lucene/lclient/util/HdfsUtils.java#L47]
when I tried it before, but this is expected to improve.
I don't want to use RMI because... first: to avoid complicated
operation and 2 extra bootstrap steps in single-process mode;
second: I don't know how to write the test code; third: around me,
only one user uses multi-process, and everyone hopes to run MCF as
OOTB as possible; fourth: Jackrabbit 2 has an RMI API but Oak doesn't
have one. I think RMI is not cool as well as CMIS rather than JCR;
fifth: I want to make MCF easy to use. These are not technical reasons,
but HdfsDirectory will help us.
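
The Directory switch described above could be sketched roughly as
follows. This is only an illustration, not code from any attached patch:
the Mode enum, the IndexStore interface, and the select method are
hypothetical names, and IndexStore is a stand-in for Lucene's Directory
so that the sketch compiles without Lucene or Hadoop on the classpath.
A real connector would presumably return an FSDirectory or an
HdfsDirectory at the marked places.

```java
public class DirectorySelector {

    /** Hypothetical deployment modes; not actual ManifoldCF settings. */
    enum Mode { SINGLE_PROCESS, MULTI_PROCESS }

    /** Stand-in for org.apache.lucene.store.Directory (assumption for this sketch). */
    interface IndexStore {
        String describe();
    }

    /** Placeholder for a local index; a real connector would open an FSDirectory here. */
    static IndexStore localStore(String path) {
        return () -> "FSDirectory at " + path;
    }

    /** Placeholder for a shared index; a real connector would create an HdfsDirectory here. */
    static IndexStore sharedStore(String uri) {
        return () -> "HdfsDirectory at " + uri;
    }

    /** Chooses the store implementation from the deployment mode. */
    static IndexStore select(Mode mode, String location) {
        switch (mode) {
            case SINGLE_PROCESS:
                return localStore(location);
            case MULTI_PROCESS:
                return sharedStore(location);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(Mode.SINGLE_PROCESS, "/var/mcf/index").describe());
        System.out.println(select(Mode.MULTI_PROCESS, "hdfs://namenode/mcf/index").describe());
    }
}
```

Keeping the choice behind one method means the single-process setup
needs no extra bootstrap steps, while a cluster deployment only changes
the mode and the location.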





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


> Lucene Output Connector
> -----------------------
>
>                 Key: CONNECTORS-1219
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1219
>             Project: ManifoldCF
>          Issue Type: New Feature
>            Reporter: Shinichiro Abe
>            Assignee: Shinichiro Abe
>         Attachments: CONNECTORS-1219-v0.1patch.patch, 
> CONNECTORS-1219-v0.2.patch, CONNECTORS-1219-v0.3.patch
>
>
> An output connector that writes to a local Lucene index directly, not via a 
> remote search engine. It would be nice if we could use Lucene's various APIs 
> on the index directly, even though we could do the same thing with a Solr or 
> Elasticsearch index. I assume we can do things like classification, 
> categorization, and tagging, using e.g. the lucene-classification package.


