[jira] [Commented] (UIMA-3096) A LuCas extension that allows indexing the Lucene documents created by LuCas into a Solr server.

Tommaso Teofili (JIRA) Mon, 22 Jul 2013 08:06:42 -0700

    [ https://issues.apache.org/jira/browse/UIMA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715259#comment-13715259 ]


Tommaso Teofili commented on UIMA-3096:
---------------------------------------

I think it would make sense to have something like this for ElasticSearch in
LuCas if Lucene is the "source" for ES, i.e. if one step of the UIMA pipeline
maps CAS FeatureStructures to Lucene documents and another step maps those
Lucene documents to ES (or to Solr, for that matter).

If instead CAS FeatureStructures are mapped to ES in a single step (without
explicitly passing through a separate Lucene indexing step, as SolrCas does),
then a separate addon for the ES mapping would probably be worthwhile.

However, given that we're talking about the former scenario, I'd say it fits
into LuCas for now.
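To make the two-step scenario concrete, here is a minimal Java sketch. It assumes hypothetical `Annotation`, `Doc`, and `Sink` types standing in for CAS FeatureStructures, the Lucene document, and a backend client; none of them are LuCas, Lucene, Solr, or ES APIs. Step 1 builds a backend-neutral document once, and step 2 is a pluggable sink, which is the property that lets a Solr or ES target live in LuCas without touching the mapping step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoStepPipeline {
    // Hypothetical stand-in for a CAS FeatureStructure: a type name and its covered text.
    static final class Annotation {
        final String type;
        final String text;
        Annotation(String type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical backend-neutral document (the role the Lucene Document plays in LuCas).
    static final class Doc {
        final Map<String, List<String>> fields = new LinkedHashMap<>();
        void add(String field, String value) {
            fields.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        }
    }

    // Step 1: map annotations to a document, one field per annotation type.
    static Doc mapToDoc(List<Annotation> annotations) {
        Doc doc = new Doc();
        for (Annotation a : annotations) {
            doc.add(a.type, a.text);
        }
        return doc;
    }

    // Step 2: a pluggable sink; a Solr or ES client would each provide one (hypothetical).
    interface Sink { String send(Doc doc); }

    public static void main(String[] args) {
        Doc doc = mapToDoc(List.of(
            new Annotation("token", "UIMA"),
            new Annotation("token", "rocks"),
            new Annotation("entity", "UIMA")));
        Sink solr = d -> "solr:" + d.fields;
        Sink es = d -> "es:" + d.fields;
        System.out.println(solr.send(doc)); // solr:{token=[UIMA, rocks], entity=[UIMA]}
        System.out.println(es.send(doc));   // es:{token=[UIMA, rocks], entity=[UIMA]}
    }
}
```

The single-step alternative (the SolrCas style) would fuse `mapToDoc` and the sink into one backend-specific component, which is why it would argue for a separate addon.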
                
> A LuCas extension that allows indexing the Lucene documents created by LuCas
> into a Solr server.
> ------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3096
>                 URL: https://issues.apache.org/jira/browse/UIMA-3096
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-Lucas
>    Affects Versions: 2.4.0Addons
>            Reporter: Erik Faessler
>            Priority: Minor
>         Attachments: lucasToSolr.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Add a UIMA component extending LuceneDocumentAE that converts the Lucene 
> document instances created by LuCas into Solr's PreAnalyzed field format 
> (http://wiki.apache.org/solr/PreAnalyzedField). The converted documents are 
> then sent in batches to Solr using the SolrJ API.
> On the Solr side, PreAnalyzedUpdateProcessorFactory
> (http://lucene.apache.org/solr/4_3_1/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html)
> can be used to restrict the pre-analyzed field values to an existing Solr
> schema. If the LuCas mapping file matches the Solr schema, the setup is as
> simple as this:
> In solrconfig.xml, add this updateRequestProcessorChain:
> <updateRequestProcessorChain name="pre-analyzed-json">
>   <processor class="solr.PreAnalyzedUpdateProcessorFactory">
>     <str name="fieldRegex">.*</str>
>     <str name="parser">json</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> Then, add this chain to the default update handler:
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">pre-analyzed-json</str>
>   </lst>
> </requestHandler>
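The issue description says the component converts LuCas's Lucene documents into Solr's PreAnalyzed field format, which the "json" parser configured above then reads. As a rough illustration of what such a field value looks like, here is a minimal Java sketch that serializes one field's tokens into that JSON flavor, assuming the layout documented on the PreAnalyzedField wiki page: version `"v":"1"`, the stored string under `"str"`, and per-token keys `"t"` (term), `"s"`/`"e"` (start/end offset), and `"i"` (position increment). The `Token` class is hypothetical; in the patch the values would presumably come from the Lucene TokenStream attributes, and optional keys such as payload, type, and flags are left out.

```java
import java.util.List;

public class PreAnalyzedJson {
    // Hypothetical token holder: term text, character offsets, position increment.
    static final class Token {
        final String term;
        final int start;
        final int end;
        final int posInc;
        Token(String term, int start, int end, int posInc) {
            this.term = term; this.start = start; this.end = end; this.posInc = posInc;
        }
    }

    // Quote and escape a string as a JSON string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters must be hex-escaped in JSON.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Build the pre-analyzed JSON value: version, stored string, then one object per token.
    static String toJson(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"v\":\"1\",\"str\":").append(quote(stored)).append(",\"tokens\":[");
        for (int k = 0; k < tokens.size(); k++) {
            Token t = tokens.get(k);
            if (k > 0) sb.append(',');
            sb.append("{\"t\":").append(quote(t.term))
              .append(",\"s\":").append(t.start)
              .append(",\"e\":").append(t.end)
              .append(",\"i\":").append(t.posInc)
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String json = toJson("Hello UIMA", List.of(
            new Token("hello", 0, 5, 1),
            new Token("uima", 6, 10, 1)));
        // {"v":"1","str":"Hello UIMA","tokens":[{"t":"hello","s":0,"e":5,"i":1},{"t":"uima","s":6,"e":10,"i":1}]}
        System.out.println(json);
    }
}
```

The resulting string is sent as the ordinary value of the Solr field; PreAnalyzedUpdateProcessorFactory parses it on the server side instead of running the field's analyzer.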
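With the pre-analyzed chain set as the default for `/update`, the client side only has to post documents, in batches as the issue description says. A minimal, library-free Java sketch of the batching part follows. The `Batcher` class and the batch size are hypothetical, and the `Consumer` stands in for the real SolrJ call; with SolrJ 4.x it could wrap `SolrServer.add(Collection<SolrInputDocument>)`, but that is an assumption about the patch, not taken from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender; // stands in for a SolrJ add(...) call (assumption)
    private final List<T> buffer = new ArrayList<>();

    public Batcher(int batchSize, Consumer<List<T>> sender) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Buffer one document; send a full batch as soon as batchSize is reached.
    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    // Send whatever is buffered; must also be called once at the end of processing.
    public void flush() {
        if (buffer.isEmpty()) return;
        sender.accept(new ArrayList<>(buffer)); // hand over a copy, then reset the buffer
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        Batcher<String> b = new Batcher<>(2, batch -> batchSizes.add(batch.size()));
        for (String id : new String[] {"d1", "d2", "d3"}) b.add(id);
        b.flush();
        System.out.println(batchSizes); // [2, 1]
    }
}
```

The final `flush()` is what keeps the last partial batch from being lost; in a UIMA analysis engine, `collectionProcessComplete()` would be a natural place for it.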
