[jira] [Commented] (UIMA-3096) A LuCas extension that allows indexing the Lucene documents created by LuCas into a Solr server.

Tommaso Teofili (JIRA) Tue, 23 Jul 2013 00:50:01 -0700

    [ https://issues.apache.org/jira/browse/UIMA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716211#comment-13716211 ]


Tommaso Teofili commented on UIMA-3096:
---------------------------------------

bq. I agree on the distinction "CAS -> luceneDoc -> ES" vs. "CAS -> ES". More 
generally, it would be "CAS -> luceneDoc -> SearchServer" vs. "CAS -> 
SearchServer". Since I still rely on a few special abilities of LuCas (most 
importantly the token stream merging), it will be "CAS -> luceneDoc -> 
SearchServer" for me, and I will make the appropriate additions available here 
on JIRA. If I do this for ES, I will open a separate issue. For now, I 
will write some documentation for the Solr part.

OK, great.

bq. On a different note, the PreAnalyzed field type offers a very direct 
"CAS -> Solr" route with a lot of control. I just won't build it, because LuCas 
is fine for me and rebuilding the whole mapping machinery would be too much work.

Good point; I'll probably look into it for SolrCas with Solr 4.x.
                
> A LuCas extension that allows indexing the Lucene documents created by LuCas 
> into a Solr server.
> ------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3096
>                 URL: https://issues.apache.org/jira/browse/UIMA-3096
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-Lucas
>    Affects Versions: 2.4.0Addons
>            Reporter: Erik Faessler
>            Priority: Minor
>         Attachments: lucasToSolr.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Add a UIMA component extending LuceneDocumentAE that converts the Lucene 
> document instances created by LuCas into Solr's PreAnalyzed field format 
> (http://wiki.apache.org/solr/PreAnalyzedField). The converted documents are 
> then sent in batches to Solr using the SolrJ API.
> On the Solr side, PreAnalyzedUpdateProcessorFactory 
> (http://lucene.apache.org/solr/4_3_1/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html)
> can be used to restrict the pre-analyzed field values to an existing Solr 
> schema. If the LuCas mapping file matches the Solr schema, it is as 
> easy as this:
> In solrconfig.xml, add this updateRequestProcessorChain:
>   <updateRequestProcessorChain name="pre-analyzed-json">
>     <processor class="solr.PreAnalyzedUpdateProcessorFactory">
>       <str name="fieldRegex">.*</str>
>       <str name="parser">json</str>
>     </processor>
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> Then, add this chain to the default update handler:
>   <requestHandler name="/update" class="solr.UpdateRequestHandler">
>     <lst name="defaults">
>       <str name="update.chain">pre-analyzed-json</str>
>     </lst>
>   </requestHandler>
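The conversion step described in the issue (LuCas output turned into Solr's PreAnalyzed field format) can be sketched as follows, assuming Java 16+ and no external libraries. It serializes one stored field value plus its tokens into the JSON flavor of the PreAnalyzed format: a version key "v", the stored string "str", and a "tokens" array whose entries carry the term "t", start/end offsets "s"/"e" and position increment "i", as described on the PreAnalyzedField wiki page. The class PreAnalyzedJson and its Token record are hypothetical names for illustration only; they are not part of LuCas, the attached patch, or SolrJ, and the batching and SolrJ submission are left out.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of LuCas or SolrJ): turns one field value and
 * its tokens into a PreAnalyzed JSON string (format version "1").
 */
public class PreAnalyzedJson {

    /** Hypothetical token holder: term text, start/end offsets, position increment. */
    public record Token(String term, int start, int end, int posInc) {}

    /** Escapes quotes, backslashes and control characters for a JSON string literal. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Other control characters become a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON object; the "str" key is omitted when no stored value is given. */
    public static String toJson(String stored, List<Token> tokens) {
        StringBuilder sb = new StringBuilder("{\"v\":\"1\"");
        if (stored != null) {
            sb.append(",\"str\":\"").append(escape(stored)).append('"');
        }
        sb.append(",\"tokens\":[");
        for (int k = 0; k < tokens.size(); k++) {
            Token t = tokens.get(k);
            if (k > 0) {
                sb.append(',');
            }
            sb.append("{\"t\":\"").append(escape(t.term())).append('"')
              .append(",\"s\":").append(t.start())
              .append(",\"e\":").append(t.end())
              .append(",\"i\":").append(t.posInc())
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        String json = toJson("Hello world", List.of(
                new Token("hello", 0, 5, 1),
                new Token("world", 6, 11, 1)));
        // prints {"v":"1","str":"Hello world","tokens":[{"t":"hello","s":0,"e":5,"i":1},{"t":"world","s":6,"e":11,"i":1}]}
        System.out.println(json);
    }
}
```

With the update chain above configured, a string of this shape would be sent as the plain value of a field matched by fieldRegex, and PreAnalyzedUpdateProcessorFactory should then parse it with the JSON parser instead of running the field's analyzer. Building the JSON by hand keeps the sketch dependency-free; a real component would more likely use a JSON library.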

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
