[ 
https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommaso Teofili updated SOLR-2129:
----------------------------------

    Attachment: SOLR-2129-version2.patch

Major Solr-UIMA refactoring. The following settings are now read from the 
<uimaConfig> tag inside solrconfig.xml:

1. added dynamic field mapping with the following syntax:
<fieldMapping>
    <type name="org.apache.uima.jcas.tcas.Annotation">
      <map feature="coveredText" field="tag"/>
    </type>
    <type name="org.apache.uima.jcas.tcas.AnotherAnnotationType">
      <map feature="featureName" field="anotherField"/>
    </type>
</fieldMapping>

2. added the AnalysisEngine descriptor path (must be on the classpath):
<analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>

3. added the fields whose values are to be analyzed, optionally merging their 
values so that UIMA runs only once:
 <analyzeFields merge="false">text,title</analyzeFields>
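For comparison, a sketch of the merged variant. It assumes that merge="true" 
is the value that enables merging, so that UIMA runs once over the combined 
values of both fields. How the values are joined before analysis is not stated 
here.

```xml
<!-- Assumption: "true" enables merging; text and title are analyzed in a single UIMA run -->
<analyzeFields merge="true">text,title</analyzeFields>
```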

Runtime parameters for overriding the parameters of delegate AEs remain 
the same:
<runtimeParameters>
    <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
    <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
    <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
    <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
    <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
</runtimeParameters>
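
Put together, the <uimaConfig> section might look roughly like the sketch 
below. The element order and the direct nesting under <uimaConfig> are 
assumptions, and only two of the runtime parameters are repeated for brevity. 
Check the patch for the exact layout.

```xml
<!-- Sketch only: element order and direct nesting under <uimaConfig> are assumptions -->
<uimaConfig>
  <runtimeParameters>
    <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
    <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
  </runtimeParameters>
  <!-- Descriptor path must be resolvable from the classpath -->
  <analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
  <!-- Fields whose values are sent to UIMA -->
  <analyzeFields merge="false">text,title</analyzeFields>
  <!-- Maps a feature of each annotation type to a Solr field -->
  <fieldMapping>
    <type name="org.apache.uima.jcas.tcas.Annotation">
      <map feature="coveredText" field="tag"/>
    </type>
  </fieldMapping>
</uimaConfig>
```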

These changes should make the module much easier to use and more flexible.
Looking forward to your feedback.
Tommaso

> Provide a Solr module for dynamic metadata extraction/indexing with Apache 
> UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, 
> SOLR-2129-version2.patch, SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be 
> exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document 
> and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which 
> triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences 
> (with a tokenizer and a hidden Markov model tagger), named entities, 
> language, suggested category, keywords and concepts (exploiting external 
> services from OpenCalais and AlchemyAPI). Such an implementation can be 
> easily extended by adding or selecting different UIMA analysis engines, 
> either from UIMA repositories on the web or by creating new ones from scratch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

