[
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924491#action_12924491
]
Matthias Agethle commented on NUTCH-923:
----------------------------------------
Perhaps something like Solr DIH could be a solution. Adding scriptable
transformers would allow writing custom logic and would be much more flexible.
This way one could also add default field values when no value is provided, etc.
For example:
{code:xml}
<script><![CDATA[
function addLanguage(row) {
//Implementation
}
]]></script>
<fields transformer="script:addLanguage" >
<field dest="lang" source="lang"/>
<field dest="title" source="title"/>
</fields>
{code}
In the addLanguage script one could do all kinds of validation to limit the
explosion of field names.
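A minimal sketch of what such an addLanguage function might do, to make the idea concrete. Everything here is an assumption for illustration: the whitelist SUPPORTED_LANGS, the fallback DEFAULT_LANG, and the title_<lang> naming are hypothetical, and the row is modelled as a plain JavaScript object so the sketch runs on its own (in Solr DIH the row is a Java map accessed with get/put, and no such transformer exists in Nutch today).

```javascript
// Hypothetical whitelist: only languages for which the Solr schema
// would define a title_<lang> field. Anything else falls back to the default.
const SUPPORTED_LANGS = new Set(["en", "fr", "de"]);
const DEFAULT_LANG = "en"; // assumed default when no usable value is provided

// Sketch of a row transformer: normalizes the detected language,
// applies the default, and copies the title into a language-specific field.
function addLanguage(row) {
  const detected =
    typeof row.lang === "string" ? row.lang.trim().toLowerCase() : "";

  // Validation step: unknown or missing languages never create new field names.
  const lang = SUPPORTED_LANGS.has(detected) ? detected : DEFAULT_LANG;
  row.lang = lang;

  // Map title to title_<lang>, e.g. title_fr for French pages.
  if (row.title !== undefined) {
    row["title_" + lang] = row.title;
  }
  return row;
}
```

Because the language is checked against a fixed set before it is used in a field name, a misdetected or unexpected language code ends up in the default field instead of producing a new Solr field.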
> Multilingual support for Solr-index-mapping
> -------------------------------------------
>
> Key: NUTCH-923
> URL: https://issues.apache.org/jira/browse/NUTCH-923
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2
> Reporter: Matthias Agethle
> Assignee: Markus Jelsma
> Priority: Minor
>
> It would be useful to extend the mapping possibilities when indexing to Solr.
> One useful feature would be to use the detected language of the HTML page
> (for example via the language-identifier plugin) and send the content to
> corresponding language-aware Solr fields.
> The mapping file could be as follows:
> <field dest="lang" source="lang"/>
> <field dest="title_${lang}" source="title" />
> so that the title field gets mapped to title_en for English pages and
> title_fr for French pages.
> What do you think? Could this be useful also to others?
> Or are there already other solutions out there?