[
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276805#comment-13276805
]
Markus Agethle commented on NUTCH-923:
--------------------------------------
It has been a while since this was updated. Is there any news here?
I think this would be very useful. How would one do stemming, stopword removal,
etc. without an extension like this?
Perhaps we could add some validation to the Solr indexer component in order to
limit the detected languages to the supported ones?
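To make the validation idea concrete, here is a minimal sketch. It is not Nutch code and not taken from the attached patch; the set of supported languages, the fallback suffix, and the function name are all hypothetical and only illustrate the check.

```python
# Hypothetical sketch: not Nutch code and not the attached patch.
SUPPORTED_LANGS = {"en", "fr", "de"}  # assumed: languages that have a matching Solr field
FALLBACK_LANG = "general"             # assumed: suffix of a language-neutral field


def validated_lang(detected):
    """Return the detected language if it is supported, else the fallback."""
    if detected is None:
        return FALLBACK_LANG
    lang = detected.strip().lower()
    return lang if lang in SUPPORTED_LANGS else FALLBACK_LANG
```

With a check like this, a page detected as an unsupported language would go to the language-neutral field instead of a field that does not exist in the Solr schema.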
> Multilingual support for Solr-index-mapping
> -------------------------------------------
>
> Key: NUTCH-923
> URL: https://issues.apache.org/jira/browse/NUTCH-923
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2
> Reporter: Matthias Agethle
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: patch-923-nutch-release-1.2.txt
>
>
> It would be useful to extend the mapping possibilities when indexing to Solr.
> One useful feature would be to use the detected language of the HTML page
> (for example via the language-identifier plugin) and send the content to
> the corresponding language-aware Solr fields.
> The mapping file could be as follows:
> <field dest="lang" source="lang"/>
> <field dest="title_${lang}" source="title" />
> so that the title field gets mapped to title_en for English pages and
> title_fr for French pages.
> What do you think? Could this also be useful to others?
> Or are there already other solutions out there?
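The proposed placeholder expansion could be sketched roughly as follows. This is only an illustration of the idea, assuming that `${lang}` in a dest name is filled from the document's own field values; the `MAPPING` list and `map_document` function are hypothetical and do not reflect how the patch or the Nutch indexer is implemented.

```python
from string import Template

# Hypothetical mapping mirroring the two <field> entries above: (dest, source).
MAPPING = [("lang", "lang"), ("title_${lang}", "title")]


def map_document(doc):
    """Build the Solr document, expanding ${...} in dest names from doc values."""
    out = {}
    for dest, source in MAPPING:
        if source not in doc:
            continue  # nothing to map for this field
        # safe_substitute leaves unknown placeholders untouched instead of raising
        out[Template(dest).safe_substitute(doc)] = doc[source]
    return out
```

For example, `map_document({"lang": "fr", "title": "Bonjour"})` yields a document with the fields `lang` and `title_fr`.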