[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

Andrzej Bialecki (JIRA) Fri, 22 Oct 2010 11:02:32 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923947#action_12923947
 ]


Andrzej Bialecki  commented on NUTCH-923:
-----------------------------------------

My point was simply that if you build your data schema dynamically, based on 
the actual input data, you need to be aware that this process is inherently 
risky. Today we could perhaps deal with "lang" and LanguageIdentifier, but 
tomorrow we may be dealing with dc.author or cc.license or something else, 
and then we will face the same issue, i.e. a potentially unlimited number of 
fields created from the data.

I don't have a good answer to this problem. On the one hand this 
functionality is useful; on the other hand it's inherently risky in the 
presence of less-than-ideal data, which is always a possibility... Perhaps 
introducing some sort of validation mechanism would make this safer to use.
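
As a rough illustration of what such a validation step might look like, here 
is a minimal sketch in Python (not Nutch code). It expands a ${name} 
placeholder in a "dest" template only when the value is on a whitelist and 
otherwise falls back to a catch-all suffix, so the set of generated fields 
stays bounded. The names resolve_dest, ALLOWED_VALUES and the "other" 
fallback are assumptions made up for this example; nothing like them exists 
in the current mapping code.

```python
import re

# Hypothetical whitelist: for each placeholder name, the only values for
# which the Solr schema is assumed to define a matching field.
ALLOWED_VALUES = {"lang": {"en", "fr", "de"}}

# Matches ${name} placeholders such as ${lang} in a dest template.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_dest(dest_template, doc, allowed_values=ALLOWED_VALUES, fallback="other"):
    """Expand ${name} placeholders in dest_template using values from doc.

    Values that are missing or not whitelisted are replaced by `fallback`,
    so bad input data cannot create an unbounded number of field names.
    """

    def substitute(match):
        name = match.group(1)
        value = str(doc.get(name, "")).strip().lower()
        return value if value in allowed_values.get(name, set()) else fallback

    return PLACEHOLDER.sub(substitute, dest_template)


if __name__ == "__main__":
    for doc in ({"lang": "en"}, {"lang": "FR"}, {"lang": "x-garbage"}, {}):
        print(doc, "->", resolve_dest("title_${lang}", doc))
```

With this whitelist, a page detected as "en" would map to title_en, while an 
unrecognised or missing language would map to title_other instead of 
creating a new field. Whether a fallback field or dropping the value is the 
better policy is an open design question.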

> Multilingual support for Solr-index-mapping
> -------------------------------------------
>
>                 Key: NUTCH-923
>                 URL: https://issues.apache.org/jira/browse/NUTCH-923
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.2
>            Reporter: Matthias Agethle
>            Assignee: Markus Jelsma
>            Priority: Minor
>
> It would be useful to extend the mapping possibilities when indexing to Solr.
> One useful feature would be to use the detected language of the HTML page 
> (for example via the language-identifier plugin) and send the content to 
> corresponding language-aware Solr fields.
> The mapping file could be as follows:
> <field dest="lang" source="lang"/>
> <field dest="title_${lang}" source="title" />
> so that the title field gets mapped to title_en for English pages and 
> title_fr for French pages.
> What do you think? Could this be useful also to others?
> Or are there already other solutions out there?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
