[jira] [Created] (SOLR-2939) Clustering of multilingual search results

Stanislaw Osinski (Created) (JIRA) Fri, 02 Dec 2011 05:26:04 -0800

Clustering of multilingual search results
-----------------------------------------


                 Key: SOLR-2939
                 URL: https://issues.apache.org/jira/browse/SOLR-2939
             Project: Solr
          Issue Type: Improvement
          Components: contrib - Clustering
            Reporter: Stanislaw Osinski
            Assignee: Stanislaw Osinski
             Fix For: 3.6


Carrot2 internally supports clustering of multilingual search results. The 
clustering component should allow passing a language field to Carrot2. This 
feature would need at least two new parameters: {{carrot.lang}} for the name of 
the Solr field that contains the language code (ISO 639), and {{carrot.lcmap}}, 
similar to the one in the language recognizer, to map arbitrary strings to ISO 
639 codes.
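
To make the intended role of {{carrot.lcmap}} concrete, here is a minimal sketch 
of how such a mapping could be parsed and applied. It assumes the value is a 
space-separated list of from:to pairs, as in the language recognizer; the class 
name LanguageCodeMapper and its methods are hypothetical and are not part of 
Solr or Carrot2.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps arbitrary language strings to ISO 639 codes. */
public class LanguageCodeMapper {
    private final Map<String, String> mapping = new HashMap<>();

    /** Parses a value such as "polish:pl english:en" (assumed format). */
    public LanguageCodeMapper(String lcmap) {
        if (lcmap == null || lcmap.trim().isEmpty()) {
            return; // no mapping configured; values pass through unchanged
        }
        for (String pair : lcmap.trim().split("\\s+")) {
            int colon = pair.indexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("Expected from:to, got: " + pair);
            }
            mapping.put(pair.substring(0, colon), pair.substring(colon + 1));
        }
    }

    /** Returns the mapped code, or the raw value if no mapping applies. */
    public String toIsoCode(String raw) {
        if (raw == null) {
            return null;
        }
        String mapped = mapping.get(raw);
        return mapped != null ? mapped : raw;
    }
}
```

With this sketch, a value read from the {{carrot.lang}} field would go through 
toIsoCode() before being handed to Carrot2; values that are already ISO 639 
codes pass through untouched.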

Another feature of the language recognizer we should mirror is the expansion of 
the {{{lang}}} token in field names into the language code of the document (if 
a document has multiple languages, the first Carrot2-supported language code 
is used). The feature seems easy to implement in the non-distributed setting of 
Solr, but the simple implementation isn't going to work in the distributed 
setting because the name of the specific field to be fetched depends on the 
content (language) of each matching document. Looking at the 
{{SearchClusteringEngine.getFieldsToLoad(SolrQueryRequest)}} method, a quick 
but costly solution would be to load the contents of all stored fields. I'm not 
too familiar with distributed-mode Solr, but maybe this could be optimized so 
that only the required fields get fetched?
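
The token expansion described above can be sketched as follows. This is only an 
illustration under stated assumptions: the class LangFieldExpander, the 
fallback language argument, and the caller-supplied list of supported languages 
are all hypothetical. The candidateFields() method shows one possible way to 
bound the set of fields to fetch in distributed mode (expanding the template 
for every supported language instead of loading all stored fields); it is an 
idea to evaluate, not a confirmed solution.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: expands the {lang} token in field name templates. */
public class LangFieldExpander {
    static final String LANG_TOKEN = "{lang}";
    private final Set<String> supported;

    /** supportedLanguages: codes the clustering engine can handle (supplied by the caller). */
    public LangFieldExpander(Collection<String> supportedLanguages) {
        this.supported = new LinkedHashSet<>(supportedLanguages);
    }

    /**
     * Replaces {lang} with the first supported language of the document,
     * or with the fallback (an assumption of this sketch) if none is supported.
     */
    public String expand(String template, List<String> docLanguages, String fallback) {
        if (!template.contains(LANG_TOKEN)) {
            return template;
        }
        String lang = fallback;
        for (String candidate : docLanguages) {
            if (supported.contains(candidate)) {
                lang = candidate;
                break;
            }
        }
        return template.replace(LANG_TOKEN, lang);
    }

    /** All field names the template could expand to, one per supported language. */
    public Set<String> candidateFields(String template) {
        Set<String> fields = new LinkedHashSet<>();
        if (!template.contains(LANG_TOKEN)) {
            fields.add(template);
            return fields;
        }
        for (String lang : supported) {
            fields.add(template.replace(LANG_TOKEN, lang));
        }
        return fields;
    }
}
```

In the non-distributed case, expand() can be called per document once its 
language is known. In the distributed case, requesting candidateFields() up 
front would fetch more fields than strictly needed, but still fewer than all 
stored fields.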
