Clustering of multilingual search results
-----------------------------------------
Key: SOLR-2939
URL: https://issues.apache.org/jira/browse/SOLR-2939
Project: Solr
Issue Type: Improvement
Components: contrib - Clustering
Reporter: Stanislaw Osinski
Assignee: Stanislaw Osinski
Fix For: 3.6
Carrot2 internally supports clustering of multilingual search results. The
clustering component should allow passing a language field to Carrot2. This
feature would need at least two new parameters: {{carrot.lang}} for the name of
the Solr field that contains the language code (ISO 639), and {{carrot.lcmap}},
similar to the corresponding parameter of the language recognizer, to map
arbitrary strings to ISO 639 codes.
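
To make the mapping idea concrete, here is a minimal sketch of how a
{{carrot.lcmap}} value could be parsed and applied. It assumes a
space-separated list of {{from:to}} pairs, modelled on the language
recognizer's mapping parameter; the class and method names are hypothetical and
are not part of the clustering contrib.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps arbitrary language strings to ISO 639 codes. */
public class LanguageCodeMapSketch {

    /** Parses "from:to from:to ..." into a lookup table with lower-cased keys. */
    static Map<String, String> parseLcmap(String spec) {
        Map<String, String> map = new HashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return map;
        }
        for (String pair : spec.trim().split("\\s+")) {
            int colon = pair.indexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                continue; // skip malformed entries such as "x", ":pl" or "pl:"
            }
            String from = pair.substring(0, colon).toLowerCase(Locale.ROOT);
            map.put(from, pair.substring(colon + 1));
        }
        return map;
    }

    /** Returns the mapped code, or the normalized input when no mapping exists. */
    static String resolve(String fieldValue, Map<String, String> lcmap) {
        if (fieldValue == null) {
            return null;
        }
        String key = fieldValue.trim().toLowerCase(Locale.ROOT);
        String mapped = lcmap.get(key);
        return mapped != null ? mapped : key;
    }

    public static void main(String[] args) {
        Map<String, String> lcmap = parseLcmap("polish:pl german:de");
        System.out.println(resolve("Polish", lcmap));
        System.out.println(resolve("fr", lcmap));
    }
}
```

Values without a mapping are passed through unchanged (only trimmed and
lower-cased), so a field that already holds ISO 639 codes would work without
any {{carrot.lcmap}} setting.
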
Another feature of the language recognizer we should mirror is the expansion of
the {{{lang}}} token in field names into the language code of the document (if
a document has multiple languages, the first Carrot2-supported language code is
used). The feature seems easy to implement in the non-distributed setting of
Solr, but the simple implementation isn't going to work in the distributed
setting because the name of the specific field to be fetched depends on the
content (language) of each matching document. Looking at the
{{SearchClusteringEngine.getFieldsToLoad(SolrQueryRequest)}} method, a quick
but costly solution would be to load the contents of all stored fields. I'm not
too familiar with distributed-mode Solr, but maybe this could be optimized so
that only the required fields get fetched?
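
As a rough illustration of the token expansion, and of one possible way to
bound the set of fields to fetch, consider the sketch below. It is only an
assumption about how this might be done: the class, the method names and the
idea of expanding each template for every supported language are hypothetical,
and none of it uses actual Solr or Carrot2 APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper illustrating {lang} expansion in field names. */
public class LangTokenSketch {

    static final String LANG_TOKEN = "{lang}";

    /** Returns the first of the document's language codes that is supported, or null. */
    static String firstSupported(List<String> docLanguages, Set<String> supported) {
        for (String code : docLanguages) {
            if (supported.contains(code)) {
                return code;
            }
        }
        return null;
    }

    /** Expands a template such as "title_{lang}" into "title_pl". */
    static String expand(String template, String languageCode) {
        return template.replace(LANG_TOKEN, languageCode);
    }

    /**
     * Lists every field a template could expand to. The result does not depend
     * on any document, so it can be computed before the documents are known.
     */
    static Set<String> candidateFields(List<String> templates, Set<String> supported) {
        Set<String> fields = new LinkedHashSet<>();
        for (String template : templates) {
            if (!template.contains(LANG_TOKEN)) {
                fields.add(template); // plain field name, nothing to expand
                continue;
            }
            for (String code : supported) {
                fields.add(expand(template, code));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Set<String> supported = new LinkedHashSet<>(Arrays.asList("en", "pl"));
        String lang = firstSupported(Arrays.asList("xx", "pl", "en"), supported);
        System.out.println(expand("title_{lang}", lang));
        System.out.println(candidateFields(Arrays.asList("title_{lang}", "url"), supported));
    }
}
```

Requesting the candidate fields would still fetch more than strictly needed,
but usually far less than all stored fields. Whether the distributed code path
can make use of such a list is exactly the open question above.
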