[
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568745#comment-13568745
]
Nolan Lawson commented on SOLR-4381:
------------------------------------
{quote}
Could you specify which private methods in eDisMax you needed to copy/paste?
Perhaps we can look at how to make it more extension friendly?
{quote}
[These
lines|https://github.com/healthonnet/hon-lucene-synonyms/blob/master/src/main/java/org/apache/solr/search/SynonymExpandingExtendedDismaxQParserPlugin.java#L494].
{quote}
If this issue is to be seriously pursued as part of edismax, the following
should be included here in JIRA:
{quote}
I don't think it should be included in EDisMax itself. Extending EDisMax was
just a temporary shortcut I took, but [Jan points
out|https://github.com/healthonnet/hon-lucene-synonyms/issues/6] that the
solution itself could be applied outside EDisMax, or even outside Solr.
{quote}
1. A concise summary of the overall approach, with key technical details.
{quote}
Please see [this blog
post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] for
the best explanation.
{quote}
2. A few example queries, both source and the resulting "parsed query". Key
test cases, if you will.
{quote}
Good idea. [Added to the
README.|https://github.com/healthonnet/hon-lucene-synonyms#tweaking-the-results]
{quote}
3. A semi-detailed summary of what the user of the change needs to know, in
terms of how to set it up, manage it, use it, and its precise effects.
{quote}
[In the
README|https://github.com/healthonnet/hon-lucene-synonyms#query-parameters] for
now.
{quote}
4. Detail any limitations.
{quote}
I'm currently tracking these on the [Issues
page|https://github.com/healthonnet/hon-lucene-synonyms/issues?state=open].
Otherwise, the standard query-time expansion concerns apply: query execution
is slower, configuration lives in the request parameters instead of
{{schema.xml}}, and the expanded query becomes bloated and hard to read. There
is also potential user confusion about the single "best practice" solution for
synonyms in Solr, since Solr already has a well-documented way of handling
synonyms through the
[SynonymFilterFactory|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory].
For now, I assume people will only use my solution if they try the
standard solution and are unsatisfied.
{quote}
4. Specifically what features of the Synonym Filter will be lost by using this
approach.
{quote}
As far as I know, none, because [I'm still using the
SynonymFilterFactory|https://github.com/healthonnet/hon-lucene-synonyms/blob/master/README.md#step-6]
and it's configurable by the user.
In general, I agree with you that some rapid iteration outside of the Solr core
would probably be a better approach than outright integration. Please consider
my "merge request" withdrawn; I'll let the code incubate for a bit, and then
look into integration later.
> Query-time multi-word synonym expansion
> ---------------------------------------
>
> Key: SOLR-4381
> URL: https://issues.apache.org/jira/browse/SOLR-4381
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Reporter: Nolan Lawson
> Priority: Minor
> Labels: multi-word, queryparser, synonyms
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
> caution that index-time synonym expansion should be preferred to query-time
> synonym expansion, due to the way multi-word synonyms are treated and how IDF
> values can be boosted artificially. But query-time expansion should have huge
> benefits, given that changes to the synonyms don't require re-indexing, the
> index size stays the same, and the IDF values for the documents don't get
> permanently altered.
> The proposed solution is to move the synonym expansion logic out of the
> analysis chain (whether query- or index-time) and into a new QueryParser. See
> the attached patch for an implementation.
> The core Lucene functionality is untouched. Instead, the EDismaxQParser is
> extended, and synonym expansion is done on-the-fly. Queries are parsed into
> a lattice (i.e. all possible synonym combinations), while individual
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained,
> so it invites experimentation and improvement. And I think it fits in well
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and
> [the GitHub page for the
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and
> LUCENE-4499 (better support for multi-word synonyms).
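To make the "lattice" idea in the description above concrete, here is a minimal sketch of enumerating every synonym combination for a whitespace-tokenized query. It is not the code from the attached patch or from hon-lucene-synonyms. The class name {{SynonymLatticeSketch}}, the {{expand}} method, and the in-memory synonym map are all hypothetical, and the sketch assumes exact, case-sensitive phrase matching rather than going through a real analyzer or the SynonymFilterFactory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the actual patch or plugin code.
public class SynonymLatticeSketch {

    // Enumerate every path through the synonym lattice of a whitespace-tokenized query.
    // Keys of the synonym map may be multi-word phrases (tokens joined by single spaces).
    static List<String> expand(String query, Map<String, List<String>> synonyms) {
        String[] tokens = query.trim().split("\\s+");
        List<String> results = new ArrayList<>();
        walk(tokens, 0, "", synonyms, results);
        return results;
    }

    private static void walk(String[] tokens, int pos, String prefix,
                             Map<String, List<String>> synonyms, List<String> out) {
        if (pos == tokens.length) {
            out.add(prefix.trim());
            return;
        }
        // Path 1: keep the original token at this position.
        walk(tokens, pos + 1, prefix + " " + tokens[pos], synonyms, out);
        // Path 2: replace any phrase starting at this position that has synonyms.
        for (int end = pos + 1; end <= tokens.length; end++) {
            String phrase = String.join(" ", Arrays.copyOfRange(tokens, pos, end));
            for (String alt : synonyms.getOrDefault(phrase, List.of())) {
                walk(tokens, end, prefix + " " + alt, synonyms, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new LinkedHashMap<>();
        synonyms.put("dog", List.of("hound", "canis familiaris"));
        synonyms.put("bite", List.of("nibble"));
        for (String q : expand("dog bite", synonyms)) {
            System.out.println(q);
        }
    }
}
```

In a design along the lines described in the issue, each expanded string would then be handed to the underlying parser as an alternative query, which is why the number of combinations (and the size of the final query) grows quickly with the number of synonym matches.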