[ https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456929#comment-16456929 ]
Elizabeth Haubert commented on SOLR-12243:
------------------------------------------

The fix I pushed up really only handles one case well: the one where you start with the single-word synonym, i.e. matching "foo bar" queries to "foo tropical cyclone" documents. This was a real problem for my use case, because the pf clauses weren't being generated at all.

The other direction, matching "foo tropical cyclone" queries to "foo bar" documents, is harder. I've gone a little way into the pf2 "b tropical" problem, but it is a deeper problem than the spans getting thrown out because they were the wrong type of query. Start small.

Here's what I've got for the other direction:

One of the first things edismax does is generate a list of different kinds of clauses from the user query, and that seems to be unaffected by the sow flag. So "foo tropical cyclone" has three bareword clauses: "foo", "tropical", and "cyclone". But 'foo "tropical cyclone"' (with quotes) has two: a bareword "foo" and a phrase "tropical cyclone".

When it goes to construct pf2 and pf3, edismax re-assembles the bareword clauses, then makes the 2- and 3-word shingles. So "foo tropical cyclone" would get pf2="foo tropical" and pf2="tropical cyclone". pf2="foo tropical" can't get expanded, because it is missing "cyclone", and will go through as it is; "tropical cyclone" will get expanded, but then removed as not a phrase, not just because it is a Span. That seems consistent if we think of "tropical cyclone" as a single entity.

So to do anything, we need to address how the shingle queries are being constructed. I opened Jira-12260 to start looping the phrases into the pf clauses, not just the barewords, because that has some other weird semantics. So 'foo "tropical cyclone" baz' (with quotes) would generate pf="foo baz", which is unintuitive; it would make more sense if it became 'foo "tropical cyclone"' and '"tropical cyclone" baz'.
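
To make the clause handling described above concrete, here is a minimal Python sketch. It is not Solr's code: the names split_clauses, shingles and phrase_fields are hypothetical, and it assumes a simplified model in which only bareword clauses feed pf, pf2 and pf3, ignoring fields, slop, boosts, stopwords and synonym expansion.

```python
import re


def split_clauses(query):
    """Split a user query into (kind, text) clauses.

    Quoted spans become "phrase" clauses, everything else "bareword" clauses.
    This is a simplified stand-in for the clause list edismax builds.
    """
    clauses = []
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            clauses.append(("phrase", phrase))
        elif word:
            clauses.append(("bareword", word))
    return clauses


def shingles(words, n):
    """Return all runs of n consecutive words, joined by spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_fields(query):
    """Build pf/pf2/pf3 candidates from the bareword clauses only (assumption)."""
    barewords = [text for kind, text in split_clauses(query) if kind == "bareword"]
    return {
        # pf: all barewords as one phrase; assumed to need at least two words
        "pf": [" ".join(barewords)] if len(barewords) > 1 else [],
        "pf2": shingles(barewords, 2),
        "pf3": shingles(barewords, 3),
    }


if __name__ == "__main__":
    print(phrase_fields("foo tropical cyclone"))
    print(phrase_fields('foo "tropical cyclone" baz'))
```

Under these assumptions, the unquoted query yields the pf2 shingles "foo tropical" and "tropical cyclone", while the quoted query drops the phrase clause entirely and yields only "foo baz", which is the unintuitive result noted above.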

Beyond looking a little into whether the graph queries could handle the phrase, I haven't really dug into how to do that yet. That matters here, because if that works and the semantics are acceptable, multi-word synonyms are already handled as quoted in the logic that creates the graph queries. So it would (probably) be safe to take that another step and stuff the multi-word synonyms into a single phrase clause, rather than individual bareword clauses. Maybe.

> Edismax missing phrase queries when phrases contain multiterm synonyms
> ----------------------------------------------------------------------
>
> Key: SOLR-12243
> URL: https://issues.apache.org/jira/browse/SOLR-12243
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: 7.1
> Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
> Reporter: Elizabeth Haubert
> Priority: Major
> Attachments: SOLR-12243.patch
>
>
> synonyms.txt:
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> request handler:
> <requestHandler name="/test_qparse_error" class="solr.SearchHandler">
> <lst name="defaults">
> <!-- Query settings -->
> <str name="defType">edismax</str>
> <str name="tie"> 0.4</str>
> <str name="qf">title^100</str>
> <str name="pf">title~20^5000</str>
> <str name="pf2">title~11</str>
> <str name="pf3">title~22^1000</str>
> <str name="df">text</str>
> <!-- mm If two or fewer clauses exist, they all must match.
> If three to five clauses exist, one can be missing. If six to eight clauses
> exist, all but three must match.
> If more than nine clauses exist, only require 30% to match.-->
> <str name="mm">3<-1 6<-3 9<30%</str>
> <str name="q.alt">*:*</str>
> <str name="rows">25</str>
> </lst>
> </requestHandler>
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin" against the
> above list will not be generated.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not
> pf: "allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction
> dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin
> dose" or pf3: "aspirin dose ?"
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)