On 10/01/2012 18:16, Chris Hostetter wrote:
: The book said that the dismax query was similar to, but different from,
: DisjunctionMaxQuery
the dismax *parser* in Solr is relatively simple; the majority of the
code in it relates to parsing config options, reporting debugging info, etc.
If you wanted to do something similar in non-Solr Java code, my personal
suggestion would be to just borrow the key points of the impl in your own code.
: and additionally did phrase boosting, which I didn't think DisjunctionMaxQuery
: did.
the crux of the issue is that the "dismax" parser is named after the fact
that it heavily uses DisjunctionMaxQuery, constructing one for each
"clause" of user input, but things like the phrase boosting and function
boosting it supports are just other queries it takes and adds to the
top-level boolean query it builds. You can find a write-up I did on the
concept of the dismax parser at the link below...
https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/DisMaxQParser.java?view=markup
https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/util/SolrPluginUtils.java?view=markup
https://wiki.apache.org/solr/DisMax
http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
-Hoss
Hi Chris
Thanks, I now have something working, but any comments on it would be more
than welcome.
Some background: this is what I used to do.
I took the query entered by the user, escaped any Lucene special characters,
then did a string replacement as follows, where {0} is the escaped
original query:
"artist:\"{0}\"^1.6 " +
"(+sortname:\"{0}\"^1.6 -artist:\"{0}\") " +
"(+alias:\"{0}\" -artist:\"{0}\" -sortname:\"{0}\") " +
"(+(artist:({0})^0.8) -artist:\"{0}\" -sortname:\"{0}\" -alias:\"{0}\") " +
"(+(sortname:({0})^0.8) -artist:({0}) -sortname:\"{0}\" -alias:\"{0}\") " +
"(+(alias:({0})^0.4) -artist:({0}) -sortname:({0}) -alias:\"{0}\")";
which I then parsed using the standard QueryParser. What I tried to do
was construct a query such that only one section of the six-part query
could match; in retrospect, I was trying to replicate the way
DisjunctionMaxQuery takes the maximum rather than the sum of the scores.
I also preferred a complete phrase match in one field over individual
terms matching different fields, which I think is the same as the tie
value in a disjunction.
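The difference between taking the maximum and taking the sum can be made concrete in a few lines of plain Java. This is a sketch, not Lucene code: `DisMaxScoreModel` and its methods are hypothetical names and the sub-scores are made-up numbers. It only models the combination step, where a DisjunctionMaxQuery scores a document as its best matching sub-query plus the tie-breaker multiplier times the scores of the other matching sub-queries. A tie of 0 gives a pure maximum and a tie of 1 gives the same result as summing, which is roughly what a BooleanQuery of SHOULD clauses does if coord is ignored.

```java
import java.util.Arrays;

/**
 * Plain-Java model (no Lucene dependency) of how a DisjunctionMaxQuery
 * combines the scores of its matching sub-queries, compared with the plain
 * sum a BooleanQuery of SHOULD clauses would produce (coord ignored).
 */
public class DisMaxScoreModel {

    /** Sum of all matching sub-scores. */
    static double sum(double... subScores) {
        return Arrays.stream(subScores).sum();
    }

    /** Best sub-score plus tie times the scores of the other matching sub-queries. */
    static double disMax(double tie, double... subScores) {
        if (subScores.length == 0) {
            return 0.0; // nothing matched, so the document does not match
        }
        double max = Arrays.stream(subScores).max().getAsDouble();
        return max + tie * (sum(subScores) - max);
    }

    public static void main(String[] args) {
        // Hypothetical sub-scores for one term matching both artist and sortname
        double artist = 1.6, sortname = 0.8;
        System.out.println("sum     = " + sum(artist, sortname));
        System.out.println("tie 0.0 = " + disMax(0.0, artist, sortname));
        System.out.println("tie 0.1 = " + disMax(0.1, artist, sortname));
        System.out.println("tie 1.0 = " + disMax(1.0, artist, sortname));
    }
}
```

In this model the tie only controls how much the non-best fields of a single disjunction add to its score; any preference for phrase matches has to come from a separate phrase clause.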
Now, with my new parser, a user query of 'farming incident' would return
+((alias:farming^0.4 | sortname:farming^0.8 | artist:farming^1.6)~0.1
(alias:incident^0.4 | sortname:incident^0.8 | artist:incident^1.6)~0.1)
(alias:"farming incident"^0.4 | sortname:"farming incident"^0.8 | artist:"farming incident"^1.6)~0.1
and a search for "farming" would return
(alias:farming^0.4 | sortname:farming^0.8 | artist:farming^1.6)~0.1
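The shape of this expansion can be illustrated without Lucene at all. The following is a string-level sketch, not the parser itself: `ExpansionModel` is a hypothetical class, its "analyzer" just lowercases and splits on whitespace (an assumption; a real Lucene Analyzer does more), and a LinkedHashMap is used to get the alias, sortname, artist order seen above. It reproduces both outputs: one disjunction per term, required together, plus an optional phrase disjunction that only appears when there is more than one term.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * String-level model of the expansion DismaxQueryParser performs. It does not
 * use Lucene; it only mimics the toString() form of the resulting query.
 * Assumption: the "analyzer" lowercases and splits on whitespace.
 */
public class ExpansionModel {

    /** One disjunction over all fields, e.g. (alias:x^0.4 | artist:x^1.6)~0.1 */
    static String disjunction(String text, Map<String, Float> boosts, float tie) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            parts.add(e.getKey() + ":" + text + "^" + e.getValue());
        }
        return "(" + String.join(" | ", parts) + ")~" + tie;
    }

    static String expand(String userQuery, Map<String, Float> boosts, float tie) {
        String[] terms = userQuery.toLowerCase().trim().split("\\s+");
        if (terms.length == 1) {
            // A quoted single word does not analyze to a PhraseQuery, so the
            // phrase disjunction is empty and only the term disjunction is used.
            return disjunction(terms[0], boosts, tie);
        }
        List<String> termClauses = new ArrayList<>();
        for (String t : terms) {
            termClauses.add(disjunction(t, boosts, tie));
        }
        // Required group of per-term disjunctions plus an optional phrase disjunction
        String phrase = "\"" + String.join(" ", terms) + "\"";
        return "+(" + String.join(" ", termClauses) + ") " + disjunction(phrase, boosts, tie);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("alias", 0.4f);
        boosts.put("sortname", 0.8f);
        boosts.put("artist", 1.6f);
        System.out.println(expand("farming incident", boosts, 0.1f));
        System.out.println(expand("farming", boosts, 0.1f));
    }
}
```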
A few specific questions:
1. Do I need a different boost for the phrase parts compared to the
individual term queries so that a phrase match scores higher, or is that
taken care of?
2. Does the order of the fields in the resultant query make any
difference? (Whatever order I add them to the map, they are always output
as alias, sortname, artist.)
3. Is a tie of 0.1 a good value? In the example above I want a match of
the phrase "farming incident" in the artist field to score higher than any
other match, and I would also want a match of the phrase "farming
incident" in alias to do better than a match of just farming in artist and
incident in sortname.
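Questions 1 and 3 can be explored with some rough arithmetic. The sketch below is a toy model and not Lucene's Similarity: it assumes a matching field sub-query scores exactly its field boost, ignores tf, idf, length norms, queryNorm and coord, and assumes each example document matches only in the fields named. `ToyScores`, the three documents and the `phraseBoost` multiplier (standing in for a setBoost call on the phrase query before it is added) are all hypothetical.

```java
/**
 * Toy model of the scores produced by the query DismaxQueryParser builds.
 * Assumption: a matching field sub-query scores exactly its field boost;
 * tf, idf, length norms, queryNorm and coord are all ignored.
 */
public class ToyScores {
    static final double ARTIST = 1.6, SORTNAME = 0.8, ALIAS = 0.4;
    static final double TIE = 0.1;

    /** DisjunctionMaxQuery combination: best score plus TIE times the others. */
    static double disMax(double... matching) {
        double max = 0.0, sum = 0.0;
        for (double s : matching) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + TIE * (sum - max);
    }

    /** Doc A: the phrase "farming incident" in artist only. */
    static double docA(double phraseBoost) {
        return disMax(ARTIST) + disMax(ARTIST) + phraseBoost * disMax(ARTIST);
    }

    /** Doc B: "farming" in artist and "incident" in sortname; the phrase clause does not match. */
    static double docB() {
        return disMax(ARTIST) + disMax(SORTNAME);
    }

    /** Doc C: the phrase "farming incident" in alias only. */
    static double docC(double phraseBoost) {
        return disMax(ALIAS) + disMax(ALIAS) + phraseBoost * disMax(ALIAS);
    }

    public static void main(String[] args) {
        for (double p : new double[] {1.0, 5.0}) {
            System.out.printf("phraseBoost=%.1f  A=%.2f  B=%.2f  C=%.2f%n",
                    p, docA(p), docB(), docC(p));
        }
    }
}
```

Under these assumptions A (phrase in artist) scores 4.8 and comes first, but C (phrase in alias) scores 1.2 against B's 2.4, and C only overtakes B once the phrase multiplier exceeds 4, since 0.8 + 0.4 × phraseBoost > 2.4. Because each disjunction here matches a single field, the tie plays no part in these numbers; it only matters when the same term or phrase matches several fields. Real scores will differ, so Lucene's explain() output is the place to check.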
Here is my DismaxQueryParser class:
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.musicbrainz.search.LuceneVersion;

import java.util.HashMap;
import java.util.Map;

public class DismaxQueryParser {

    // Pseudo-field that no real field uses; queries on it are expanded
    // into one DisjunctionMaxQuery over the aliased fields
    public static final String IMPOSSIBLE_FIELD_NAME = "\uFFFC\uFFFC\uFFFC";

    private DisjunctionQueryParser dqp;

    public DismaxQueryParser(Analyzer analyzer) {
        dqp = new DisjunctionQueryParser(IMPOSSIBLE_FIELD_NAME, analyzer);
    }

    public Query parse(String query) throws ParseException {
        // One DisjunctionMaxQuery per term of the user input
        Query q0 = dqp.parse(IMPOSSIBLE_FIELD_NAME + ":(" + query + ")");
        // Only becomes a DisjunctionMaxQuery if the fields yield PhraseQuerys
        Query phrase = dqp.parse(IMPOSSIBLE_FIELD_NAME + ":(\"" + query + "\")");
        if (phrase instanceof DisjunctionMaxQuery) {
            // true = disable coord
            BooleanQuery bq = new BooleanQuery(true);
            bq.add(q0, BooleanClause.Occur.MUST);
            bq.add(phrase, BooleanClause.Occur.SHOULD);
            System.out.println(bq);
            return bq;
        } else {
            System.out.println(q0);
            return q0;
        }
    }

    public void addAlias(String field, Alias alias) {
        dqp.addAlias(field, alias);
    }

    static class DisjunctionQueryParser extends QueryParser {

        // Field name to Alias
        protected Map<String, Alias> aliases = new HashMap<String, Alias>(3);

        public DisjunctionQueryParser(String defaultField, Analyzer analyzer) {
            super(LuceneVersion.LUCENE_VERSION, defaultField, analyzer);
        }

        public void addAlias(String field, Alias alias) {
            aliases.put(field, alias);
        }

        @Override
        protected Query getFieldQuery(String field, String queryText, boolean quoted) {
            // If the field is an alias, build one disjunction over its fields
            if (aliases.containsKey(field)) {
                Alias a = aliases.get(field);
                DisjunctionMaxQuery q = new DisjunctionMaxQuery(a.getTie());
                boolean ok = false;
                for (String f : a.getFields().keySet()) {
                    // If a query can be created for this field and text
                    Query sub = getFieldQuery(f, queryText, quoted);
                    if (sub != null) {
                        // If the query was quoted but doesn't generate a
                        // phrase query, we reject it
                        if (!quoted || sub instanceof PhraseQuery) {
                            // If the field has a boost
                            Float boost = a.getFields().get(f);
                            if (boost != null) {
                                sub.setBoost(boost);
                            }
                            q.add(sub);
                            ok = true;
                        }
                    }
                }
                // Only return the disjunction if something was added to it
                return ok ? q : null;
            } else {
                // Usual field
                try {
                    return super.getFieldQuery(field, queryText, quoted);
                } catch (Exception e) {
                    return null;
                }
            }
        }
    }

    public static class Alias {

        private float tie;

        // Field name to boost
        private Map<String, Float> fields;

        public Alias() {
        }

        public float getTie() {
            return tie;
        }

        public void setTie(float tie) {
            this.tie = tie;
        }

        public Map<String, Float> getFields() {
            return fields;
        }

        public void setFields(Map<String, Float> fields) {
            this.fields = fields;
        }
    }
}
And this is how I call it:
Map<String, Float> fieldBoosts = new HashMap<String, Float>(3);
fieldBoosts.put(ArtistIndexField.ARTIST.getName(), 1.6f);
fieldBoosts.put(ArtistIndexField.SORTNAME.getName(), 0.8f);
fieldBoosts.put(ArtistIndexField.ALIAS.getName(), 0.4f);
DismaxQueryParser.Alias alias = new DismaxQueryParser.Alias();
alias.setFields(fieldBoosts);
alias.setTie(0.1f);
query = QueryParser.escape(query);
DismaxQueryParser queryParser = new DismaxQueryParser(analyzer);
queryParser.addAlias(DismaxQueryParser.IMPOSSIBLE_FIELD_NAME, alias);
Query q = queryParser.parse(query);
return q;
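On question 2: getFieldQuery iterates a.getFields().keySet(), and the map passed in is a HashMap, whose iteration order is unspecified and unrelated to insertion order, which would explain the fixed alias, sortname, artist output. A LinkedHashMap iterates in insertion order. The minimal sketch below (hypothetical class `FieldOrder`, with plain strings standing in for the ArtistIndexField names) shows the difference. As far as I can tell, the max + tie × (sum − max) combination does not depend on the order of the disjuncts, so switching map types should only change how the query prints.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Compares HashMap and LinkedHashMap iteration order for the field boosts. */
public class FieldOrder {

    /** Fills the given map in the same order as the calling code above. */
    static Map<String, Float> boosts(Map<String, Float> m) {
        m.put("artist", 1.6f);
        m.put("sortname", 0.8f);
        m.put("alias", 0.4f);
        return m;
    }

    public static void main(String[] args) {
        // HashMap: order depends on hashing, not on insertion
        System.out.println(boosts(new HashMap<>()).keySet());
        // LinkedHashMap: insertion order, i.e. [artist, sortname, alias]
        System.out.println(boosts(new LinkedHashMap<>()).keySet());
    }
}
```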