I think I'll try to find a place for your lucene_ext code somewhere in the Lucene Sandbox. What do you think?
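
On the sortType idea in the diff quoted below: as quoted, the switch has no break statements, so every case falls through into the hq.put() calls that follow it. Here is a minimal sketch of what I take the intended behaviour to be. It uses plain Java collections rather than Lucene's HitQueue and ScoreDoc, and SortTypeSketch, Hit, sortKey and topHits are hypothetical names for illustration; only the three ORDER_BY_* constants come from the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class SortTypeSketch {
    // Constants as named in the quoted diff.
    static final int ORDER_BY_SCORE = 0;
    static final int ORDER_BY_DOCID = 1;
    static final int ORDER_BY_DOCID_DESC = -1;

    /** A (docID, sort key) pair; a hypothetical stand-in for ScoreDoc. */
    static final class Hit {
        final int doc;
        final float key;
        Hit(int doc, float key) { this.doc = doc; this.key = key; }
    }

    /** Picks exactly one key per hit; each case returns, so nothing falls through. */
    static float sortKey(int sortType, int doc, float score, int maxDoc) {
        switch (sortType) {
            case ORDER_BY_DOCID:
                return doc;            // same key the diff uses: the docID itself
            case ORDER_BY_DOCID_DESC:
                return maxDoc - doc;   // same key the diff uses: maxDoc - docID
            case ORDER_BY_SCORE:
            default:
                return score;
        }
    }

    /** Keeps the nDocs hits with the largest keys and returns their docIDs, largest key first. */
    static List<Integer> topHits(int[] docs, float[] scores, int maxDoc, int nDocs, int sortType) {
        // Min-heap on the key, so poll() removes the lowest hit in the queue.
        PriorityQueue<Hit> queue = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.key));
        for (int i = 0; i < docs.length; i++) {
            if (scores[i] <= 0.0f) continue;   // ignore zero-scored docs, as the diff does
            queue.add(new Hit(docs[i], sortKey(sortType, docs[i], scores[i], maxDoc)));
            if (queue.size() > nDocs) queue.poll();   // queue overfull: drop the lowest key
        }
        List<Integer> result = new ArrayList<>();
        while (!queue.isEmpty()) result.add(queue.poll().doc);   // ascending key order
        Collections.reverse(result);                             // largest key first
        return result;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3};
        float[] scores = {0.5f, 0.9f, 0.0f, 0.7f};
        System.out.println(topHits(docs, scores, 4, 2, ORDER_BY_SCORE));      // [1, 3]
        System.out.println(topHits(docs, scores, 4, 2, ORDER_BY_DOCID));      // [3, 1]
        System.out.println(topHits(docs, scores, 4, 2, ORDER_BY_DOCID_DESC)); // [0, 1]
    }
}
```

Note that with key = doc the largest docID comes out first, because the queue keeps the highest keys, just as the hit queue in the diff pops the lowest one. The sketch also compares keys only inside the queue, whereas the diff still tests the raw score against a minScore that is taken from the queue.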

Otis

--- Che Dong <[EMAIL PROTECTED]> wrote:
> How about adding sortType to IndexSearcher first?
> Users can specify IndexSearcher.sortType (by score: default, by docID,
> by docID desc) before indexing.
>
> Che, Dong
>
> diff IndexSearcher.java ~/lucene-1.2-src/src/java/org/apache/lucene/search/IndexSearcher.java
>
> 66,81c66
> < /**
> <  * Implements search over a single IndexReader.
> <  *
> <  * user can customize search result sort behavior via <code>sortType</code>:
> <  * if data source sorted by some field before indexing docID can be take
> <  * as the alias to the sort field, so
> <  * search result sort by docID(or desc) equals to sort by field
> <  *
> <  * search results sort method:
> <  * 0: sort by score (default)
> <  * 1: sort by docID
> <  * -1: sort by docID desc
> <  *
> <  * @author Che, Dong <[EMAIL PROTECTED]>
> <  * $Header: /home/cvsroot/lucene_ext/src/org/apache/lucene/search/IndexSearcher.java,v 1.1.1.1 2002/09/22 19:36:08 chedong Exp $
> <  */
> ---
> > /** Implements search over a single IndexReader. */
> 83,89d67
> <   /**
> <
> <    */
> <   public static final int ORDER_BY_SCORE = 0;
> <   public static final int ORDER_BY_DOCID = 1;
> <   public static final int ORDER_BY_DOCID_DESC = -1;
> <   public int sortType = ORDER_BY_SCORE;
> 96c74
> <
> ---
> >
> 101c79
> <
> ---
> >
> 106c84
> <
> ---
> >
> 134,162c112,127
> <     final int md = reader.maxDoc();
> <
> <     scorer.score(new HitCollector()
> <       {
> <         private float minScore = 0.0f;
> <         public final void collect(int doc, float score) {
> <           if (score > 0.0f &&                     // ignore zeroed buckets
> <               (bits==null || bits.get(doc))) {    // skip docs not in bits
> <             totalHits[0]++;
> <             if (score >= minScore) {
> <               // update hit queue
> <               switch (sortType) {
> <                 case ORDER_BY_SCORE:              //sort results by score
> <                   hq.put(new ScoreDoc(doc, score));
> <                 case ORDER_BY_DOCID:              //sort results by docID
> <                   hq.put(new ScoreDoc(doc, doc));
> <                 case ORDER_BY_DOCID_DESC:         //sort results by docID desc
> <                   hq.put(new ScoreDoc(doc, (md - doc) ) );
> <                 default:                          //sort results by score(default)
> <                   hq.put(new ScoreDoc(doc, score));
> <               }
> <               if (hq.size() > nDocs) {            // if hit queue overfull
> <                 hq.pop();                         // remove lowest in hit queue
> <                 minScore = ((ScoreDoc)hq.top()).score; // reset minScore
> <               }
> <             }
> <           }
> <         }
> <       }, md);
> ---
> >     scorer.score(new HitCollector() {
> >       private float minScore = 0.0f;
> >       public final void collect(int doc, float score) {
> >         if (score > 0.0f &&                       // ignore zeroed buckets
> >             (bits==null || bits.get(doc))) {      // skip docs not in bits
> >           totalHits[0]++;
> >           if (score >= minScore) {
> >             hq.put(new ScoreDoc(doc, score));     // update hit queue
> >             if (hq.size() > nDocs) {              // if hit queue overfull
> >               hq.pop();                           // remove lowest in hit queue
> >               minScore = ((ScoreDoc)hq.top()).score; // reset minScore
> >             }
> >           }
> >         }
> >       }
> >     }, reader.maxDoc());
> 167c132
> <
> ---
> >
>
>
> ----- Original Message -----
> From: "Doug Cutting" <[EMAIL PROTECTED]>
> To: "Lucene Developers List" <[EMAIL PROTECTED]>
> Sent: Thursday, October 17, 2002 5:21 AM
> Subject: Re: Question: using boost for sorting
>
>
> > Please submit diffs before committing anything, as this is delicate
> > code. Small changes here can affect performance in a big way.
> >
> > Also, we must be extra-careful when making a new public API: once a
> > method is public it's very hard to remove it. The Similarity methods
> > also need to be well documented.
> >
> > Doug
> >
> > Otis Gospodnetic wrote:
> > > This sounds good to me, as it would lead us to pluggable similarity
> > > computation...mmmm.
> > > I can refactor some of this tonight.
> > >
> > > Otis
> > >
> > >
> > > --- Doug Cutting <[EMAIL PROTECTED]> wrote:
> > >
> > >>This looks like a good approach. When I get a chance, I'd like to make
> > >>Similarity an interface or an abstract class, whose default
> > >>implementation would do what the current class does, but whose methods
> > >>can be overridden. Then I'd add methods like:
> > >>
> > >>  public static void Similarity.setDefaultSimilarity(Similarity sim);
> > >>  public void IndexWriter.setSimilarity(Similarity sim);
> > >>  public void Searcher.setSimilarity(Similarity sim);
> > >>
> > >>So to override Similarity methods you'd define a subclass of the
> > >>standard implementation, then either install yours globally via
> > >>setDefaultSimilarity, or set it in your IndexWriter before adding
> > >>documents and in your Searcher before searching. Does that sound
> > >>reasonable?
> > >>
> > >>This would let you do what you describe below without changing Lucene's
> > >>sources. However I'm very short on time right now and don't know how
> > >>soon I'll get to this.
> > >>
> > >>Doug
> > >>
> > >>David Birtwell wrote:
> > >>
> > >>>Hi Dmitry,
> > >>>
> > >>>I was faced with a similar problem. We wanted to have a numeric rank
> > >>>field in each document influence the order in which the documents were
> > >>>returned by lucene. While investigating a solution for this, I wanted
> > >>>to see if I could implement strict sorting based on this numeric value.
> > >>>I was able to accomplish this using document boosting, but not without
> > >>>modifying the lucene source. Our "ranking" field is an integer value
> > >>>from one to one hundred. I'm not sure if this will help you, but I'll
> > >>>include a summary of what I did.
> > >>>
> > >>>In DocumentWriter remove the normalization by field length:
> > >>>   float norm = fieldBoosts[n] * Similarity.normalizeLength(fieldLengths[n]);
> > >>>to
> > >>>   float norm = fieldBoosts[n];
> > >>>
> > >>>In TermScorer and PhraseScorer, modify the score() method to ignore the
> > >>>lucene base score:
> > >>>   score *= Similarity.decodeNorm(norms[d]);
> > >>>to
> > >>>   score = Similarity.decodeNorm(norms[d]);
> > >>>
> > >>>In Similarity.java, make byteToFloat() public.
> > >>>
> > >>>At index time, use Similarity.byteToFloat() to determine your boost
> > >>>value as in the following pseudocode:
> > >>>   Document d = new Document();
> > >>>   ... add your fields ...
> > >>>   int rank = d.getField("RANK");   // range of rank can be 0 to 255
> > >>>   float sortVal = Similarity.byteToFloat(rank);
> > >>>   d.setBoost(sortVal);
> > >>>
> > >>>If you'd like the reasoning behind any or all of these items, let me know.
> > >>>
> > >>>DaveB
> > >>>
> > >>>
> > >>>Dmitry Serebrennikov wrote:
> > >>>
> > >>>>Greetings Everyone,
> > >>>>
> > >>>>I'm thinking of trying to build something that manipulates a query
> > >>>>score in order to achieve a sort order other than the default
> > >>>>relevance sort. The idea is to create a new type of query:
> > >>>>SortingQuery( Query query, String sortByField )
> > >>>>
> > >>>>It would run the sub-query and return results in the order of the
> > >>>>values found in the "sortByField" for those documents. Now, I've
> > >>>>looked at all of the sorting discussion prior to this, and the best
> > >>>>approach (recommended by Doug among others) is to provide some sort of
> > >>>>fast access to the field values inside the HitCollector. Reading
> > >>>>documents at search time is too slow, so people access the data
> > >>>>elsewhere or build an in-memory index of that data (such as is done in
> > >>>>the SearchBean's SortField).
> > >>>>
> > >>>>My idea is different. I want to try to do the following:
> > >>>>- compose a query that consists of the original sub-query followed by
> > >>>>  a special "sorting query"
> > >>>>- "boost" the score of the original sub-query to 0
> > >>>>- compute the score of the sorting query such that it would reflect
> > >>>>  the desired sort order
> > >>>>
> > >>>>Has anyone tried to do something like this?
> > >>>>Would this work?
> > >>>>Is this worth doing?
> > >>>>If it would, would I then have to do something at indexing
> > >>>>time to set normalization / scoring factors for that field to
> > >>>>something or other?
> > >>>>
> > >>>>Thanks.
> > >>>>Dmitry.
>
> ATTACHMENT part 2 application/octet-stream name=IndexSearcher.java

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
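
For reference, the pluggable Similarity that Doug describes in the thread above (a default implementation whose methods can be overridden, installed either globally or per searcher) might look roughly like the sketch below. This is not Lucene's code: SimilaritySketch, NoLengthNorm, lengthNorm, getDefault and getSimilarity are hypothetical names, and the 1/sqrt(numTerms) default is only an assumed example of a length normalization. Only setDefaultSimilarity and Searcher.setSimilarity are taken from the proposal.

```java
final class SimilaritySketch {
    /** Hypothetical overridable Similarity with a process-wide default instance. */
    static class Similarity {
        private static Similarity defaultSimilarity = new Similarity();

        /** Installs a replacement globally, as in the proposed setDefaultSimilarity. */
        static void setDefaultSimilarity(Similarity sim) { defaultSimilarity = sim; }

        static Similarity getDefault() { return defaultSimilarity; }

        /** Assumed default behaviour: shorter fields get a larger norm. */
        float lengthNorm(int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    /** Subclass that ignores field length, similar in spirit to DaveB's source change. */
    static class NoLengthNorm extends Similarity {
        @Override
        float lengthNorm(int numTerms) { return 1.0f; }
    }

    /** Hypothetical searcher that falls back to the global default when none is set. */
    static class Searcher {
        private Similarity similarity;   // null means "use the global default"

        void setSimilarity(Similarity sim) { this.similarity = sim; }

        Similarity getSimilarity() {
            return similarity != null ? similarity : Similarity.getDefault();
        }
    }

    public static void main(String[] args) {
        Searcher searcher = new Searcher();
        System.out.println(searcher.getSimilarity().lengthNorm(4));   // default norm
        searcher.setSimilarity(new NoLengthNorm());
        System.out.println(searcher.getSimilarity().lengthNorm(4));   // overridden norm
    }
}
```

The point of the design is that a change like dropping length normalization becomes a subclass installed through a setter instead of an edit to Lucene's sources.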