Hi, I've hit a wall here.

In brief, users search a library of documents. Every indexed document has a
version number field, which is always populated for release notes and only
sometimes for other docs. Every document also has a category field, which
identifies its content type; release notes are one of those types.

The requirement is to make sure that release notes are boosted relative to
other content, and that release notes with higher versions are boosted more
than those with lower versions.

I've currently implemented a crude method to achieve this, and the crucial
part of the process is here:

  import org.apache.lucene.queryparser.classic.QueryParser;
  import org.apache.lucene.queryparser.classic.QueryParserBase;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.BoostQuery;

  // Given: IndexReader reader, IndexSearcher searcher, Analyzer analyzer,
  // String userQuery. parser.parse() throws ParseException.
  QueryParser parser = new QueryParser( "content", analyzer );
  // AND also applies to the boost clauses below, so each one requires
  // both category:relnotes and the version prefix to match.
  parser.setDefaultOperator( QueryParserBase.AND_OPERATOR );

  BooleanQuery query = new BooleanQuery.Builder()
     // The user's query decides which documents match at all...
     .add( parser.parse( userQuery ), Occur.MUST )
     // ...and these optional clauses only add score to release notes.
     .add( new BoostQuery( parser.parse( "category:relnotes version:9*" ), 90.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:8*" ), 80.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:7*" ), 70.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:6*" ), 60.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:5*" ), 50.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:4*" ), 40.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:3*" ), 30.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:2*" ), 20.0f ), Occur.SHOULD )
     .add( new BoostQuery( parser.parse( "category:relnotes version:1*" ), 10.0f ), Occur.SHOULD )
     .build();
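
The nine SHOULD clauses above differ only in the major version and the boost, so one way to keep them consistent is to generate them rather than write them out. This is a plain-Java sketch with no Lucene dependency; `ReleaseNoteBoosts`, `ClauseSpec` and `specs` are hypothetical names, and boost = major x step simply mirrors the arbitrary 10-per-version values above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the (query text, boost) pairs that the
// hand-written BooleanQuery spells out one clause at a time.
final class ReleaseNoteBoosts {

    // One SHOULD clause: text for QueryParser.parse and the BoostQuery boost.
    record ClauseSpec(String queryText, float boost) {}

    // Major versions from maxMajor down to 1; boost = major * step, so
    // specs(9, 10.0f) reproduces version:9* -> 90 ... version:1* -> 10.
    static List<ClauseSpec> specs(int maxMajor, float step) {
        List<ClauseSpec> out = new ArrayList<>();
        for (int major = maxMajor; major >= 1; major--) {
            out.add(new ClauseSpec("category:relnotes version:" + major + "*", major * step));
        }
        return out;
    }
}
```

Each spec would then be added to a `BooleanQuery.Builder` (here called `builder`) as `builder.add( new BoostQuery( parser.parse( spec.queryText() ), spec.boost() ), Occur.SHOULD )`. One caveat if the range grows past 9: assuming `version` is indexed as a single string token, `version:1*` also matches `10.x`, `11.x` and so on, so those documents would pick up the version-1 boost too.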

I found through experimentation that the boost factors are not
multiplicative (as most of the explanations on the web implied) but are
simply added to the score. If I've misunderstood how boosting works, please
enlighten me!
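
A simplified model that reproduces this observation, assuming Lucene 7 or later (where BooleanQuery sums clause scores without a coordination factor): a BoostQuery multiplies the score of the query it wraps, and a BooleanQuery adds up the scores of whichever clauses match. Each boost therefore scales only its own SHOULD clause, and that clause's score is then added to the user query's score. `BooleanScoreModel` and `Clause` are hypothetical names, and in Lucene the raw scores come from BM25 and query rewriting rather than fixed numbers.

```java
import java.util.List;

// Simplified model of Lucene scoring (an assumption, not Lucene's code):
// - a BoostQuery multiplies the score of the query it wraps by its boost;
// - a BooleanQuery sums the scores of the clauses that match the document.
final class BooleanScoreModel {

    // One clause as seen by a single document: whether it matches, the
    // unboosted score of its query, and its boost (1.0 if not wrapped).
    record Clause(boolean matches, float rawScore, float boost) {}

    static float score(List<Clause> clauses) {
        float total = 0f;
        for (Clause clause : clauses) {
            if (clause.matches()) {
                // The boost scales this clause only; the result is then added.
                total += clause.boost() * clause.rawScore();
            }
        }
        return total;
    }
}
```

With a text score of 2.5, a matching release-note clause of raw score 1.0 boosted by 90, and a non-matching clause boosted by 80, the model gives 92.5, i.e. 2.5 + 90 rather than 2.5 x 90. A multiplicative effect would need the boost to scale the whole query's score instead of arriving as another summed clause.
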

The versions and boost factors above are arbitrary just to keep the example
simple; in reality the versions cover a much wider range and the boost
values do too.

This is working to a degree, but it's not granular enough: I'd really like
the boost factor to be calculated directly from the version value, if that is
possible.

I also suspect that running a separate prefix query for every version makes
searches quite expensive.
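
One way to get a finer-grained boost and avoid one prefix query per version, sketched under assumptions: parse each version once at index time into a single sortable number, then derive a multiplicative boost from that number. The sketch assumes dotted numeric versions with at most three parts, each between 0 and 999; `VersionBoost`, `encode`, `boost` and the linear mapping are hypothetical choices, not anything Lucene provides.

```java
// Hypothetical helpers: turn a version string such as "9.2.1" into one
// sortable number, then map that number linearly onto a boost.
final class VersionBoost {

    // Assumes dotted numeric versions with at most three parts, each 0..999;
    // missing parts count as 0, so "9" == "9.0" == "9.0.0".
    static long encode(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length == 0 || parts.length > 3) {
            throw new IllegalArgumentException("unsupported version: " + version);
        }
        long encoded = 0;
        for (int i = 0; i < 3; i++) {
            long part = i < parts.length ? Long.parseLong(parts[i]) : 0L;
            if (part < 0 || part > 999) {
                throw new IllegalArgumentException("component out of range: " + version);
            }
            encoded = encoded * 1000 + part;
        }
        return encoded;
    }

    // Linear boost in [1, 1 + maxExtra]: oldest -> 1.0, newest -> 1 + maxExtra.
    // Values outside [oldest, newest] are clamped to that range first.
    static double boost(long encoded, long oldest, long newest, double maxExtra) {
        if (newest <= oldest) {
            return 1.0;
        }
        double clamped = Math.max(oldest, Math.min(newest, encoded));
        return 1.0 + maxExtra * (clamped - oldest) / (newest - oldest);
    }
}
```

For example, with release notes spanning versions 1 to 9 and `maxExtra` = 8, version 5 maps to a boost of 5.0 and version 9 to 9.0, and `9.10` sorts above `9.2`, which string prefixes cannot express. On the Lucene side, the encoded value could be stored per release note in a `NumericDocValuesField`, and recent Lucene versions provide `FunctionScoreQuery` in the queries module for multiplying a query's score by a per-document value; its exact API should be checked against the Lucene version in use.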

How could I improve this?

cheers

T
