I was trying to test whether the document boosts I calculate and add during indexing are being preserved correctly. I understand that what's actually preserved by default is fieldBoost * documentBoost * lengthNorm. I'm using the default similarity and initially had no field boosts or document boosts, so I would have expected my initial query of document boosts to return 1.0, but the values varied, which likely means I'm not calculating it correctly.

Here's the Java I tried to use to calculate the document boost:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Similarity;

IndexReader ir = IndexReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(ir);

// "FullText" is the name of the default field to be searched
byte[] norms = ir.norms("FullText");

StandardAnalyzer sa = new StandardAnalyzer();
Similarity sim = searcher.getSimilarity();

// Count the terms in the FullText field
TermEnum terms = ir.terms();
int numTerms = 0;
while (terms.next()) {
    Term t = terms.term();
    if (t.field().equals("FullText"))
        numTerms++;
}

// lengthNorm is defined as 1/sqrt(numTerms) by default
double lengthNorm = 1.0 / Math.sqrt(numTerms);
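For reference, here is a minimal sketch of the index-time side as I understand it, assuming the default similarity (the Field constructor shown and the "text" variable are just for illustration):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.setBoost(1.0f);   // document boost (defaults to 1.0)
Field body = new Field("FullText", text, Field.Store.NO, Field.Index.TOKENIZED);
body.setBoost(1.0f);  // field boost (defaults to 1.0)
doc.add(body);
// What gets stored per document is roughly:
//   Similarity.encodeNorm(docBoost * fieldBoost * lengthNorm("FullText", numTokens))
// where numTokens is the token count of that one document's FullText field.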
// Search for some term, e.g. "bush"
String key = "bush";
Query q2 = QueryParser.parse(key, "FullText", sa);
Hits hits2 = searcher.search(q2);

// Get the norm for the first hit returned by the search
float f = sim.decodeNorm(norms[hits2.id(0)]);
System.out.println("Boost: " + f / lengthNorm);

What am I missing in the calculation? I understand that there are precision limitations, but the results I'm getting vary and are mostly in the range 71.68 to 143.35.

Is there a faster way than iterating through the terms to calculate lengthNorms?
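Regarding the precision limitation mentioned above: each norm is stored as a single byte, so only 256 distinct values survive the round trip. A quick sketch to see the quantization, assuming the static Similarity.encodeNorm/decodeNorm pair:

import org.apache.lucene.search.Similarity;

public class NormPrecision {
    public static void main(String[] args) {
        // Round-trip a few values through the one-byte norm encoding
        // to see how coarsely they come back out.
        float[] samples = { 0.05f, 0.1f, 0.25f, 1.0f };
        for (float v : samples) {
            byte b = Similarity.encodeNorm(v);
            System.out.println(v + " -> " + Similarity.decodeNorm(b));
        }
    }
}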