Mike Tinnes wrote:
> I've been working on tying in a PageRank algo to
> my web crawler using lucene and have a few problems. If I don't know the
> boost factor until AFTER the crawl is it possible to still set the boost?

Why not: (1) crawl, saving pages to disk; (2) analyze links and compute 
boosts; then, finally, (3) build the Lucene index?
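
As a rough sketch of steps (2) and (3), assuming the crawl has already 
saved a link graph to disk: the class and method names, the damping 
factor, and the rule of scaling the average page to a boost of 1.0 are 
all illustrative assumptions, not part of Lucene.  The step that would 
actually set the boost and add the document to the index is left as a 
comment.

```java
import java.util.*;

public class BoostPipeline {

    /** Step 2: a simple PageRank by power iteration over the saved link graph. */
    public static Map<String, Double> pageRank(Map<String, List<String>> links,
                                               int iterations, double damping) {
        // Collect every page that appears as a source or a target.
        Set<String> pages = new TreeSet<>(links.keySet());
        for (List<String> outs : links.values()) pages.addAll(outs);
        int n = pages.size();

        Map<String, Double> rank = new HashMap<>();
        for (String p : pages) rank.put(p, 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String p : pages) next.put(p, (1.0 - damping) / n);
            double dangling = 0.0;
            for (String p : pages) {
                List<String> outs = links.getOrDefault(p, Collections.emptyList());
                if (outs.isEmpty()) {          // page with no outgoing links
                    dangling += rank.get(p);
                    continue;
                }
                double share = damping * rank.get(p) / outs.size();
                for (String q : outs) next.merge(q, share, Double::sum);
            }
            // Spread the rank of dangling pages evenly over all pages.
            double extra = damping * dangling / n;
            for (String p : pages) next.merge(p, extra, Double::sum);
            rank = next;
        }
        return rank;
    }

    /** Scale ranks so that an average page gets a boost of 1.0 (an assumption). */
    public static Map<String, Float> toBoosts(Map<String, Double> rank) {
        int n = rank.size();
        Map<String, Float> boosts = new HashMap<>();
        for (Map.Entry<String, Double> e : rank.entrySet()) {
            boosts.put(e.getKey(), (float) (e.getValue() * n));
        }
        return boosts;
    }

    public static void main(String[] args) {
        // Hypothetical link graph, as it might be read back from the saved crawl.
        Map<String, List<String>> links = new HashMap<>();
        links.put("a.html", Arrays.asList("b.html", "c.html"));
        links.put("b.html", Arrays.asList("c.html"));
        links.put("c.html", Arrays.asList("a.html"));

        Map<String, Float> boosts = toBoosts(pageRank(links, 50, 0.85));

        // Step 3: for each saved page, set the boost and then add the
        // document to the Lucene index (omitted here).
        for (Map.Entry<String, Float> e : new TreeMap<>(boosts).entrySet()) {
            System.out.println(e.getKey() + " boost=" + e.getValue());
        }
    }
}
```

Because every boost is known before the first document is added, nothing 
in the index needs to be rewritten afterwards.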

The API does not currently let you change a field's boost after a 
document is indexed.  It is in theory possible, but would require 
overwriting .fXX files, which further complicates inter-process 
synchronization of index access.  Perhaps this can be added as a caveat 
emptor API, but, in the meantime, I suggest the above approach.

> Also what does setBoost() actually do to the rank?

The rank is the position of a document in a hit list: the first hit has 
rank one, and so on.  Hits are sorted by score.  The boost is multiplied 
into the score of hits.  So a boost which is greater than 1.0 will tend 
to raise the score of hits on that field and move them toward the top of 
the list, while a boost which is less than 1.0 will tend to lower their 
score and move them further down.
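
A toy illustration of that effect, under a deliberately simplified model 
in which the final score is just a raw score times the boost.  This is 
not Lucene's full scoring formula; the class name, document names, and 
numbers are made up for the example.

```java
import java.util.*;

public class BoostRanking {

    /** Multiply each raw score by its boost and return documents best-first. */
    public static List<String> rank(Map<String, Float> rawScores,
                                    Map<String, Float> boosts) {
        Map<String, Float> finalScores = new HashMap<>();
        for (Map.Entry<String, Float> e : rawScores.entrySet()) {
            // Documents without an explicit boost use the neutral value 1.0.
            float boost = boosts.getOrDefault(e.getKey(), 1.0f);
            finalScores.put(e.getKey(), e.getValue() * boost);
        }
        List<String> docs = new ArrayList<>(finalScores.keySet());
        // Sort by descending score; position in this list is the rank.
        docs.sort((x, y) -> Float.compare(finalScores.get(y), finalScores.get(x)));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Float> raw = new HashMap<>();
        raw.put("doc1", 0.8f);
        raw.put("doc2", 0.6f);
        raw.put("doc3", 0.5f);

        Map<String, Float> boosts = new HashMap<>();
        boosts.put("doc1", 0.5f);   // boost < 1.0 pushes doc1 down
        boosts.put("doc2", 2.0f);   // boost > 1.0 pulls doc2 up

        System.out.println(rank(raw, boosts));   // prints [doc2, doc3, doc1]
    }
}
```

Without boosts the order would be doc1, doc2, doc3; with them, doc2 moves 
to rank one and doc1 drops to rank three.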

Doug

