I've always wondered if it would be useful to try to fit the PageRank (heuristic?) into Lucene.


As an experiment I ran PageRank on 2 trees of Javadoc (the Lucene javadoc and the JDK1.4 javadoc) and produced a report that shows the PageRank value for every page.
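
For anyone who wants to try something similar, here is a minimal sketch of the usual power-iteration form of PageRank over a link graph. It is not the code behind the report; the function name, the damping factor of 0.85, the fixed iteration count, and the tiny example graph with its page names are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict of page -> rank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not tuned values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical three-page graph: an index page and two pages that link back to it.
example_links = {
    "index.html": ["A.html", "B.html"],
    "A.html": ["index.html"],
    "B.html": ["index.html"],
}
example_ranks = pagerank(example_links)
```

Note that the whole `links` dict has to be available before the first iteration, and every rank can shift when a single page is added.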

The Lucene javadoc report is here:

http://www.searchmorph.com/static/lucene-report.html

The weblog entry has a bit more detail and links to the much larger JDK1.4 report:

http://searchmorph.com/weblog/index.php?id=29

My feeling is that in the context of machine-generated pages, PageRank doesn't help that much.

Also, it's not clear how to use it: should it become the Document boost, or go into a separate field for use by a custom scoring function? I think the Google scoring function is a secret.
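
To make the "boost" option concrete, here is a rough sketch of one way a per-page rank could be folded into a text relevance score. It is plain Python, not Lucene code and certainly not Google's function; `rank_to_boost`, `rerank`, and the log scaling are all hypothetical choices for illustration only.

```python
import math


def rank_to_boost(rank, n_pages):
    """Turn a PageRank value into a multiplicative boost (hypothetical formula).

    rank * n_pages is 1.0 for a page with exactly average rank, and the log
    keeps a few very highly ranked pages from swamping the text score.
    """
    return 1.0 + math.log1p(rank * n_pages)


def rerank(hits, ranks, n_pages):
    """Re-order (doc, text_score) hits by text_score * boost, best first.

    Pages missing from ranks are treated as rank 0.0, i.e. a boost of 1.0.
    """
    scored = [
        (doc, score * rank_to_boost(ranks.get(doc, 0.0), n_pages))
        for doc, score in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The same number could instead be stored alongside each document and read back at query time, which is closer to the "separate field" option; the arithmetic would be the same, only the place where it happens changes.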

And... I'm pretty sure it can't easily be used with incremental index additions, as it wants an entire link graph.

Hope this isn't too far off topic, sorry if so, but thought it was relevant enough to mention...

- Dave

