For my own amusement I've indexed Wikipedia and put up pages that:
- display search results
- cluster the results using Carrot2 (my first use of this)
- display similar pages by re-querying with a page's entire text, and
- display similar pages using the "more like this" algorithm (still to do: get this into the sandbox; sorry for the delays.)
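For anyone wondering how a "more like this" lookup can work, here's a minimal sketch in Java, assuming a small in-memory corpus instead of a real Lucene index. It is not Lucene's MoreLikeThis class and not the code behind the pages above; the class `MoreLikeThisSketch` and its methods are hypothetical. It picks a page's highest TF-IDF terms as the query, then ranks the other pages by the summed IDF of the query terms they contain.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical "more like this" sketch over an in-memory corpus.
 * Not Lucene's MoreLikeThis: just TF-IDF term selection plus a simple overlap score.
 */
class MoreLikeThisSketch {
    private final List<String> docs;
    private final List<Map<String, Integer>> termFreqs = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();

    MoreLikeThisSketch(List<String> docs) {
        this.docs = docs;
        for (String d : docs) {
            Map<String, Integer> tf = termFreqs(d);
            termFreqs.add(tf);
            // Each distinct term in a document counts once toward its document frequency.
            for (String term : tf.keySet()) docFreq.merge(term, 1, Integer::sum);
        }
    }

    // Lower-case, split on anything that isn't a letter or digit, count occurrences.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) tf.merge(tok, 1, Integer::sum);
        }
        return tf;
    }

    // Smoothed inverse document frequency: rarer terms weigh more.
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return Math.log((double) (docs.size() + 1) / (df + 1)) + 1.0;
    }

    // The k highest tf*idf terms of document docId become the query (ties broken alphabetically).
    List<String> interestingTerms(int docId, int k) {
        Map<String, Integer> tf = termFreqs.get(docId);
        return tf.keySet().stream()
                .sorted(Comparator.comparingDouble((String t) -> -tf.get(t) * idf(t))
                        .thenComparing(Comparator.naturalOrder()))
                .limit(k)
                .collect(Collectors.toList());
    }

    // Score every other document by the summed idf of the query terms it contains.
    List<Integer> similar(int docId, int k, int maxResults) {
        List<String> query = interestingTerms(docId, k);
        Map<Integer, Double> scores = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            if (i == docId) continue;
            double s = 0;
            for (String t : query) {
                if (termFreqs.get(i).containsKey(t)) s += idf(t);
            }
            if (s > 0) scores.put(i, s);
        }
        // Highest score first; equal scores fall back to the lower document id.
        return scores.entrySet().stream()
                .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Integer, Double>comparingByKey()))
                .limit(maxResults)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In this sketch, setting `k` to the number of distinct terms in a page makes every term part of the query, which is roughly the "entire text" re-query approach; a small `k` keeps the query short.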
Start searching here:
http://www.searchmorph.com/kat/wikipedia.jsp
And the weblog entry goes into a bit more detail:
http://www.searchmorph.com/weblog/index.php?id=37
It's kinda fun to explore Wikipedia by jumping from a page to the pages most similar to it.
Hope people find this useful...
- Dave
PS
I'm in the process of running the PageRank algorithm (from jung.sf.net) on most of the entries in Wikipedia. It has taken over 2 days so far...
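PageRank itself is a simple power iteration; the cost comes from repeating it over millions of links. Here's a minimal sketch in Java, assuming the link graph fits in memory as an adjacency list. `PageRankSketch` is a hypothetical name and this is not JUNG's implementation: each page shares its rank evenly across its out-links, pages with no out-links spread theirs over all pages, and the damping factor (0.85 in the usage below, a common choice) mixes in a uniform random jump.

```java
import java.util.*;

/**
 * Hypothetical power-iteration PageRank over an adjacency list,
 * where links.get(i) holds the pages that page i links to. Not JUNG's code.
 */
class PageRankSketch {
    static double[] pageRank(List<List<Integer>> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);               // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int i = 0; i < n; i++) {
                List<Integer> out = links.get(i);
                if (out.isEmpty()) {
                    dangling += rank[i];          // rank of pages with no out-links
                } else {
                    double share = rank[i] / out.size();
                    for (int j : out) next[j] += share;
                }
            }
            // Random jump plus damped link flow; dangling rank is spread over all pages,
            // so the ranks keep summing to 1.
            for (int i = 0; i < n; i++) {
                next[i] = (1 - damping) / n + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }
}
```

Each iteration touches every link once, so the run time grows with the number of links times the number of iterations.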