David Spencer wrote:
Code rewritten: it now automagically chooses lots of defaults and lets you override
them through the static vars at the bottom or the non-static vars, also at the bottom.
Has anyone used this? Was it useful?
I've put it up on my "demo" site (rfc::search) in which I have a humble index of approx 3500 RFCs.
This is the site:
http://www.hostmon.com/rfc/index.jsp
A typical search takes you here:
http://www.hostmon.com/rfc/search.jsp?s=LDAP+Security&x=33&y=9
Clicking on a match then takes you to a page for viewing an RFC, like this one, where things start to get interesting:
http://www.hostmon.com/rfc/get.jsp?id=1823&s=LDAP%20Security
There are 3 links of interest now at the top/middle of the page in the brownish background.
[a] "show similar" - forms a query from *all* words in the doc - no heuristics wrt idf(), etc.
[b] "more like this" - uses the MoreLikeThis code I wrote with the default settings.
[c] "interesting words" - uses code from MoreLikeThis to give a table of all interesting
words in the current "source" doc ordered by score.
Remember that score is idf*tf as per Doug's mail (and as per my
hopefully correct understanding of these things). This page is of course more of a debugging
tool than something a normal user would see. One possible area of improvement that jumped out at me after reviewing this table is stemming: for example, allowing more words in the generated query when two words share the same stem.
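
To make the idf*tf ordering behind the "interesting words" table concrete, here is a minimal standalone sketch. It is not the MoreLikeThis code itself: the class and method names are hypothetical, the idf form (1 + ln(numDocs / (docFreq + 1))) is just one common choice assumed for illustration, and the document frequencies in main() are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of MoreLikeThis: ranks the words of one
// document by tf * idf, highest score first.
final class InterestingWords {

    /** A term paired with its tf*idf score. */
    static final class ScoredTerm {
        final String term;
        final double score;

        ScoredTerm(String term, double score) {
            this.term = term;
            this.score = score;
        }
    }

    /** Assumed idf form: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Counts term frequencies in text, then scores each term as tf * idf. */
    static List<ScoredTerm> rank(String text, Map<String, Integer> docFreqs, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        List<ScoredTerm> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Terms missing from the index statistics are treated as docFreq 0.
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            ranked.add(new ScoredTerm(e.getKey(), e.getValue() * idf(df, numDocs)));
        }
        // Highest score first; ties are broken alphabetically for stable output.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.term.compareTo(b.term);
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 3500 documents.
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("ldap", 10);
        docFreqs.put("security", 50);
        docFreqs.put("the", 3500);
        for (ScoredTerm t : rank("LDAP security: LDAP, the the the", docFreqs, 3500)) {
            System.out.printf("%-10s %.3f%n", t.term, t.score);
        }
    }
}
```

With these made-up numbers a rare word that occurs twice outranks a very common word that occurs three times, which is the effect the table is meant to expose.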
Note - [a] uses no code from [b] and [c]. It is just there for comparison.
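
For comparison, the "show similar" approach in [a] amounts to something like the sketch below: every distinct word of the document goes into the query, with no scoring or filtering. The class and method names are hypothetical, and the space-joined output assumes a query parser whose default operator is OR.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of [a]: no idf(), no tf, no cut-off.
final class ShowSimilar {

    /** Returns every distinct word of the text, in first-seen order, joined by spaces. */
    static String allWordsQuery(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return String.join(" ", words);
    }
}
```

Because nothing is filtered, very common words end up in the query alongside the rare ones, which is the main thing the idf*tf heuristics are meant to avoid.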
Should we add it to the sandbox?
I'd appreciate it if someone could proofread MoreLikeThis.like(Reader) and mlt(Reader).
At a glance it seems to return reasonable results on my site.
-- Dave
Doug
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]