Doug Cutting wrote:

David Spencer wrote:

Code rewritten; it automagically chooses lots of defaults and lets you override
the defaults through the static vars or the non-static vars, both at the bottom of the file.


Has anyone used this? Was it useful?

I've put it up on my "demo" site (rfc::search) in which I have a humble index of approx 3500 RFCs.


This is the site:

http://www.hostmon.com/rfc/index.jsp

A typical search takes you here:

http://www.hostmon.com/rfc/search.jsp?s=LDAP+Security&x=33&y=9



Clicking on a match then takes you to a page for viewing an RFC, like this one, which is where things start to get interesting.

http://www.hostmon.com/rfc/get.jsp?id=1823&s=LDAP%20Security

There are 3 links of interest now at the top/middle of the page in the brownish background.

[a] "show similar" - forms a query from *all* words in the doc - no heuristics wrt idf(), etc.

[b] "more like this" - uses the MoreLikeThis code I wrote with the default settings.

[c] "interesting words" - uses code from MoreLikeThis to give a table of all interesting
words in the current "source" doc ordered by score.
Remember score is idf*tf as per Doug's mail (and as per my
hopefully correct understanding of these things). This page is of course more of a debugging
tool than something a normal user would see. One possible area of improvement that jumped out
at me after reviewing this table is using stemming, say, allowing more words in the generated
query when 2 words have the same stem.
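
To make the idf*tf ordering behind the "interesting words" table concrete, here is a minimal standalone sketch. It is not the MoreLikeThis code itself: the class name InterestingTerms, the method rank(), and the sample frequencies are all made up for illustration, and the idf formula log(numDocs / (docFreq + 1)) + 1 is an assumption about the similarity in use.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of MoreLikeThis.
public class InterestingTerms {

    // Assumed idf form: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Scores every term of the source doc as tf * idf and returns
    // at most maxTerms entries, highest score first.
    static List<Map.Entry<String, Double>> rank(Map<String, Integer> termFreqs,
                                                Map<String, Integer> docFreqs,
                                                int numDocs,
                                                int maxTerms) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            // Terms missing from docFreqs are treated as docFreq 0.
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            double score = e.getValue() * idf(df, numDocs);
            scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), score));
        }
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(scored.subList(0, Math.min(maxTerms, scored.size())));
    }

    public static void main(String[] args) {
        // Made-up numbers: term counts in one doc, and doc counts in a 3500-doc index.
        Map<String, Integer> tf = new HashMap<>();
        tf.put("ldap", 12);
        tf.put("security", 8);
        tf.put("protocol", 10);

        Map<String, Integer> df = new HashMap<>();
        df.put("ldap", 40);
        df.put("security", 600);
        df.put("protocol", 2000);

        for (Map.Entry<String, Double> e : rank(tf, df, 3500, 25)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The top entries of such a ranking are what would go into the generated query. A rare word with a modest count can outrank a common word with a similar count, which is the point of weighting by idf.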


Note - [a] uses no code from [b] and [c]. It is just there for comparison.

Should we add it to the sandbox?

I'd appreciate it if someone could proofread MoreLikeThis.like(Reader) and mlt(Reader).


At a glance it seems to return reasonable results on my site.

-- Dave


Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






