On Nov 11, 2003, at 16:05, Marcel Stör wrote:


As everybody seems to be so exited about it, would someone please be so kind to explain
what "document based clustering" is?

This mostly means finding document which are "similar" in some way(s). The "similitude" is mostly in the eyes of the beholder. In such a world, a "cluster" would be a pile of document sharing something. As far as Lucene goes, a straightforward way of approaching this could be to use an entire document content to query an index. Lucene's result set could be construed as a "document cluster". Admittedly, this is ground zero of "document clustering", but here you go anyway :)


Here is an illustration:

"Patterns in Unstructured Data"
Discovery, Aggregation, and Visualization
http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm

Cheers,

PA.


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to