Re: Document Clustering

petite_abeille Tue, 11 Nov 2003 08:46:03 -0800

On Nov 11, 2003, at 16:05, Marcel St�r wrote:

As everybody seems to be so exited about it, would someone please be so kind to explain what "document based clustering" is?

This mostly means finding document which are "similar" in some way(s). The "similitude" is mostly in the eyes of the beholder. In such a world, a "cluster" would be a pile of document sharing something. As far as Lucene goes, a straightforward way of approaching this could be to use an entire document content to query an index. Lucene's result set could be construed as a "document cluster". Admittedly, this is ground zero of "document clustering", but here you go anyway :)

Here is an illustration:

"Patterns in Unstructured Data"
Discovery, Aggregation, and Visualization
http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm

Cheers,

PA.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Document Clustering

Reply via email to