Marcel Stor wrote:

Stefan Groschupf wrote:


Hi,


How is document clustering different/related to text categorization?


Clustering: the algorithm tries to find its own categories and puts the
matching documents into them. You group together the documents that are the smallest distance apart.



Would I be correct to say that you have to define a "distance threshold"
parameter in order to define when to build a new category for a certain
group?
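
For at least one simple family of clustering methods the answer is yes. The following is a minimal sketch of threshold-based ("leader") clustering, not the method either poster has in mind. The function names, the toy 2-D vectors and the threshold values are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_cluster(vectors, threshold):
    """Assign each vector to the first cluster whose leader lies within
    `threshold`; if no leader is close enough, start a new cluster.

    Hypothetical helper for illustration only.
    """
    leaders = []  # first vector seen in each cluster
    labels = []   # cluster index for each input vector
    for v in vectors:
        for idx, leader in enumerate(leaders):
            if euclidean(v, leader) <= threshold:
                labels.append(idx)
                break
        else:
            # No existing cluster is close enough: build a new category.
            leaders.append(v)
            labels.append(len(leaders) - 1)
    return labels


# Toy document vectors (assumed data): two near (1, 0), two near (0, 1).
docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
tight = threshold_cluster(docs, threshold=0.5)
loose = threshold_cluster(docs, threshold=2.0)
```

With the small threshold the four vectors fall into two groups; with the large one they all end up in a single group, so the threshold directly controls when a new category is created. Other algorithms, such as k-means, fix the number of clusters instead of a distance threshold.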


I'm not sure. There are different data mining algorithms that could be used, and it depends on the algorithm. I prefer support vector machines (SVMs). There you calculate distances between multidimensional vectors in a multidimensional space.
One vector represents one document.


Stefan
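
The "one vector per document" idea in the reply above can be sketched as follows. This is a minimal illustration that assumes plain term counts and Euclidean distance; the sample texts, the `to_vector` helper and the whitespace tokenization are hypothetical, and a real system would likely use weighting such as TF-IDF.

```python
import math
from collections import Counter


def to_vector(text, vocabulary):
    """Map a document to a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Assumed sample documents, for illustration only.
docs = [
    "apache lucene search",
    "lucene search index",
    "football match result",
]

# One dimension per distinct term across all documents.
vocabulary = sorted({word for d in docs for word in d.lower().split()})
vectors = [to_vector(d, vocabulary) for d in docs]

d01 = euclidean(vectors[0], vectors[1])  # two documents sharing terms
d02 = euclidean(vectors[0], vectors[2])  # two documents sharing no terms
```

The two documents that share terms end up closer together than the two that share none, which is the property a distance-based clustering or classification algorithm relies on.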

