You might want to look at the Carrot2 project
(http://www.carrot2.org/website/xml/index.xml).
It does clustering and has support for Lucene.
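If the keyword lists stay fixed, a simpler alternative to clustering may be
to score each document's text against each category's keywords. Below is a
rough sketch in plain Python, not Lucene code. The names CATEGORIES,
count_keyword and assign_categories are made up for illustration, and it
assumes the text has already been extracted from the HTML/PDF/DOC files.

```python
import re

# Hypothetical category -> keyword lists, copied from the example in the question.
CATEGORIES = {
    "spielberg": ["ET", "munich", "indiana jones"],
    "sport": ["football", "basket", "volley"],
}


def count_keyword(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def assign_categories(text, categories=CATEGORIES, min_hits=1):
    """Return the categories with at least min_hits keyword hits, best first."""
    scores = {
        name: sum(count_keyword(text, kw) for kw in keywords)
        for name, keywords in categories.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= min_hits]


if __name__ == "__main__":
    print(assign_categories("Indiana Jones was directed by Spielberg after ET."))
```

A document can land in several categories or in none. Raw hit counts favour
long documents, so some length normalisation would probably be needed on a
real archive. With Lucene, the same idea would roughly amount to indexing
the documents once and running one keyword query per category.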
Valerio Schiavoni wrote:
Hello,
I'm not sure if the term 'cluster' is the correct one, but here is what I
would like to do:
I have a small set of categories, and I have manually defined some keywords
for each category.
e.g.:
-spielberg: ET, munich, indiana jones;
-sport: football, basket, volley, etc.;
Then, I have a fairly large archive of documents (HTML, PDF, DOC; ~5000 and
still growing), and I want to 'assign' each document
to those categories, possibly using Lucene (if it can help!).
What approach could I adopt?
thanks,
valerio
--
To Iterate is Human, to Recurse, Divine
James O. Coplien, Bell Labs
(how good it is to be human indeed)
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886