On May 11, 2006, at 3:36 AM, Jérôme Charron wrote:

Actually, the clustering uses the summaries as input. I assume it would provide better results if it used the whole document content, no? I assume that clustering uses the summaries instead of the document content
for performance reasons.
But there is a (bad) side effect: since the size of the summaries is
configurable, the clustering "quality" will vary depending on the summary size configuration. I find this very confusing: when folks adjust this parameter, it is only for front-end considerations (they want to display
a long or a short summary), certainly not for clustering reasons.

What do you and others think about this?

Bob Carpenter of alias-i had this to say when I brought up this very idea:

http://article.gmane.org/gmane.comp.jakarta.lucene.devel/12599

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
