I get your point. Thank you.

I am using Euclidean distance.

--shashi

On Thu, May 14, 2009 at 1:51 AM, Jeff Eastman
<[email protected]> wrote:
> I think the "optimum" value for these parameters is pretty subjective. You
> may find some estimation procedures that will give you values you like
> sometimes, but canopy will put every point into a cluster so the number of
> clusters is very sensitive to these values. I don't think normalizing your
> vectors will help, since you need to normalize all vectors in your corpus by
> the same amount. You might then find t1 and t2 values always on 0..1 but the
> number of clusters will still be sensitive to your choices on this range and
> you will be dealing with decimal values.
>
> It really depends upon how "similar" the documents in your corpus are and
> how fine a distinction you want to draw between documents before declaring
> them "different". What kind of distance measure are you using? A cosine
> distance measure will always give you distances on 0..1.
>
> Jeff
>
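
To make the sensitivity to t1 and t2 concrete, here is a minimal sketch of canopy clustering in plain Python. It is not Mahout's implementation; the function names (euclidean, cosine_distance, canopy) and the toy points are assumptions made up for illustration. It assumes t1 > t2: points within t1 of a center join its canopy, and points within t2 are removed from further consideration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def canopy(points, t1, t2, distance):
    """Return a list of (center, members) canopies.

    Hypothetical helper: t1 is the loose threshold, t2 the tight one.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unassigned point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                # Loosely close: the point belongs to this canopy.
                members.append(p)
            if d >= t2:
                # Not tightly close: it may still seed or join other canopies.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Two well-separated groups of three points each (made-up data).
POINTS = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
```

On this toy data, canopy(POINTS, 3.0, 2.0, euclidean) yields two canopies, while canopy(POINTS, 1.5, 0.5, euclidean) yields six, because with the tighter t2 no point is close enough to a center to be removed. With cosine_distance, the values stay on 0..1 only when all vector components are non-negative (as with term counts); vectors pointing in opposite directions give a distance of 2.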
