Thanks for the clarification, Ted.

Thanks & Regards,
B Anil Kumar.


On Mon, Jul 14, 2014 at 3:47 AM, Ted Dunning <[email protected]> wrote:

> On Sun, Jul 13, 2014 at 7:19 AM, AnilKumar B <[email protected]>
> wrote:
>
> > Is it numerical vectorization only for performance optimization? or is
> > there any other reason.
> >
> > Does it make sense to apply clustering directly on actual records?
> >
>
> You can define distance measures on the original data, but you can
> pretty much always define numerical vectorizations that allow those same
> distance measures to be calculated on the vectorized form.  Distance
> measures with complex forms that are not computable in this way
> will, in many cases, defeat clustering algorithms, since assumptions about
> the topological space implied by the distance function are often baked into
> these algorithms.
>
> A good example of this is the triangle inequality.  Using Elkan's
> optimization can improve clustering speed by as much as 10x in some cases,
> but if your distance doesn't satisfy the triangle inequality, then the
> optimization becomes incorrect.
>
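To make the triangle inequality point concrete, here is a minimal sketch of a brute-force check on a handful of points. The helper name violates_triangle_inequality and the sample points are made up for illustration and are not part of any library. It shows that plain Euclidean distance passes the check, while squared Euclidean distance, which looks like a harmless variant, does not.

```python
import itertools
import math


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance; cheaper to compute, but not a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violates_triangle_inequality(points, dist, tol=1e-12):
    """Return the first triple (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c).

    Returns None if no ordered triple of the given points violates the
    inequality. This is a hypothetical helper for illustration only.
    """
    for a, b, c in itertools.permutations(points, 3):
        if dist(a, c) > dist(a, b) + dist(b, c) + tol:
            return (a, b, c)
    return None


# Three collinear points, chosen only as an example.
points = [(0.0,), (1.0,), (2.0,)]

print(violates_triangle_inequality(points, euclidean))
print(violates_triangle_inequality(points, squared_euclidean))
```

With squared distances, going from 0 to 2 directly costs 4, while going through 1 costs only 1 + 1. Any pruning rule that assumes the direct route is never longer than a detour can therefore skip a centroid that was in fact the closest one.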
> On the other hand, it is easy to guarantee that any distance that is
> computed by first vectorizing and then using a standard distance works
> correctly.
>
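As a small illustration of "vectorize first, then use a standard distance", the sketch below takes records with categorical fields, one-hot encodes them, and checks that half the Manhattan distance on the vectors equals the number of mismatching fields on the original records. The records, field values, and helper names (build_vocab, one_hot) are invented for this example; this is one possible encoding under those assumptions, not the approach of any particular library.

```python
def field_mismatches(r1, r2):
    """Distance on the raw records: number of fields whose values differ."""
    return sum(1 for x, y in zip(r1, r2) if x != y)


def build_vocab(records):
    """For each field position, map every observed value to a slot index."""
    vocab = []
    for i in range(len(records[0])):
        values = sorted({r[i] for r in records})
        vocab.append({v: j for j, v in enumerate(values)})
    return vocab


def one_hot(record, vocab):
    """Concatenate one one-hot block per field into a single numeric vector."""
    vec = []
    for value, index in zip(record, vocab):
        block = [0.0] * len(index)
        block[index[value]] = 1.0
        vec.extend(block)
    return vec


def manhattan(u, v):
    """Standard Manhattan (L1) distance, which satisfies the triangle inequality."""
    return sum(abs(x - y) for x, y in zip(u, v))


# Hypothetical records with three categorical fields each.
records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

vocab = build_vocab(records)
vectors = [one_hot(r, vocab) for r in records]

# Each mismatching field contributes exactly 2 to the L1 distance
# (one slot switches off, another switches on).
for i in range(len(records)):
    for j in range(len(records)):
        assert field_mismatches(records[i], records[j]) == manhattan(vectors[i], vectors[j]) / 2
```

Because the distance on the vectors is a constant multiple of a standard metric, it inherits the metric properties that clustering algorithms tend to assume, and the clustering code never needs to see the original records.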
