Re: [mlpack] A better way to do DBSCAN on a dataset with two different units of measure?

Ryan Curtin Tue, 17 Jul 2018 01:26:06 -0700

On Tue, Jul 17, 2018 at 04:07:15AM +0000, Yew Khong See wrote:
> Hi all,
> I am using DBSCAN to cluster a dataset consisting of an individual's
> weight (in kg) and height (in cm). 
> What I am doing now is to cluster the weights first and then do
> another clustering on the heights from each weight cluster. 
> This method is not efficient and will not scale with larger datasets.
> 
> Is there a better way to perform clustering one time on both the
> weights and heights, but with different epsilon and minpoints?


Hi there,

Can you clarify what you mean by 'different minpoints'?  I can picture
what you mean when you say 'different epsilon'---I think that you mean
that you want a different epsilon value for weight and height, and that
you want to cluster simultaneously using both weight and height values.

In this case you could just normalize your data accordingly: if, e.g.,
you want epsilon 1 for weight and 2 for height, simply divide all the
height values by 2, and then use epsilon = 1.
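
As a rough illustration of that rescaling, here is a minimal sketch in plain
C++ (standard library only). The names Sample, ScaleByEpsilon, and Distance
are made up for this example and are not part of mlpack; the per-feature
epsilon values are whatever you would have chosen for each unit.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One person: weight in kg, height in cm.  (Hypothetical type for this sketch.)
struct Sample
{
  double weight;
  double height;
};

// Divide each feature by the epsilon you want for that feature, so that a
// single epsilon of 1 can be used on the rescaled data.
std::vector<Sample> ScaleByEpsilon(const std::vector<Sample>& data,
                                   const double epsWeight,
                                   const double epsHeight)
{
  assert(epsWeight > 0.0 && epsHeight > 0.0);

  std::vector<Sample> scaled;
  scaled.reserve(data.size());
  for (const Sample& s : data)
    scaled.push_back({ s.weight / epsWeight, s.height / epsHeight });

  return scaled;
}

// Euclidean distance between two (already rescaled) samples.
double Distance(const Sample& a, const Sample& b)
{
  return std::hypot(a.weight - b.weight, a.height - b.height);
}
```

With epsWeight = 1 and epsHeight = 2, two people of equal weight whose heights
differ by 1.5 cm end up 0.75 apart and count as neighbors under epsilon = 1,
while a 3 cm difference gives 1.5 and does not.  The rescaled points can then
be clustered in one pass.  Note that with the Euclidean distance the
neighborhood is an ellipse in the original units, not a box, so a point near
the limit in both weight and height at once may still fall outside it.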

Hope this helps; let me know if I can clarify further.

Thanks,

Ryan

-- 
Ryan Curtin    | "Indeed!"
[email protected] |   - David Lo Pan
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
