* Veronica Andreo <[email protected]> [2018-10-31 00:23:57 +0100]:
Hi devs,
Hi Vero, (not a real dev, but I'll share what I think)
I'm writing to ask: how does one determine the best number of classes/clusters in a set of unsupervised classifications with different k in GRASS?
You already know this better than I do, I guess, but I'd like to refresh my mind on all this a bit. I suppose the only way to tell whether a number of classes is "best" is to judge for yourself by inspecting the "quality" of the clusters returned. One way would be to compute the "error" of the clustering, i.e. the overall distance between the points assigned to each cluster and its center. Comparing the overall errors between different clustering settings (or even algorithms?) would give an idea of how tightly the points sit around the cluster centers. Maybe we could implement something like this; a sketch follows below. (All this I practiced during a generic Algorithmic Thinking course. I guess it's applicable in our "domain" too.)
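Roughly like this, in plain NumPy (a minimal sketch; the variable names and the random test data are stand-ins, you would feed in your own pixel values plus the centers and labels that a clustering run produces):

import numpy as np

def clustering_sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    diffs = points - centers[labels]   # per-point offset from its own center
    return float(np.sum(diffs ** 2))

# Toy usage: compare the overall error across runs with different k.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 4))     # e.g. 500 pixels, 4 bands
for k in (3, 5, 8):
    centers = rng.normal(size=(k, 4))  # stand-in for reported cluster means
    labels = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    print(k, clustering_sse(points, labels, centers))

One caveat: this error shrinks as k grows no matter what, so rather than taking the minimum one usually looks for the "elbow" where adding more clusters stops paying off.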
I use i.cluster with different numbers of classes and then i.maxlik, which uses a modified version of k-means according to the manual page. Now, I would like to know which unsup classif is the best within the set.
Sorry, I guess I have to read up: what is "unsup classif"?
I checked the i.cluster reports (looking for separability) and then explored the rejection maps, but none of those seems to work as a crisp and clear indicator. BTW, does anyone know which separability index i.cluster uses?
I am interested in learning about the distance measure too, so I am looking at the source code of `i.cluster`. Searching around, I think it's this file: grasstrunk/lib/cluster/c_sep.c, and I/we just need to identify which distance it measures; a reference sketch of one common candidate follows below. Nikos
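To be clear, I have not confirmed what c_sep.c actually computes. But as a reference point while reading the code, here is one separability index that is common in remote sensing, the transformed divergence between two classes assumed Gaussian (the means and covariances below are hypothetical placeholders for whatever i.cluster reports per class):

import numpy as np

def transformed_divergence(m_i, c_i, m_j, c_j):
    """Transformed divergence of two Gaussian classes, scaled to [0, 2000]."""
    ci_inv, cj_inv = np.linalg.inv(c_i), np.linalg.inv(c_j)
    dm = (m_i - m_j).reshape(-1, 1)
    # Divergence: a covariance-shape term plus a mean-separation term.
    d = 0.5 * np.trace((c_i - c_j) @ (cj_inv - ci_inv)) \
        + 0.5 * np.trace((ci_inv + cj_inv) @ (dm @ dm.T))
    return 2000.0 * (1.0 - np.exp(-d / 8.0))

Values near 2000 mean two classes are essentially separable; values near 0 mean they overlap heavily. Again, whether this matches c_sep.c is exactly what needs checking.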
In any case, I have seen some indices elsewhere (mainly in R and Python) that are used to choose the best clustering result (coming from the same or from different clustering methods). Examples of such indices are Silhouette, Dunn, etc. Some are called internal, as they do not require test data and just characterize the compactness of the clusters; the ones requiring test data, on the other hand, are called external. I have seen them in the dtwclust R package [0] (the package is oriented to time series clustering, but the validation indices are more general) and in scikit-learn in Python [1].

Do any of you have something already implemented in this direction? Or how do you assess your unsup classification (clustering) results? Any ideas or suggestions within GRASS?

Thanks much in advance!
Vero

[0] https://rdrr.io/cran/dtwclust/man/cvi.html
[1] http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation
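Not that I know of anything ready-made in GRASS, but since you already cite scikit-learn in [1], an internal index such as the silhouette coefficient takes only a few lines there. A sketch (the random array stands in for your pixels-by-bands data, which you would first read out of GRASS, e.g. via grass.script; that part is not shown here):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))   # stand-in for n pixels x 4 bands

for k in (3, 5, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # silhouette_score needs only the data and the labels: an "internal" index.
    print(k, silhouette_score(X, labels))

Silhouette ranges from -1 to 1 and higher is better, so taking the k with the maximum score is exactly the kind of crisp indicator you are after.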
--
Nikos Alexandris | Remote Sensing & Geomatics
GPG Key Fingerprint: 6F9D4506F3CA28380974D31A9053534B693C4FB3
_______________________________________________
grass-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/grass-dev
