Hi Nikos and Moritz,

Thanks for your replies and for the recipe :)

Cheers,
Vero

On Wed, 31 Oct 2018 at 22:09, Moritz Lennert (<[email protected]>) wrote:

> On 31/10/18 12:19, Nikos Alexandris wrote:
> > * Veronica Andreo <[email protected]> [2018-10-31 00:23:57 +0100]:
> >
> >> Hi devs,
> >>
> >
> > Hi Vero,
> >
> > (not a real dev, but I'll share what I think)
> >
> >> I'm writing to ask: how does one determine the best number of
> >> classes/clusters in a set of unsupervised classifications with
> >> different k in GRASS?
> >
> > You already know better than me I guess, but I'd like to refresh my mind
> > on all this a bit.
> >
> > I guess the only way to tell whether the number of classes is "best" is
> > to judge for yourself by inspecting the "quality" of the clusters returned.
> >
> > One way to tell would be to compute the "error of clusters", which would
> > be the overall distance between the points that are assigned to a
> > cluster and its center. I guess comparing the overall errors between
> > different clustering settings (or even algorithms?) would give an idea
> > of how close points are to the centers of their clusters.
> > Maybe we could implement something like this.
> >
> > (All this I practiced during a generic Algorithmic Thinking course. I
> > guess it's applicable in our "domain" too.)
> >
> >
> >> I use i.cluster with different numbers of classes and then i.maxlik,
> >> which uses a modified version of k-means according to the manual page.
> >> Now, I would like to know which unsup classif is the best within the set.
> >
> > Sorry, I guess I have to read up: what is "unsup classif"?
> >
> >> I checked the i.cluster reports (looking for separability) and then
> >> explored the rejection maps. But neither of those seems to work as a
> >> crisp and clear indicator. BTW, does anyone know which separability
> >> index i.cluster uses?
> >
> >
> > I am interested to learn about the distance measure too. I am looking
> > at the source code of `i.cluster`.
> > And then, searching around, I think it's this file:
> >
> > grasstrunk/lib/cluster/c_sep.c
> >
> > and I/we just need to identify which distance it measures.
>
> i.cluster uses a simple k-means approach based on the spectral euclidean
> distance between pixels or between pixels and existing clusters. Because
> it includes a minimum cluster size and a minimum cluster separation
> parameter, the total number of clusters might change, which is different
> from a classical k-means.
>
> i.cluster also works on a sample of the image pixels to define the
> clusters, so there is no guarantee that the clusters it identifies would
> be those one would find if using all pixels, but AFAIK the result is
> generally close enough to justify the trade-off, as sampling provides
> greater speed.
>
> i.maxlik does not interfere in the clustering part. It uses the
> signatures of classes provided as input (possibly the signatures of the
> clusters if the input is the output of i.cluster) to then assign each
> pixel to one of the classes. The reject map of i.maxlik allows you to see
> the probability of a pixel's membership in the chosen class. It does not
> really allow you to measure cluster "quality", nor the ideal number of
> clusters (well, you could try with many different cluster numbers and
> then choose the one where the reject map values are the lowest on average).
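
To make the distance-based assignment described above concrete, here is a minimal pure-Python sketch of assigning pixels to the nearest cluster center by spectral euclidean distance. It is only an illustration of the idea, not the code of i.cluster (it ignores sampling, minimum cluster size and minimum separation), and the function names and pixel values are made up for the example.

```python
import math


def spectral_distance(a, b):
    """Euclidean distance between two pixels given as per-band value tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest(pixels, centers):
    """Return, for each pixel, the index of the closest cluster center."""
    return [
        min(range(len(centers)), key=lambda i: spectral_distance(p, centers[i]))
        for p in pixels
    ]


# Hypothetical two-band pixel values and two cluster centers.
pixels = [(10, 12), (11, 14), (50, 48), (52, 51)]
centers = [(10, 13), (51, 50)]
labels = assign_to_nearest(pixels, centers)
print(labels)
```
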
>
> If you want to use a very simple approach to Nikos' suggestion of
> calculating the error, you could use something like this:
>
> - For each i.cluster + i.maxlik result:
>     - For each original band:
>         - Create a new pseudo band with the mean values of the
>           original band per cluster (r.stats.zonal)
>     - Calculate the euclidean distance in spectral space of each pixel
>       to its cluster (r.mapcalc):
>
>       (pixel_value_band1 - r.stats.zonal result on band 1)^2 +
>       (pixel_value_band2 - r.stats.zonal result on band 2)^2 +
>       etc
>
>     - Calculate the mean euclidean distance on the result (or median,
>       or whatever you are looking for) (r.univar)
>
> - Identify the i.cluster + i.maxlik result that reaches the best score
>
>
> Moritz
>
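
The recipe above can be sketched in plain Python to show the arithmetic, under some assumptions: pixels are tuples of band values, each classification result is a list of cluster labels, and all names and numbers below are hypothetical. In GRASS the three steps would be done with r.stats.zonal, r.mapcalc and r.univar as described in the recipe; this sketch does not call GRASS. The expression as written in the recipe gives the squared distance, so that is what is averaged here.

```python
from collections import defaultdict


def cluster_means(pixels, labels):
    """Per-cluster mean of every band (the role r.stats.zonal plays in the recipe)."""
    sums = {}
    counts = defaultdict(int)
    for pixel, label in zip(pixels, labels):
        if label not in sums:
            sums[label] = [0.0] * len(pixel)
        for band, value in enumerate(pixel):
            sums[label][band] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def mean_squared_distance(pixels, labels):
    """Mean over all pixels of the squared spectral distance to the cluster mean."""
    means = cluster_means(pixels, labels)
    total = 0.0
    for pixel, label in zip(pixels, labels):
        total += sum((v - m) ** 2 for v, m in zip(pixel, means[label]))
    return total / len(pixels)


def best_result(pixels, results):
    """Score every classification result and return the name with the lowest score."""
    scores = {name: mean_squared_distance(pixels, labels)
              for name, labels in results.items()}
    return min(scores, key=scores.get), scores


# Hypothetical two-band image with four pixels and two candidate classifications.
pixels = [(10, 12), (11, 14), (50, 48), (52, 51)]
results = {"run_a": [0, 0, 1, 1], "run_b": [0, 1, 0, 1]}
best, scores = best_result(pixels, results)
print(best, scores)
```

One caveat when comparing runs with different k: this kind of error generally shrinks as the number of clusters grows, so the lowest score tends to favour the largest k. It is probably more informative to look at where the improvement levels off than at the absolute minimum.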
_______________________________________________
grass-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/grass-dev
