Re: [GRASS-dev] RandomForest classifier for imagery groups add-on

Paulo van Breugel Sun, 27 Mar 2016 11:35:11 -0700


On 27-03-16 16:58, Steven Pawley wrote:

Hello Paulo,
Many thanks for this. I updated the module last night to include the ability to force regression mode, as well as some more error checking for valid combinations of input parameters. Classification mode also checks that the input labelled pixels are CELL type. I'm not yet outputting all of the appropriate uncertainty measures, like RSQ, for regression mode, but I'll add those in.
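The mode checks described above could look roughly like the following. This is a hypothetical sketch, not code from r.randomforest: the helper `resolve_mode` and its `mode` values (`auto`, `classification`, `regression`) are assumptions, and integer NumPy dtypes stand in for GRASS CELL (integer) rasters.

```python
import numpy as np


def resolve_mode(labels, mode="auto"):
    """Hypothetical helper: decide between classification and regression.

    mode="auto" infers the mode from the label dtype, mode="regression"
    forces regression even for integer (CELL-like) labels, and
    mode="classification" insists on integer labels.
    """
    labels = np.asarray(labels)
    is_integer = np.issubdtype(labels.dtype, np.integer)
    if mode == "regression":
        return "regression"
    if mode == "classification":
        if not is_integer:
            raise ValueError("classification mode requires integer (CELL) training labels")
        return "classification"
    if mode == "auto":
        return "classification" if is_integer else "regression"
    raise ValueError(f"unknown mode: {mode!r}")


print(resolve_mode(np.array([0, 1, 1], dtype=np.int32)))             # classification
print(resolve_mode(np.array([0.2, 3.5])))                            # regression
print(resolve_mode(np.array([0, 1], dtype=np.int32), "regression"))  # regression
```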

Great, I'll check it out.

It is interesting that you had better performance when using regression. I will have to check that for my application using scikit-learn. In R, using the randomForest package, the results were pretty much identical, but my classes were already balanced, which I think is one factor that can lead to significant differences between binary classification probabilities and regression.

It was a study by somebody else; I can't remember which one right now, but it will come back to me. But yes, the fact that in species distribution modeling the sampling is often highly unbalanced (with a large number of pseudo-absences) is likely to play a role.
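One way to examine this with scikit-learn is to fit both forest types to the same unbalanced 0/1 response and compare their outputs on held-out data. The sketch below uses synthetic data from `make_classification` (roughly 10% presences) as a stand-in for presence/pseudo-absence samples; sample sizes and parameters are arbitrary assumptions. `max_features` is set explicitly because `RandomForestClassifier` and `RandomForestRegressor` use different defaults for it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic presence/absence data with roughly 10% presences
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Same settings for both forests so only the tree type differs
settings = dict(n_estimators=200, max_features="sqrt", random_state=0)
clf = RandomForestClassifier(**settings).fit(X_train, y_train)
reg = RandomForestRegressor(**settings).fit(X_train, y_train)  # 0/1 treated as numeric

p_clf = clf.predict_proba(X_test)[:, 1]  # mean of per-tree class-1 probabilities
p_reg = reg.predict(X_test)              # mean of per-tree leaf means, also in [0, 1]

auc_clf = roc_auc_score(y_test, p_clf)
auc_reg = roc_auc_score(y_test, p_reg)
print(f"AUC classification: {auc_clf:.3f}, regression: {auc_reg:.3f}")
print("correlation of the two outputs:", np.corrcoef(p_clf, p_reg)[0, 1])
```

Repeating this with `weights=[0.5, 0.5]` would give a rough feel for whether class balance changes how far the two outputs diverge.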

Yes, I will definitely use this as a template to include other methods. I only recently switched my work from R to Python, but I am just submitting a paper based on R that uses a range of classifiers, such as randomForest, GLM, GAM, and MARS, for which it was useful to evaluate the differences.

It sometimes seems there are almost as many different conclusions about the best method as there are publications (OK, I might exaggerate a bit here), so comparing different models is very useful. So I'm very glad you are doing this (as I said, I have looked at scipy before and how it could be implemented in GRASS, but my Python skills are just not up to it).


Steve

_____________________________

From: Paulo van Breugel <[email protected]>

Sent: Sunday, March 27, 2016 3:11 AM
Subject: Re: [GRASS-dev] RandomForest classifier for imagery groups add-on

To: Vaclav Petras <[email protected]>, Steven Pawley <[email protected]>

Cc: <[email protected]>


Hi Steve

Yes, your use case will not differ methodologically from species modeling based on presence/absence. One reason I was asking about regression randomForest is that one article (I can't remember the title, will look it up) found that the regression approach yielded better results, even though the response variable is binary. On your help page, you write that r.randomforest performs random forest classification and regression, and that the regression mode can be used by setting the mode to the regression option. But I am not seeing that option?

Great that you are planning other methods as well. Given model uncertainties (quite an issue in species distribution modeling), having multiple methods is really a plus, especially as it allows one to build consensus models [1] and combine them to create uncertainty maps.
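A very simple form of consensus and uncertainty mapping can be sketched with NumPy: average the per-pixel probability maps from several methods (optionally weighted, for example by a validation score) and use the spread between them as an uncertainty layer. This is only one simple consensus option; the function name `consensus_map`, the weighting scheme, and the use of the standard deviation as the uncertainty measure are assumptions for illustration.

```python
import numpy as np


def consensus_map(prob_maps, weights=None):
    """Combine per-model probability rasters (all the same shape).

    Returns the (weighted) mean as the consensus map and the unweighted
    standard deviation across models as a simple uncertainty map.
    """
    stack = np.stack(prob_maps)  # shape: (n_models, rows, cols)
    if weights is None:
        weights = np.ones(len(prob_maps))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    consensus = np.tensordot(weights, stack, axes=1)  # weighted mean over models
    uncertainty = stack.std(axis=0)  # disagreement between models
    return consensus, uncertainty


# Hypothetical 2 x 2 probability maps from three methods
rf = np.array([[0.9, 0.2], [0.6, 0.1]])
glm = np.array([[0.8, 0.3], [0.4, 0.1]])
svm = np.array([[0.7, 0.1], [0.8, 0.1]])
mean_map, sd_map = consensus_map([rf, glm, svm])
```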


Cheers,

Paulo

[1] Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K., & Thuiller, W. 2009. Evaluation of consensus methods in predictive species distribution modelling. /Diversity and Distributions/ 15: 59–69.



On 27-03-16 00:47, Steven Pawley wrote:

    Hi Vaclav and Paulo,

    Thanks for those pointers re. the lazy import technique and
    documentation. I have a RandomForest diagram to explain the process,
    as well as some examples, so I'll update the documentation next week.

    Paulo, thanks for running a few tests. It looks like there is an
    error with the class_weight parameter; I'll look into that.

    In terms of species distribution modelling, I have been using the
    tool for landslide susceptibility modelling, which I believe is
    methodologically similar to SDM in terms of having a binary
    response variable. I have been doing this for the area of Alberta,
    using an 8000 x 14000 pixel, 17-band stack of predictors. In
    the case of a binary response variable, the usual approach is to
    run random forest in classification mode, i.e. with fully grown
    trees, but use the class probabilities to represent the 'species'
    or 'landslide' index.
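For a stack as large as 8000 x 14000 pixels, one way to produce the probability-based index is to predict a block of rows at a time, so the full pixels-by-bands matrix never has to sit in memory at once. The sketch below is an assumption about how this could be done with scikit-learn, not a description of how r.randomforest does it; `predict_index`, the chunk size, and the synthetic 5-band stack are hypothetical. With its default settings (`max_depth=None`, `min_samples_leaf=1`), `RandomForestClassifier` grows fully grown trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def predict_index(model, stack, chunk_rows=256):
    """Return P(class 1) for every pixel of a (bands, rows, cols) stack.

    Rows are processed in blocks so that a large raster never has to be
    held as a single (pixels x bands) matrix in memory.
    """
    n_bands, n_rows, n_cols = stack.shape
    positive = list(model.classes_).index(1)  # column of class 1 in predict_proba
    index = np.empty((n_rows, n_cols), dtype=np.float32)
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # (bands, block_rows, cols) -> (block_rows * cols, bands)
        block = stack[:, start:stop, :].reshape(n_bands, -1).T
        proba = model.predict_proba(block)[:, positive]
        index[start:stop, :] = proba.reshape(stop - start, n_cols)
    return index


# Hypothetical 5-band, 60 x 80 predictor stack with a binary response
rng = np.random.default_rng(42)
stack = rng.normal(size=(5, 60, 80))
response = (stack[0] + 0.5 * stack[1] > 0).astype(int)

# Train on a random sample of labelled pixels
sample = rng.choice(60 * 80, size=500, replace=False)
X_train = stack.reshape(5, -1).T[sample]
y_train = response.ravel()[sample]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
index_map = predict_index(model, stack, chunk_rows=16)
```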

    I am planning to implement other methods from the scikit-learn
    package, which is a trivial change to the module once the bugs are
    ironed out. I will probably look to create modules for SVM and
    logistic regression, and maybe nearest neighbours classification.
    Certainly open to any suggestions.
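Because scikit-learn classifiers share the same `fit`/`predict_proba` interface, swapping the method can indeed be close to a one-line change. A minimal sketch, assuming synthetic data and arbitrary parameter choices; the dictionary keys are just labels, not module options. SVM, logistic regression, and k-nearest neighbours are sensitive to feature scale, so they are wrapped with `StandardScaler` here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, random_state=1)

# Each entry exposes fit/predict_proba; scale-sensitive methods get a scaler
estimators = {
    "randomforest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}

probabilities = {}
for name, estimator in estimators.items():
    estimator.fit(X, y)
    probabilities[name] = estimator.predict_proba(X)[:, 1]  # P(class 1)
```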

    Steve
    _____________________________
    From: Vaclav Petras <[email protected]>
    Sent: Saturday, March 26, 2016 11:21 AM
    Subject: Re: [GRASS-dev] RandomForest classifier for imagery
    groups add-on
    To: Steven Pawley <[email protected]>
    Cc: <[email protected]>



    On Sat, Mar 26, 2016 at 12:40 PM, Steven Pawley
    <[email protected]> wrote:

        I would like to draw your attention to a new GRASS add-on,
        r.randomforest, which uses the scikit-learn and pandas Python
        packages to classify GRASS rasters.
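The general pattern of classifying rasters with scikit-learn and pandas can be sketched as: flatten each band of the imagery group into a table column, fit on the pixels that carry a training label, predict every pixel, and reshape the result back onto the raster grid. This is an illustrative assumption, not the add-on's code; the band names, the use of NaN to stand in for GRASS null cells, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rows, cols = 40, 50
rng = np.random.default_rng(0)

# Hypothetical imagery group: three bands as 2-D arrays
bands = {f"band{i}": rng.normal(size=(rows, cols)) for i in range(1, 4)}
truth = np.where(bands["band1"] > 0, 2, 1)  # synthetic class codes 1 and 2

# Training raster: class codes at 200 random pixels, NaN (null) elsewhere
training = np.full((rows, cols), np.nan)
picked = rng.choice(rows * cols, size=200, replace=False)
training.flat[picked] = truth.flat[picked]

# One row per pixel, one column per band, plus the (mostly null) label
table = pd.DataFrame({name: arr.ravel() for name, arr in bands.items()})
table["label"] = training.ravel()
labelled = table.dropna(subset=["label"])

features = list(bands)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(labelled[features], labelled["label"].astype(int))

# Predict all pixels and restore the raster shape
classified = model.predict(table[features]).reshape(rows, cols)
```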


    Thanks, this looks good. Please consider adding an image to the
    documentation to better promote the module [1] and also an example
    which would work with the NC SPM dataset [2]. For the add-on to
    generate documentation on the server and to work well in a few other
    special cases, it is advantageous to employ the lazy import
    technique for the non-standard dependencies; see for example
    v.class.ml and v.class.mlpy [3].
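A minimal sketch of the lazy import idea, assuming a generic helper rather than the exact code in the referenced changeset [3]: `lazy_import` and the error wording are hypothetical, and a GRASS module would more likely report the failure through its own messaging functions. The key point is that `sklearn` and `pandas` are imported inside `main()`, not at module level.

```python
import importlib


def lazy_import(module_name, install_hint):
    """Import an optional dependency only when it is actually needed.

    Keeping heavy imports such as sklearn or pandas out of module level
    lets the script be loaded (for example to generate its manual page)
    on machines where those packages are not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise RuntimeError(
            f"This module requires '{module_name}'. {install_hint}"
        ) from err


def main():
    # Deferred until the module really runs
    ensemble = lazy_import("sklearn.ensemble", "Install scikit-learn.")
    pd = lazy_import("pandas", "Install pandas.")
    return ensemble.RandomForestClassifier, pd.DataFrame
```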

    Vaclav

    [1] https://trac.osgeo.org/grass/wiki/Submitting/Docs#Images
    [2] https://grass.osgeo.org/download/sample-data/
    [3] https://trac.osgeo.org/grass/changeset/66482/

_______________________________________________
grass-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-dev
