Hi ChenLiang!

This is a great starting point.

I am taking some time reviewing and also getting some help from geospatial
experts who are my co-workers.  In just a couple more days I will have more
concrete feedback on this.

Talk to you all again soon.

Cheers,
Ivan

On Tue, Jan 5, 2016 at 4:06 PM, WangChenLiang <[email protected]> wrote:

> Hi MADlib Developers,
>     Following Ivan and Frank's suggestion, I would like to propose a
> description and interface for geographically weighted regression (GWR).
> PostGIS functions will be invoked to compute distances in a given CRS and
> to extract the bounding-rectangle coordinates of the study area. If MADlib
> doesn't have access to PostGIS routines, we can only implement some simple
> GIS utilities in our own code.
>      GWR models a local relationship of a numerical dependent variable to
> one or more explanatory independent variables to build a model of spatially
> varying relationships. It has been widely used for understanding the
> spatial pattern of natural or social phenomena.
>      GWR constructs a separate local equation for each location in the
> table, incorporating the dependent and independent variables that fall
> within the bandwidth of each target geometry. The shape and extent of the
> bandwidth depend on the spatial kernel type (gauss, exp or bisquare) and on
> the distance in fixed methods (or the number of neighbors in adaptive
> methods). Because one equation is fitted per location, the computational
> burden of GWR grows with the number of prediction locations, so a
> parallelized GWR is necessary in a high-performance environment such as
> GPDB.
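
For illustration, the local fitting described above can be sketched in Python with NumPy. This is only a minimal sketch, not the proposed MADlib implementation: the function name gwr_local_coefficients, the planar Euclidean distance (standing in for a PostGIS distance call), and the fixed-bandwidth Gaussian kernel are all assumptions made for the example. At each target location u it solves the weighted least-squares problem beta(u) = (X'W(u)X)^-1 X'W(u)y.

```python
import numpy as np


def gwr_local_coefficients(coords, X, y, targets, bw):
    """Fit one weighted least-squares model per target location.

    coords  : (n, 2) observation coordinates in a planar CRS (assumption)
    X       : (n, p) design matrix; include a column of ones for an intercept
    y       : (n,)   dependent variable
    targets : (m, 2) locations where coefficients are estimated
    bw      : fixed bandwidth, in the same units as coords
    Returns an (m, p) array of local coefficients.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    targets = np.asarray(targets, dtype=float)

    betas = np.empty((len(targets), X.shape[1]))
    for i, target in enumerate(targets):
        # Distance from this target to every observation.
        dist = np.linalg.norm(coords - target, axis=1)
        # Gaussian kernel weights: nearby observations count more.
        w = np.exp(-0.5 * (dist / bw) ** 2)
        # Weighted least squares: scale rows by sqrt(w), then solve.
        sw = np.sqrt(w)
        betas[i] = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return betas
```

Each pass through the loop depends only on its own target location, so the per-location fits can be distributed independently; this is why the cost scales with the number of locations and why the work parallelizes naturally.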
>     There are two important caveats about GWR. First, GWR can estimate
> coefficients at any location but can only provide diagnostic information
> at observation locations. Second, according to Páez et al. (2011), basic
> GWR is not an appropriate method for small sample sizes (<160). Many
> advanced geographically weighted methods have been proposed (see
> Wheeler DC 2009, Brunsdon C et al. 2012, Gollini I et al. 2015), which we
> plan to implement in the future.
>     The description of the interface and functions for GWR is provided
> below. Coefficient columns in the output are separated so that results can
> be mapped easily in GIS. Could you kindly take a look and give me advice
> or feedback to improve it? Many thanks!
> Best,
> ChenLiang Wang
>
> --------------------------------------------------------------------------------------------------------------------------------------
> Description of Geographically Weighted Regression (Spatial
> Statistics->Regression Models)
> Training Function
> The geographically weighted regression training function has the
> following syntax.
> gwregr_train(source_table,
>         out_table,
>         dependent_varname,
>         independent_varname,
>         kernel_params,
>         adaptive_option,
>         ftest_option,
>         regression_location,
>         prediction_location,
>         grouping_cols,
>         verbose
>     )
>
> -----------------------------------------------------------------------------------------------------------------------------------
> Arguments
> source_table
>     TEXT. The name of the table containing the training data.
> out_table
>     TEXT. Name of the generated table containing the output model.
>
>     The output table contains the following columns.
>     <...>     Any grouping columns provided during training. Present only
> if the grouping option is used.
>     coef_<independent_varname1>, coef_<independent_varname2> ...
> FLOAT8[]. One column per independent variable, holding the vector of
> regression coefficients for that variable at each location.
>     r2     FLOAT8. R-squared coefficient of determination of the model.
>     adjr2    FLOAT8. Adjusted-R-squared coefficient of determination of
> the model.
>    local_cond_no     FLOAT8[]. The local condition number of GWR at each
> location (see Wheeler D 2007). Values above 30 indicate that results are
> unstable due to local multicollinearity.
>    F1_stats     FLOAT8[]. The F-test array {F-statistic, Numerator DF,
> Denominator DF, p_value} for comparing Ordinary Linear Regression (OLR)
> and GWR models (see Leung et al. 2000).
>    F2_stats     FLOAT8[]. The F-test array {F-statistic, Numerator DF,
> Denominator DF, p_value} for comparing Ordinary Linear Regression (OLR)
> and GWR models (see Leung et al. 2000).
>    F3_stats     FLOAT8[]. The spatial stationarity test statistic for GWR
> coefficients (see Leung et al. 2000).
>    F3_ndf       FLOAT8[]. The spatial stationarity test Numerator DF for
> GWR coefficients (see Leung et al. 2000).
>    F3_ddf     FLOAT8[]. The spatial stationarity test Denominator DF for
> GWR coefficients (see Leung et al. 2000).
>    F3_pv     FLOAT8[]. The spatial stationarity test p_value for GWR
> coefficients (see Leung et al. 2000).
>    F4_stats     FLOAT8[]. The F-test array {F-statistic, Numerator DF,
> Denominator DF, p_value} for comparing Ordinary Linear Regression (OLR)
> and GWR models (see GWR book p92).
>     num_missing_rows_skipped     INTEGER. The number of rows that have
> NULL values in the dependent and independent variables, and were skipped in
> the computation for each group.
>
>     A summary table named <out_table>_summary is created together with the
> output table. It has the following columns:
>     source_table     The data source table name
>     out_table     The output table name
>     dependent_varname     The dependent variable
>     independent_varname     The independent variables
>     num_rows_processed     The total number of rows that were used in the
> computation.
>     num_missing_rows_skipped     The total number of rows that were
> skipped because of NULL values in them.
>     kernel_function    The spatial kernel function
>     bandwidth    The bandwidth parameter
>     adaptive_option    Boolean indicating whether an adaptive kernel
> function was used.
> dependent_varname
>     TEXT. Expression to evaluate for the dependent variable.
> independent_varname
>     TEXT. Expression list to evaluate for the independent variables. An
> intercept variable is not assumed. It is common to provide an explicit
> intercept term by including a single constant 1 term in the independent
> variable list.
> kernel_params (optional)
>     TEXT, default: 'kernel=gauss,bw=CV'. Parameters for the kernel function.
>     The kernel parameter is the name of the kernel function to use:
>     'gauss': wgt = exp(-.5*(vdist/bw)^2);
>     'exp': wgt = exp(-vdist/bw);
>     'bisquare': wgt = (1-(vdist/bw)^2)^2 if vdist < bw, wgt = 0 otherwise;
>     where wgt is the weight, vdist is the vector of distances, and bw is
> the bandwidth.
>     Set bw to either CV or AICc to have it selected automatically when you
> aren't sure what to use for the distance or number-of-neighbors parameter.
> A numerical value can also be specified for bw. If bw is large enough
> (above 1e7, for example), the coefficient estimates in GWR are effectively
> equal to the global estimates of ordinary linear regression.
> adaptive_option (optional)
>     BOOLEAN, default: FALSE. When TRUE, an adaptive kernel is calculated
> where the bandwidth corresponds to the number of nearest neighbours (i.e.
> adaptive distance).
> ftest_option (optional)
>     BOOLEAN, default: FALSE. When TRUE, three F-tests and a
> spatial-stationarity test of the coefficients are also conducted and
> returned with the results, according to Leung et al. (2000).
> regression_location
>     2D Point or Polygon Geometry. A geometry (usually 2D point geometry)
> representing the locations where training should be conducted. The number
> of geometries in regression_location must equal the number of rows in
> source_table. In most cases, it is a geometry field of source_table.
> prediction_location (optional)
>     2D Point or Polygon Geometry, default: regression_location. A geometry
> (usually 2D point geometry) representing the locations where coefficient
> estimates should be computed.
> grouping_cols (optional)
>     TEXT, default: NULL. An expression list used to group the input
> dataset into discrete groups, running one regression per group. Similar to
> the SQL GROUP BY clause. When this value is null, no grouping is used and a
> single result model is generated.
> verbose(optional)
>     BOOLEAN, default: FALSE. Provides verbose output of the results of
> training.
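
As a rough check of the kernel_params formulas and the adaptive_option description above, here is a small Python/NumPy sketch. The function names kernel_weights and adaptive_bandwidth are hypothetical, and taking the adaptive bandwidth to be the distance to the k-th nearest observation is an assumed reading of "number of nearest neighbours", not something the proposal states explicitly.

```python
import numpy as np


def kernel_weights(vdist, bw, kernel="gauss"):
    """Spatial weights for a vector of distances, per the three kernels above."""
    vdist = np.asarray(vdist, dtype=float)
    if kernel == "gauss":
        return np.exp(-0.5 * (vdist / bw) ** 2)
    if kernel == "exp":
        return np.exp(-vdist / bw)
    if kernel == "bisquare":
        # Weight drops to exactly zero at and beyond the bandwidth.
        return np.where(vdist < bw, (1.0 - (vdist / bw) ** 2) ** 2, 0.0)
    raise ValueError(f"unknown kernel: {kernel}")


def adaptive_bandwidth(vdist, n_neighbors):
    """Distance to the n_neighbors-th nearest observation (assumed definition)."""
    ordered = np.sort(np.asarray(vdist, dtype=float))
    return ordered[n_neighbors - 1]


# Example: weights for three observations at distances 0, 1 and 2.
dist = [0.0, 1.0, 2.0]
print(kernel_weights(dist, bw=2.0, kernel="bisquare"))
# With a very large bandwidth every weight is close to 1, i.e. global regression.
print(kernel_weights(dist, bw=1e7, kernel="gauss"))
```

Note that with the bisquare kernel and this adaptive definition, the k-th neighbour itself receives zero weight, so an implementation may want to state whether the bandwidth is inclusive.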
>
> ---------------------------------------------------------------------------------------------------------------------------------------------
> Prediction Function
> gwregr_predict(coef, col_ind,newdata_table)
> Arguments
> coef
>     FLOAT8[][]. Vector of the coefficients of regression.
> col_ind
>     FLOAT8[]. An array containing the independent variable column names.
> newdata_table (optional)
>     TEXT, default: NULL. The name of the table that provides new data at
> the prediction locations. If prediction_location is the same as
> regression_location (the default) in the training function, this
> parameter can be omitted. Otherwise, newdata_table is required and must
> provide the independent variables at the prediction locations, with field
> names identical to those in source_table.
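
To make the prediction step concrete, a minimal Python/NumPy sketch follows. It assumes, as an interpretation of the interface above, that coef holds one row of local coefficients per prediction location and that col_ind holds the matching independent-variable values for the same locations; the name gwr_predict is hypothetical and this is not the proposed SQL function itself.

```python
import numpy as np


def gwr_predict(coef, col_ind):
    """Predict one value per location from location-specific coefficients.

    coef    : (m, p) local coefficients, one row per prediction location
    col_ind : (m, p) independent-variable values at the same locations
              (include a 1 in each row if an intercept was used in training)
    """
    coef = np.asarray(coef, dtype=float)
    col_ind = np.asarray(col_ind, dtype=float)
    if coef.shape != col_ind.shape:
        raise ValueError("coef and col_ind must have the same shape")
    # Row-wise dot product: each location uses its own coefficient vector.
    return np.sum(coef * col_ind, axis=1)


# Two locations, intercept plus one explanatory variable.
coef = [[1.0, 2.0], [3.0, 4.0]]
new_x = [[1.0, 10.0], [1.0, 20.0]]
print(gwr_predict(coef, new_x))  # [21. 83.]
```

Unlike ordinary linear regression, a single coefficient vector cannot be reused across rows, which is why the coefficients must be aligned with the prediction locations.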
>
> > Date: Fri, 18 Dec 2015 09:18:22 -0800
> > Subject: Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS
> > From: [email protected]
> > To: [email protected]
> >
> > Thanks ChenLiang Wang for your interest.
> >
> > I would repeat Ivan's welcome to you, and I look forward to your
> > contributions in the area of GIS.
> >
> > To answer your questions:
> >
> > 1.  Yes, it is possible to call PostGIS functions from MADlib.
> >
> > 2.  Yes, spatial statistics are suitable for MADlib.
> >
> > For documentation, please refer to the Apache MADlib wiki
> > http://madlib.incubator.apache.org/
> >
> > which includes:
> > Quick Start Guides
> >
> > Get going with a minimum of fuss.
> >
> >    - Installation Guide
> >      <https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide>
> >    - Quick Start Guide for Users
> >      <https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Users>
> >    - Quick Start Guide for Developers
> >      <https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers>
> >
> >
> > As Ivan mentioned, writing down the functions you would like to build and
> > the interface is a good place to begin.  Then we can discuss on the open
> > mailing list.
> >
> > Regards,
> > Frank
> >
> > On Thu, Dec 17, 2015 at 8:11 PM, 王晨 亮 <[email protected]> wrote:
> >
> > > > Thanks for your quick reply. Your suggestion is great. I will provide
> > > > definitions and descriptions of the spatial statistics functions and a
> > > > comparison with ordinary statistical models.
> > >
> > >
> > > > Date: Thu, 17 Dec 2015 21:56:06 -0500
> > > > > Subject: Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS
> > > > From: [email protected]
> > > > To: [email protected]
> > > >
> > > > Hi ChenLiang,
> > > >
> > > > > I think your proposal is good and worth trying!
> > > > >
> > > > > As a first step, may I suggest that you send a proposal of the
> > > > > function definitions, parameters and return values, as well as a
> > > > > description of the functions and what they do.
> > > > >
> > > > > Based on that we can discuss the design of the interface, and once
> > > > > it looks good you can start working on the actual implementation.
> > > > > When you get to implementation we can help you with technical
> > > > > challenges.
> > > >
> > > > Cheers,
> > > > Ivan
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Dec 17, 2015 at 9:50 PM, 王晨 亮 <[email protected]> wrote:
> > > >
> > > > > Hi MADlib Developers,
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > I am a GIS researcher and have some knowledge of PostGIS, Python,
> > > > > > C/C++, Java and R.
> > > > >
> > > > >
> > > > >
> > > > > > I learned some spatial statistical models during my PhD research
> > > > > > in GIS. Recently, I translated GWR (Geographically Weighted
> > > > > > Regression) from R into Java for my company, and I would like to
> > > > > > contribute to MADlib if possible. I believe PostGIS and MADlib are
> > > > > > the most powerful extensions of PostgreSQL. Therefore, a spatial
> > > > > > statistical module connecting the two libraries could be
> > > > > > significant. If I can start the task, the first goal to implement
> > > > > > will be the GWR model.
> > > > >
> > > > >
> > > > >
> > > > > > I am now reading the MADlib developer guide. I am not quite sure
> > > > > > how to contribute a geospatial module to MADlib. Is it possible to
> > > > > > manipulate spatial objects or attributes from PostGIS in MADlib?
> > > > >
> > > > >
> > > > >
> > > > > > So could anyone suggest a few pointers and links that I can
> > > > > > follow to get to know:
> > > > >
> > > > >
> > > > >
> > > > > > 1. how to deal with these dependencies in MADlib?
> > > > >
> > > > >
> > > > >
> > > > > 2. whether the spatial statistics module is suitable for MADlib?
> > > > >
> > > > >
> > > > >
> > > > > Thank you in advance.
> > > > >
> > > > >
> > > > > ChenLiang Wang
> > > > >
> > > > >
> > >
> > >
>
>
