Hello ChenLiang,

I have read your description of the interface, and to my understanding this is a supervised machine learning algorithm that supports geometry data. Am I correct?
What would be a good industrial use case for this model? For example, could you train a system on locations and weather to find areas with bad cell phone signal? Can you provide a real-world scenario where this type of model would be useful for end users?

Also, I am CCing some of my colleagues at work. Kuien, Max, Yandong, can you provide any feedback on this proposal from your point of view?

http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%[email protected]%3E

Cheers,
Ivan

On Wed, Jan 13, 2016 at 11:20 AM, WangChenLiang <[email protected]> wrote:

> Sorry, the link to the attachment (http://1drv.ms/1ZjAiCg) was lost in the
> previous letter.
>
> > From: [email protected]
> > To: [email protected]
> > Subject: RE: How to contribute a spatial module to MADlib manipulating objects from PostGIS
> > Date: Wed, 13 Jan 2016 11:09:17 +0800
> >
> > Hi, Caleb and Ivan!
> >
> > Thanks for your attention and help. I reviewed the previous draft and found
> > some things that were inappropriate. The archive containing the new draft
> > and example code is attached to this letter; it should be more reasonable
> > than the earlier edition. Please go over the manuscript and give
> > suggestions again.
> >
> > The following are my answers to Caleb's questions.
> >
> > - Does this function require PostGIS to also be installed? If yes, it would
> >   be better if we disable the function if PostGIS is not present rather than
> >   introduce PostGIS as a dependency. (Similar to what we do with our
> >   requirement on the xml module with our PMML export functionality).
> >
> > A: Yes. I am trying to avoid passing any spatial datatypes in the interface
> > of GWR. But I have no idea whether it is necessary to provide a simple
> > alternative when PostGIS is not available.
> >
> > - What are the exact datatypes in the function definition for
> >   regression_location and prediction_location?
> >
> > A: I changed the datatype to TEXT, giving the name of a POINT or
> > MULTIPOLYGON field (the centroid of each polygon is used for estimation in
> > GWR).
> >
> > - In the description it describes regression_location as "The length of
> >   regression_location must be equal to the length of source_table", which
> >   signals to me that it is likely intended to be a column of the source
> >   table? If not then how is this length represented?
> >
> > A: In the previous interface, I was trying to pass in a geometry field,
> > which could come from another table with a different number of rows. Now I
> > have changed the argument definition to TEXT. It must be the name of a
> > geometry field in the source table.
> >
> > - You didn't mark regression_location as (optional). Due to the way Postgres
> >   functions work all optional arguments must come after all required
> >   arguments, so having a non-optional argument in the middle of the optional
> >   list must be avoided.
> >
> > A: Thanks for reminding me of this mistake. It is really my fault. The order
> > of the arguments is changed in this edition.
> >
> > - I haven't read through the literature, but it is not immediately clear to
> >   me why prediction_location is a parameter to gwregr_train() rather than
> >   gwregr_predict(). Can you provide a brief description to the way that
> >   prediction_location is used in the model and its relationship to training
> >   and prediction.
> >
> > A: Actually, there are three kinds of location data in GWR modeling: the
> > locations of the sample data, of regression, and of prediction.
> >
> > Locations of sample data indicate where the sample data are. Locations of
> > regression indicate where regression should be conducted. If they are
> > identical to the data locations (as in most instances), diagnostic
> > information can be calculated.
> >
> > Locations of prediction indicate where coefficients should be predicted.
> > This should be a parameter of a predict function. Putting
> > regression_location into the training function is just for omitting kernel
> > arguments and may not be appropriate. In the process of training, GWR
> > estimates weights and coefficients from the distances between
> > data_locations and regression_locations. Then, diagnostic information is
> > estimated when these two sets of locations are identical. We can treat
> > data_location as regression_location to simplify the process, not taking
> > locations different from the data locations in the training step.
> >
> > In the process of prediction, there are two new pieces of information: new
> > independent variables and new locations. Therefore, the coefficients and
> > weight vector must be estimated again. GWR can estimate coefficients at any
> > position using the independent variables of the sample data. If we also
> > provide independent variables at any position, we can also obtain the
> > dependent variable at any position. So if we treat the coefficients at
> > prediction_location as a training result and put the coefficients into
> > prediction directly, it is reasonable to put it into the training function.
> > But if we treat it as a part of prediction, it is appropriate to set
> > prediction_location within the predict function. The prediction function
> > must then require kernel parameters in addition to the new data and
> > locations for prediction. Maybe this way is clearer and more reasonable,
> > and it is similar to other GWR packages in R.
> >
> > I rewrote the description of the interface taking your suggestions into
> > account. I moved prediction_location into the predict function and
> > corrected some mistakes and unnecessary arguments. The new draft of the
> > interface design is attached to this letter.
> >
> > Regards,
> >
> > ChenLiang Wang
> >
> > > From: [email protected]
> > > Date: Tue, 5 Jan 2016 10:31:20 -0800
> > > Subject: Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS
> > > To: [email protected]
> > >
> > > Hi ChenLiang,
> > >
> > > Thanks for taking the next step to flesh this out.
> > >
> > > As a whole:
> > > - naming and basic interface seem consistent with existing conventions.
> > > - names are descriptive.
> > > - references to the literature are provided.
> > > - functionality is complementary to the library.
> > >
> > > What is not clear to me is:
> > > - Does this function require PostGIS to also be installed? If yes, it
> > >   would be better if we disable the function if PostGIS is not present
> > >   rather than introduce PostGIS as a dependency. (Similar to what we do
> > >   with our requirement on the xml module with our PMML export
> > >   functionality).
> > > - What are the exact datatypes in the function definition for
> > >   regression_location and prediction_location?
> > > - In the description it describes regression_location as "The length of
> > >   regression_location must be equal to the length of source_table", which
> > >   signals to me that it is likely intended to be a column of the source
> > >   table? If not then how is this length represented?
> > > - You didn't mark regression_location as (optional). Due to the way
> > >   Postgres functions work all optional arguments must come after all
> > >   required arguments, so having a non-optional argument in the middle of
> > >   the optional list must be avoided.
> > > - I haven't read through the literature, but it is not immediately clear
> > >   to me why prediction_location is a parameter to gwregr_train() rather
> > >   than gwregr_predict(). Can you provide a brief description to the way
> > >   that prediction_location is used in the model and its relationship to
> > >   training and prediction.
> > >
> > > Regards,
> > > Caleb
> >
> > ChenLiang wants to share a file with you on OneDrive. To view the file,
> > click the link below.
> > gwr4madlib.rar
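
For readers following the thread, the training and prediction steps described in ChenLiang's answer can be illustrated with a minimal sketch of geographically weighted regression with a single predictor. This is not the proposed MADlib interface and is not taken from the attached draft: the function names (gaussian_weight, gwr_coefficients, gwr_predict), the Gaussian kernel, and the fixed bandwidth are all assumptions made for illustration. It only shows why prediction needs the kernel parameters again: the local coefficients are re-estimated at each new location from the sample data.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel (an assumed choice): nearby samples weigh close to 1."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_coefficients(data_locations, x, y, target, bandwidth):
    """Fit y = b0 + b1 * x by weighted least squares at one target location.

    Hypothetical helper: data_locations are (px, py) tuples, x and y are the
    sample predictor and response, and target is a regression or prediction
    location.
    """
    # Weight every sample by its distance to the target location.
    w = [gaussian_weight(math.dist(loc, target), bandwidth)
         for loc in data_locations]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Closed-form weighted least squares for one predictor plus intercept.
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def gwr_predict(data_locations, x, y, new_location, new_x, bandwidth):
    """Re-estimate local coefficients at new_location, then apply them to new_x."""
    b0, b1 = gwr_coefficients(data_locations, x, y, new_location, bandwidth)
    return b0 + b1 * new_x


# Toy sample data that follow y = 1 + 2x exactly.
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

# "Training": the regression locations are the data locations themselves.
local_fits = [gwr_coefficients(locations, x, y, loc, bandwidth=1.0)
              for loc in locations]

# "Prediction": a new location plus a new predictor value.
y_hat = gwr_predict(locations, x, y, (0.5, 0.5), 2.5, bandwidth=1.0)
```

Because the toy data are exactly linear, every local fit recovers the same intercept and slope regardless of the weights. With real data the coefficients would vary from location to location, which is the point of GWR.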
