Perhaps ChenLiang would like to join a call with the MADlib community and discuss his contribution?
We have a call this Friday at 10 AM PST, which is not a friendly time for China, but we can schedule the next call at a friendlier time.

This email encrypted by tiny buttons & fat thumbs, beta voice recognition, and autocorrect on my iPhone.

> On Jan 13, 2016, at 1:53 AM, Ivan Novick <[email protected]> wrote:
>
> Cool!
>
>> On Wed, Jan 13, 2016 at 5:52 PM, Kuien Liu <[email protected]> wrote:
>>
>> Got it. I think I can have a face-to-face (f2f) talk with Chenliang Wang,
>> as he graduated from an institute of CAS which is not far from our
>> Beijing office, and I am familiar with his supervisor and lab director.
>> So I think it is quite possible to find him directly in Beijing.
>>
>> Cheers,
>> Kuien Liu
>>
>>> On Wed, Jan 13, 2016 at 3:05 PM, Ivan Novick <[email protected]> wrote:
>>>
>>> Hello ChenLiang,
>>>
>>> I have read your description of the interface, and to my understanding
>>> this is a supervised machine learning algorithm that supports geometry
>>> data. Am I correct?
>>>
>>> What would be a good industrial use case for this model? For example,
>>> could you train a system based on locations and weather to find bad
>>> cell phone signals? Can you provide any real-world example scenario
>>> where this type of model would be useful for end users?
>>>
>>> Also, I am adding some of my colleagues at work on CC. Kuien, Max,
>>> Yandong, can you provide any feedback on this proposal from your point
>>> of view?
>>>
>>> http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%[email protected]%3E
>>>
>>> Cheers,
>>> Ivan
>>>
>>>
>>> On Wed, Jan 13, 2016 at 11:20 AM, WangChenLiang <[email protected]>
>>> wrote:
>>>
>>>> Sorry, the link to the attachment (http://1drv.ms/1ZjAiCg) was lost in
>>>> the previous letter.
>>>>
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>> Subject: RE: How to contribute a spatial module to MADlib manipulating
>>>>> objects from PostGIS
>>>>> Date: Wed, 13 Jan 2016 11:09:17 +0800
>>>>>
>>>>> Hi Caleb and Ivan!
>>>>>
>>>>> Thanks for your attention and help. I reviewed the previous draft and
>>>>> found some things that were inappropriate. The archive containing the
>>>>> new draft and example code is attached to this letter; it should be
>>>>> more reasonable than the earlier edition. Please go over the
>>>>> manuscript and give suggestions again.
>>>>>
>>>>> The following are my answers to Caleb's questions.
>>>>>
>>>>> - Does this function require PostGIS to also be installed? If yes, it
>>>>> would be better if we disable the function if PostGIS is not present
>>>>> rather than introduce PostGIS as a dependency. (Similar to what we do
>>>>> with our requirement on the xml module with our PMML export
>>>>> functionality).
>>>>>
>>>>> A: Yes. I am trying to avoid passing any spatial datatypes in the
>>>>> interface of GWR. But I do not know whether it is necessary to
>>>>> provide a simple alternative when PostGIS is not available.
>>>>>
>>>>> - What are the exact datatypes in the function definition for
>>>>> regression_location and prediction_location?
>>>>>
>>>>> A: I changed the datatype to TEXT, holding the name of a POINT or
>>>>> MULTIPOLYGON field (the centroid of each polygon is used for GWR
>>>>> estimation).
>>>>>
>>>>> - In the description it describes regression_location as "The length
>>>>> of regression_location must be equal to the length of source_table",
>>>>> which signals to me that it is likely intended to be a column of the
>>>>> source table? If not then how is this length represented?
>>>>>
>>>>> A: In the previous interface, I was trying to pass a geometry field
>>>>> that could come from another table with a different number of rows.
>>>>> Now I have changed the argument definition to TEXT. It must be the
>>>>> name of the geometry field in the source table.
>>>>>
>>>>> - You didn't mark regression_location as (optional). Due to the way
>>>>> Postgres functions work all optional arguments must come after all
>>>>> required arguments, so having a non-optional argument in the middle
>>>>> of the optional list must be avoided.
>>>>>
>>>>> A: Thanks for reminding me of this mistake. It is really my fault.
>>>>> The order of the arguments has been changed in this edition.
>>>>>
>>>>> - I haven't read through the literature, but it is not immediately
>>>>> clear to me why prediction_location is a parameter to gwregr_train()
>>>>> rather than gwregr_predict(). Can you provide a brief description of
>>>>> the way that prediction_location is used in the model and its
>>>>> relationship to training and prediction.
>>>>>
>>>>> A: Actually, there are three kinds of location data in GWR modeling:
>>>>> the locations of the sample data, of regression, and of prediction.
>>>>>
>>>>> Locations of sample data indicate where the sample data are.
>>>>> Locations of regression indicate where regression should be
>>>>> conducted. If they are identical to the data locations (as in most
>>>>> instances), diagnostic information can be calculated.
>>>>>
>>>>> Locations of prediction indicate where coefficients should be
>>>>> predicted. They should be a parameter of a predict function. Putting
>>>>> regression_location into the training function is just for omitting
>>>>> kernel arguments and may not be appropriate. In the process of
>>>>> training, GWR estimates weights and coefficients from the distance
>>>>> between data_locations and regression_locations.
>>>>> Then, diagnostic information is estimated when these two sets of
>>>>> locations are identical. We can treat data_location as
>>>>> regression_location to simplify the process, so that the training
>>>>> step does not take locations different from the data locations.
>>>>>
>>>>> In the process of prediction, there are two new pieces of
>>>>> information: new independent variables and new locations. Therefore,
>>>>> the coefficients and weight vector must be estimated again. GWR can
>>>>> estimate coefficients at any position using the independent
>>>>> variables of the sample data. If we also provide independent
>>>>> variables at any position, we can also obtain the dependent variable
>>>>> at any position. So if we treat the coefficients at
>>>>> prediction_location as a training result and put the coefficients
>>>>> into prediction directly, it is reasonable to put it into the
>>>>> training function. But if we treat it as a part of prediction, it is
>>>>> appropriate to set prediction_location within the predict function.
>>>>> The prediction function must then require kernel parameters in
>>>>> addition to new data and locations for prediction. Maybe this way is
>>>>> clearer and more reasonable, and it is similar to other GWR packages
>>>>> in R.
>>>>>
>>>>> I rewrote the description of the interface taking your suggestions
>>>>> into account. I moved prediction_location into the predict function
>>>>> and corrected some mistakes and unnecessary arguments. The new draft
>>>>> of the interface design is attached to this letter.
>>>>>
>>>>> Regards,
>>>>>
>>>>> ChenLiang Wang
>>>>>
>>>>>> From: [email protected]
>>>>>> Date: Tue, 5 Jan 2016 10:31:20 -0800
>>>>>> Subject: Re: How to contribute a spatial module to MADlib
>>>>>> manipulating objects from PostGIS
>>>>>> To: [email protected]
>>>>>>
>>>>>> Hi ChenLiang,
>>>>>>
>>>>>> Thanks for taking the next step to flesh this out.
>>>>>>
>>>>>> As a whole:
>>>>>> - naming and basic interface seem consistent with existing
>>>>>> conventions.
>>>>>> - names are descriptive.
>>>>>> - references to the literature are provided.
>>>>>> - functionality is complementary to the library.
>>>>>>
>>>>>> What is not clear to me is:
>>>>>> - Does this function require PostGIS to also be installed? If yes,
>>>>>> it would be better if we disable the function if PostGIS is not
>>>>>> present rather than introduce PostGIS as a dependency. (Similar to
>>>>>> what we do with our requirement on the xml module with our PMML
>>>>>> export functionality).
>>>>>> - What are the exact datatypes in the function definition for
>>>>>> regression_location and prediction_location?
>>>>>> - In the description it describes regression_location as "The
>>>>>> length of regression_location must be equal to the length of
>>>>>> source_table", which signals to me that it is likely intended to be
>>>>>> a column of the source table? If not then how is this length
>>>>>> represented?
>>>>>> - You didn't mark regression_location as (optional). Due to the way
>>>>>> Postgres functions work all optional arguments must come after all
>>>>>> required arguments, so having a non-optional argument in the middle
>>>>>> of the optional list must be avoided.
>>>>>> - I haven't read through the literature, but it is not immediately
>>>>>> clear to me why prediction_location is a parameter to
>>>>>> gwregr_train() rather than gwregr_predict(). Can you provide a
>>>>>> brief description of the way that prediction_location is used in
>>>>>> the model and its relationship to training and prediction.
>>>>>>
>>>>>> Regards,
>>>>>> Caleb
>>>>>
>>>>> ChenLiang wants to share a file with you on OneDrive. To view the
>>>>> file, click the link below.
>>>>> gwr4madlib.rar
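
For anyone following the thread, the training/prediction split ChenLiang describes can be sketched in a few lines of Python. This is only an illustrative sketch and not the proposed MADlib interface: the function names (gaussian_weight, gwr_fit_at, gwr_train_sketch, gwr_predict_sketch), the Gaussian kernel, the single bandwidth parameter, and the one-predictor model y = b0 + b1*x are all assumptions made for the example. It shows coefficients being estimated per regression location from distance-based weights, with the data locations used as regression locations by default, and the predict step needing the kernel parameter again because it re-estimates coefficients at new locations.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Assumed Gaussian kernel: weight decays smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_fit_at(location, data_locations, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1*x at one location.

    Each sample is weighted by its distance to `location`, so nearby
    samples influence the local coefficients more than distant ones.
    """
    weights = [gaussian_weight(math.dist(location, p), bandwidth)
               for p in data_locations]
    sw = sum(weights)
    swx = sum(w * xi for w, xi in zip(weights, x))
    swy = sum(w * yi for w, yi in zip(weights, y))
    swxx = sum(w * xi * xi for w, xi in zip(weights, x))
    swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        raise ValueError("local design is singular at this location")
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def gwr_train_sketch(data_locations, x, y, bandwidth,
                     regression_locations=None):
    """Estimate local coefficients at each regression location.

    When no regression locations are given, the data locations are used,
    which is the common case mentioned in the thread.
    """
    if regression_locations is None:
        regression_locations = data_locations
    return [gwr_fit_at(loc, data_locations, x, y, bandwidth)
            for loc in regression_locations]


def gwr_predict_sketch(prediction_locations, new_x,
                       data_locations, x, y, bandwidth):
    """Re-estimate coefficients at new locations, then apply them to new_x.

    The kernel parameter (bandwidth) is required here as well, because the
    weights depend on the new locations.
    """
    predictions = []
    for loc, xi in zip(prediction_locations, new_x):
        b0, b1 = gwr_fit_at(loc, data_locations, x, y, bandwidth)
        predictions.append(b0 + b1 * xi)
    return predictions
```

With sample data that follow one global line exactly, every local fit recovers that same line regardless of location, which makes a convenient sanity check. A real implementation would handle multiple predictors, bandwidth selection, and the diagnostics that are available when regression and data locations coincide.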
