Yeah, there are a lot of possible interfaces for this. Early on in JuliaStats there was a little bit of work to do polynomial regression, which fizzled out because of its considerable complexity.
-- John

On Aug 31, 2014, at 1:11 PM, Bradley Setzler wrote:

> Yeah, or it might be easier to do it separately, like a function
> seriesData = createSeries(data, rank=2)
> which returns a DataFrame that contains all of those series terms. Then
> seriesData would simply be used as the data argument in glm().
>
> Bradley
>
> On Sunday, August 31, 2014 3:05:12 PM UTC-5, John Myles White wrote:
> I see. This is a pretty radical change to how GLMs would be specified. I
> think the only realistic way you could make any progress on such a radical
> proposal is to undertake this change as a project on your own and then give
> people a demo of a system you’ve built that’s noticeably better than what
> they’re used to having in R.
>
> -- John
>
> On Aug 31, 2014, at 1:02 PM, Bradley Setzler wrote:
>
>> Sorry, I meant for those to be in the ... term.
>>
>> Let me write them explicitly for the case of 3 independent variables, X1 X2
>> X3, seriesRank=2 would be,
>>
>> (intercept)
>> X1.^2
>> X2.^2
>> X3.^2
>> X1.*X2
>> X1.*X3
>> X2.*X3
>> X1.*X2.*X3
>>
>> Bradley
>>
>> On Sunday, August 31, 2014 2:55:22 PM UTC-5, John Myles White wrote:
>> Bradley, you’re forgetting about interaction terms.
>>
>> -- John
>>
>> On Aug 31, 2014, at 12:53 PM, Bradley Setzler wrote:
>>
>>> No problem.
>>>
>>> Honestly, I'm not sure formula is a useful way to think about regression,
>>> the formula is uniquely determined from:
>>> (depVar, indepVars, data, family, link)
>>>
>>> so that the + symbols are redundant given family and link,
>>> glm(Y ~ X1 + X2 + X3 + X4 + X5 +...., family, link)
>>>
>>> and it would be nice to have an explicit intercept argument like,
>>> glm(Y,X,data,family,link,intercept=true)
>>>
>>> Adding to the wish list, I would like to see something like a series option
>>> for non-parametric regression,
>>> glm(Y,X,data,family,link,seriesRank=2)
>>> where seriesRank=2 means all of the terms X1.^2, X1.*X2, X1.*X3,...,X5.^2
>>> are included as regressors.
>>>
>>> Bradley
>>>
>>> On Sunday, August 31, 2014 2:32:30 PM UTC-5, John Myles White wrote:
>>> Merged. Thanks, Bradley.
>>>
>>> -- John
>>>
>>> On Aug 31, 2014, at 12:29 PM, Bradley Setzler wrote:
>>>
>>>> Thank you for suggesting this, John.
>>>>
>>>> https://github.com/JuliaStats/GLM.jl/pull/90
>>>>
>>>> Bradley
>>>>
>>>> On Sunday, August 31, 2014 1:33:04 PM UTC-5, John Myles White wrote:
>>>> Bradley, it’s especially easy to edit documentation because you can make a
>>>> Pull Request right from the website.
>>>>
>>>> -- John
>>>>
>>>> On Aug 31, 2014, at 11:30 AM, Bradley Setzler wrote:
>>>>
>>>>> Thank you Adam, this works.
>>>>>
>>>>> Let me suggest that this information be included in the GLM documentation:
>>>>>
>>>>> To fit a GLM model, use the function,
>>>>> glm(formula, data, family, link),
>>>>> where,
>>>>> - formula uses column symbols from the DataFrame data, e.g., if
>>>>> names(data)=[:Y,:X], then a valid formula is Y~X;
>>>>> - data is a DataFrame which may contain NA values, the rows with NA
>>>>> values will be ignored (apparently);
>>>>> - family may be chosen from Binomial(), Gamma(), Normal(), or Poisson(),
>>>>> and the parentheses are required; and,
>>>>> - link may be chosen from the list in the GLM documentation, such as
>>>>> LogitLink(), and again the parentheses are required. For some families, a
>>>>> default link is available so the link argument may be left blank.
>>>>>
>>>>> Bradley
>>>>>
>>>>> On Sunday, August 31, 2014 12:56:19 PM UTC-5, Adam Kapor wrote:
>>>>> This works for me:
>>>>>
>>>>> ```
>>>>> julia> fit(GeneralizedLinearModel,Y~X,data,Binomial(),ProbitLink())
>>>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>>>> Coefficients:
>>>>>                 Estimate Std.Error     z value Pr(>|z|)
>>>>> (Intercept)     0.430727   1.98019    0.217518   0.8278
>>>>> X            2.37745e-17   0.91665 2.59362e-17   1.0000
>>>>>
>>>>> julia> fit(GeneralizedLinearModel,Y~X,data,Binomial(),LogitLink())
>>>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>>>> Coefficients:
>>>>>                  Estimate Std.Error      z value Pr(>|z|)
>>>>> (Intercept)      0.693147   3.24037      0.21391   0.8306
>>>>> X            -7.44332e-17       1.5 -4.96221e-17   1.0000
>>>>> ```
>>>>>
>>>>> On Sunday, August 31, 2014 1:27:15 PM UTC-4, Bradley Setzler wrote:
>>>>> Has anyone successfully performed probit or logit regression in Julia?
>>>>> The GLM documentation does not provide a generalizable example of how to
>>>>> use glm(). It gives a Poisson example without any suggestion of how to
>>>>> switch from Poisson to some other type.
>>>>>
>>>>> Using the Poisson example from GLM documentation works:
>>>>>
>>>>> julia> X = [1;2;3.]
>>>>> julia> Y = [1;0;1.]
>>>>> julia> data = DataFrame(X=X,Y=Y)
>>>>> julia> fit(GeneralizedLinearModel, Y ~ X,data, Poisson())
>>>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>>>> Coefficients:
>>>>>                  Estimate Std.Error      z value Pr(>|z|)
>>>>> (Intercept)     -0.405465   1.87034    -0.216787   0.8284
>>>>> X            -3.91448e-17    0.8658 -4.52123e-17   1.0000
>>>>>
>>>>> But does not generalize:
>>>>>
>>>>> julia> fit(GeneralizedLinearModel, Y ~ X ,data, Logit())
>>>>> ERROR: Logit not defined
>>>>>
>>>>> julia> fit(GeneralizedLinearModel, Y ~ X, data, link=:ProbitLink)
>>>>> ERROR: `fit` has no method matching fit(::Type{GeneralizedLinearModel},
>>>>> ::Array{Float64,2}, ::Array{Float64,1})
>>>>>
>>>>> julia> fit(GeneralizedLinearModel, Y ~ X, data,
>>>>> family="binomial",link="probit")
>>>>> ERROR: `fit` has no method matching fit(::Type{GeneralizedLinearModel},
>>>>> ::Array{Float64,2}, ::Array{Float64,1})
>>>>>
>>>>> ....and a dozen other similar attempts fail.
>>>>>
>>>>> Thanks,
>>>>> Bradley
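
For anyone who wants to experiment with the createSeries(data, rank=2) idea from this thread, here is a rough sketch of one way it might look. It is only an illustration under several assumptions: `series_exponents` and `create_series` are hypothetical names that do not exist in GLM.jl or DataFrames.jl, the input is a plain Dict of numeric columns rather than a DataFrame, and "rank" is read as the maximum total degree of each term. Under that reading the linear terms are included, the three-way product X1.*X2.*X3 is not generated at rank 2, and the intercept is left to glm().

```julia
# Hypothetical helpers; neither function exists in GLM.jl or DataFrames.jl.

# All exponent vectors for monomials of total degree 1..rank in nvars variables.
function series_exponents(nvars::Int, rank::Int)
    exps = Vector{Vector{Int}}()
    for e in Iterators.product(ntuple(_ -> 0:rank, nvars)...)
        if 1 <= sum(e) <= rank
            push!(exps, collect(e))
        end
    end
    return exps
end

# Build one column per monomial from a Dict of named numeric columns.
# Assumes every column has the same length.
function create_series(cols::AbstractDict, rank::Int)
    vars = sort(collect(keys(cols)))
    nrows = length(cols[first(vars)])
    out = Dict{String,Vector{Float64}}()
    for e in series_exponents(length(vars), rank)
        col = ones(nrows)
        parts = String[]
        for (name, p) in zip(vars, e)
            p == 0 && continue
            col .*= cols[name] .^ p          # multiply in this variable's power
            push!(parts, p == 1 ? string(name) : string(name, "^", p))
        end
        out[join(parts, "*")] = col          # label such as "X1*X2" or "X3^2"
    end
    return out
end

cols = Dict(:X1 => [1.0, 2.0], :X2 => [3.0, 4.0], :X3 => [5.0, 6.0])
series = create_series(cols, 2)
```

With three input columns and rank 2 this produces nine columns: three linear terms, three squares, and three pairwise products. The result could then be converted to a DataFrame and passed as the data argument, which keeps the expansion separate from model fitting as suggested in the thread.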
