Does anyone know how to get predicted Y values after fitting the glm
regression of Y on X? The documentation mentions LinPred, which may be it,
but I'm not having luck getting it to work.
I would have guessed it was something like this:
julia> X = [1;2;3.]
julia> Y = [1;0;1.]
julia> data = DataFrame(X=X,Y=Y)
julia> OLS = glm(Y~X,data,Normal(),IdentityLink())
DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
Coefficients:
                Estimate Std.Error      z value Pr(>|z|)
(Intercept)     0.666667   1.24722     0.534522   0.5930
X           -4.16334e-16   0.57735 -7.21111e-16   1.0000
julia> LinPred(OLS)
ERROR: type cannot be constructed
julia> LinPred(OLS,data,X)
ERROR: type cannot be constructed
julia> OLS(X)
ERROR: type: apply: expected Function, got
DataFrameRegressionModel{GeneralizedLinearModel,Float64}
Thanks,
Bradley
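
For the Normal() family with IdentityLink(), the predicted values are just the design matrix times the estimated coefficients, so they can be computed by hand without touching LinPred. The "type cannot be constructed" error suggests LinPred is an abstract type rather than a callable function, though that is an inference from the message, not something confirmed here. A minimal sketch using only base Julia, with the hypothetical names A, beta and Yhat:

```julia
# Same toy data as in the session above
X = [1.0, 2.0, 3.0]
Y = [1.0, 0.0, 1.0]

# Design matrix: a column of ones for the intercept, then X
A = hcat(ones(length(X)), X)

# Least-squares coefficients; for Normal() with IdentityLink() these
# should match the estimates that glm reports
beta = A \ Y

# Predicted values: the linear predictor A*beta, which is also the
# predicted mean because the link is the identity
Yhat = A * beta
```

Later GLM.jl releases also provide a predict function for fitted models; whether it is available in the version used in this thread is an assumption that has not been checked.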
On Sunday, August 31, 2014 3:12:55 PM UTC-5, John Myles White wrote:
>
> Yeah, there are a lot of possible interfaces for this. Early on in
> JuliaStats there was a little bit of work to do polynomial regression,
> which fizzled out because of its considerable complexity.
>
> — John
>
> On Aug 31, 2014, at 1:11 PM, Bradley Setzler wrote:
>
> Yeah, or it might be easier to do it separately, like a function
> seriesData = createSeries(data, rank=2)
> which returns a DataFrame that contains all of those series terms. Then
> seriesData would simply be used as the data argument in glm().
>
> Bradley
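
The createSeries idea above could be sketched roughly as follows. This is a hypothetical helper, not part of GLM.jl or DataFrames; it works on a plain matrix rather than a DataFrame, and it assumes "rank 2" means every term of total degree at most two, so the three-way product X1.*X2.*X3 mentioned elsewhere in the thread is not generated. The intercept is left out on the assumption that the model formula adds it.

```julia
# Hypothetical helper named after the createSeries idea; not a library function.
# Takes an n-by-k matrix of regressors and returns the original columns,
# followed by all squares and pairwise products.
function create_series(X::AbstractMatrix)
    k = size(X, 2)
    cols = [X[:, j] for j in 1:k]            # linear terms X1, ..., Xk
    for i in 1:k, j in i:k
        push!(cols, X[:, i] .* X[:, j])      # squares (i == j) and interactions (i < j)
    end
    return hcat(cols...)
end

# Two observations of three regressors X1, X2, X3
X = [1.0 2.0 3.0;
     4.0 5.0 6.0]

# Columns: X1, X2, X3, X1^2, X1*X2, X1*X3, X2^2, X2*X3, X3^2
S = create_series(X)
```

The columns of S could then be wrapped in a DataFrame and passed as the data argument, as suggested above.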
>
> On Sunday, August 31, 2014 3:05:12 PM UTC-5, John Myles White wrote:
>>
>> I see. This is a pretty radical change to how GLMs would be specified. I
>> think the only realistic way you could make any progress on such a radical
>> proposal is to undertake this change as a project on your own and then give
>> people a demo of a system you’ve built that’s noticeably better than what
>> they’re used to having in R.
>>
>> — John
>>
>> On Aug 31, 2014, at 1:02 PM, Bradley Setzler wrote:
>>
>> Sorry, I meant for those to be in the ... term.
>>
>> Let me write them explicitly. For the case of 3 independent variables,
>> X1, X2, X3, the terms for seriesRank=2 would be:
>>
>> (intercept)
>> X1.^2
>> X2.^2
>> X3.^2
>> X1.*X2
>> X1.*X3
>> X2.*X3
>> X1.*X2.*X3
>>
>> Bradley
>>
>> On Sunday, August 31, 2014 2:55:22 PM UTC-5, John Myles White wrote:
>>>
>>> Bradley, you’re forgetting about interaction terms.
>>>
>>> — John
>>>
>>> On Aug 31, 2014, at 12:53 PM, Bradley Setzler wrote:
>>>
>>> No problem.
>>>
>>> Honestly, I'm not sure a formula is a useful way to think about
>>> regression; the formula is uniquely determined by:
>>> (depVar, indepVars, data, family, link)
>>>
>>> so that the + symbols are redundant given family and link,
>>> glm(Y ~ X1 + X2 + X3 + X4 + X5 +...., family, link)
>>>
>>> and it would be nice to have an explicit intercept argument like,
>>> glm(Y,X,data,family,link,intercept=true)
>>>
>>> Adding to the wish list, I would like to see something like a series
>>> option for non-parametric regression,
>>> glm(Y,X,data,family,link,seriesRank=2)
>>> where seriesRank=2 means all of the terms X1.^2, X1.*X2,
>>> X1.*X3,...,X5.^2 are included as regressors.
>>>
>>> Bradley
>>>
>>>
>>>
>>>
>>> On Sunday, August 31, 2014 2:32:30 PM UTC-5, John Myles White wrote:
>>>>
>>>> Merged. Thanks, Bradley.
>>>>
>>>> — John
>>>>
>>>> On Aug 31, 2014, at 12:29 PM, Bradley Setzler wrote:
>>>>
>>>> Thank you for suggesting this, John.
>>>>
>>>> https://github.com/JuliaStats/GLM.jl/pull/90
>>>>
>>>> Bradley
>>>>
>>>>
>>>> On Sunday, August 31, 2014 1:33:04 PM UTC-5, John Myles White wrote:
>>>>>
>>>>> Bradley, it’s especially easy to edit documentation because you can
>>>>> make a Pull Request right from the website.
>>>>>
>>>>> — John
>>>>>
>>>>> On Aug 31, 2014, at 11:30 AM, Bradley Setzler wrote:
>>>>>
>>>>> Thank you Adam, this works.
>>>>>
>>>>> Let me suggest that this information be included in the GLM
>>>>> documentation:
>>>>>
>>>>> To fit a GLM model, use the function,
>>>>> glm(formula, data, family, link),
>>>>> where,
>>>>> - formula uses column symbols from the DataFrame data, e.g., if
>>>>> names(data)=[:Y,:X], then a valid formula is Y~X;
>>>>> - data is a DataFrame which may contain NA values; rows with NA
>>>>> values will be ignored (apparently);
>>>>> - family may be chosen from Binomial(), Gamma(), Normal(), or
>>>>> Poisson(), and the parentheses are required; and,
>>>>> - link may be chosen from the list in the GLM documentation, such as
>>>>> LogitLink(), and again the parentheses are required. For some families, a
>>>>> default link is available so the link argument may be left blank.
>>>>>
>>>>> Bradley
>>>>>
>>>>>
>>>>> On Sunday, August 31, 2014 12:56:19 PM UTC-5, Adam Kapor wrote:
>>>>>>
>>>>>> This works for me:
>>>>>>
>>>>>> ```
>>>>>> julia> fit(GeneralizedLinearModel,Y~X,data,Binomial(),ProbitLink())
>>>>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>>>>> Coefficients:
>>>>>>                Estimate Std.Error     z value Pr(>|z|)
>>>>>> (Intercept)    0.430727   1.98019    0.217518   0.8278
>>>>>> X           2.37745e-17   0.91665 2.59362e-17   1.0000
>>>>>>
>>>>>> julia> fit(GeneralizedLinearModel,Y~X,data,Binomial(),LogitLink())
>>>>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>>>>> Coefficients:
>>>>>>                 Estimate Std.Error      z value Pr(>|z|)
>>>>>> (Intercept)     0.693147   3.24037      0.21391   0.8306
>>>>>> X           -7.44332e-17       1.5 -4.96221e-17   1.0000
>>>>>> ```
>>>>>>
>>>>>> On Sunday, August 31, 2014 1:27:15 PM UTC-4, Bradley Setzler wrote:
>>>>>>>
>>>>>>> Has anyone successfully performed probit or logit regression in
>>>>>>> Julia? The GLM documentation <https://github.com/JuliaStats/GLM.jl>
>>>>>>> does not provide a generalizable example of how to use glm(). It
>>>>>>> gives a Poisson example without any suggestion of how to switch from
>>>>>>> Poisson to some other family.
>>>>>>>
>>>>>>> *Using the Poisson example from GLM documentation works:*
>>>>>>>
>>>>>>> julia> X = [1;2;3.]
>>>>>>> julia> Y = [1;0;1.]
>>>>>>> julia> data = DataFrame(X=X,Y=Y)
>>>>>>> julia> fit(GeneralizedLinearModel, Y ~ X,data, Poisson())
>>>>>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>>>>>> Coefficients:
>>>>>>>                 Estimate Std.Error      z value Pr(>|z|)
>>>>>>> (Intercept)    -0.405465   1.87034    -0.216787   0.8284
>>>>>>> X           -3.91448e-17    0.8658 -4.52123e-17   1.0000
>>>>>>>
>>>>>>> *But does not generalize:*
>>>>>>>
>>>>>>> julia> fit(GeneralizedLinearModel, Y ~ X ,data, Logit())
>>>>>>> ERROR: Logit not defined
>>>>>>>
>>>>>>> julia> fit(GeneralizedLinearModel, Y ~ X, data, link=:ProbitLink)
>>>>>>> ERROR: `fit` has no method matching
>>>>>>> fit(::Type{GeneralizedLinearModel}, ::Array{Float64,2},
>>>>>>> ::Array{Float64,1})
>>>>>>>
>>>>>>> julia> fit(GeneralizedLinearModel, Y ~ X, data,
>>>>>>> family="binomial",link="probit")
>>>>>>> ERROR: `fit` has no method matching
>>>>>>> fit(::Type{GeneralizedLinearModel}, ::Array{Float64,2},
>>>>>>> ::Array{Float64,1})
>>>>>>>
>>>>>>> ...and a dozen other similar attempts fail.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bradley
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>