On Sunday, August 31, 2014 1:30:32 PM UTC-5, Bradley Setzler wrote:
>
> Thank you Adam, this works.
>
> Let me suggest that this information be included in the GLM documentation:
>
> To fit a GLM model, use the function,
> glm(formula, data, family, link),
> where,
> - formula uses column symbols from the DataFrame data, e.g., if
> names(data)=[:Y,:X], then a valid formula is Y~X;
> - data is a DataFrame which may contain NA values, the rows with NA values
> will be ignored (apparently);
> - family may be chosen from Binomial(), Gamma(), Normal(), or Poisson(),
> and the parentheses are required; and,
> - link may be chosen from the list in the GLM documentation, such as
> LogitLink(), and again the parentheses are required. For some families, a
> default link is available so the link argument may be left blank.
>
It would be more accurate to say that if the link argument is omitted
("left blank" is ambiguous), the canonical link is used. A distribution
from the exponential family (en.wikipedia.org/wiki/Exponential_family) has
a canonical link function derived from its probability mass or probability
density function. Because it is difficult to distinguish between models
fit using the same distribution but different links, non-canonical links
are uncommon in practice. In other words, the canonical link is more than
an arbitrarily chosen default.
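
To make "derived from the density" concrete, here is a sketch of the usual
textbook derivation. The symbols (theta, phi, b, c, mu, g) are my notation
for illustration and do not come from the GLM sources.

```latex
% One-parameter exponential family in canonical form (notation assumed)
f(y;\theta,\phi) = \exp\left\{ \frac{y\,\theta - b(\theta)}{\phi} + c(y,\phi) \right\},
\qquad \mu = \mathrm{E}[Y] = b'(\theta)

% The canonical link maps the mean back to the natural parameter
g(\mu) = (b')^{-1}(\mu) = \theta

% Poisson: b(\theta) = e^{\theta}
\mu = e^{\theta} \quad\Longrightarrow\quad g(\mu) = \log \mu

% Bernoulli: b(\theta) = \log(1 + e^{\theta})
\mu = \frac{e^{\theta}}{1 + e^{\theta}} \quad\Longrightarrow\quad g(\mu) = \log\frac{\mu}{1-\mu}
```

So the log link for the Poisson and the logit link for the Bernoulli fall
out of the form of the density rather than being picked by convention.
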

It is unfortunate that the names "Poisson regression", "Logistic
regression" and "Probit regression" existed before Nelder and Wedderburn
came up with a unifying framework for such models. These names refer to
three different aspects of the model: "Poisson" refers to the
distribution, "Logistic" to the inverse link function and "Probit" to the
link.

Of course, most statistics nomenclature is badly botched, so this
inconsistency should not be a surprise.
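
As a small illustration of the link versus inverse-link distinction, here
is a sketch in plain Julia with no packages. The function names logit,
logistic and loglink are defined here for the example and are not taken
from GLM. The probit link (the standard normal quantile function) is left
out because it needs a normal quantile from a package.

```julia
# Link for logistic regression: maps a mean in (0, 1) to the
# linear-predictor scale.
logit(mu) = log(mu / (1 - mu))

# Inverse link: the logistic function maps the linear predictor back
# to a mean in (0, 1). This is the function "logistic regression" is
# named after.
logistic(eta) = 1 / (1 + exp(-eta))

# Link conventionally paired with the Poisson distribution; its
# inverse is exp.
loglink(mu) = log(mu)

mu = 0.25
eta = logit(mu)            # value on the linear-predictor scale
roundtrip = logistic(eta)  # back on the mean scale
println("eta = ", eta, ", roundtrip = ", roundtrip)
```

Applying logistic to the result of logit recovers the original mean, up to
floating-point rounding, which is all that "inverse link" means.
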

The parentheses after the name create an instance of a distribution type
or of a link type for the purposes of dispatch. It should be possible to
dispatch on a DataType as well (i.e. you could write Poisson instead of
Poisson()). I took a look at the sources, but I have lost track of the
changes relative to the original design and am not sure that change would
be made now.

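
To show the difference between dispatching on an instance and on the type
itself, here is a minimal sketch. The types and the defaultlink function
are hypothetical stand-ins written for this example, in current Julia
syntax. They are not the definitions used in GLM or Distributions.

```julia
# Hypothetical stand-ins for distribution types; these are NOT the
# GLM or Distributions definitions.
abstract type Family end
struct Poisson <: Family end
struct Binomial <: Family end

# Dispatch on an instance: the caller writes Poisson().
defaultlink(::Poisson) = :log
defaultlink(::Binomial) = :logit

# Dispatch on the type itself: the caller writes Poisson, without
# parentheses. This method builds an instance and forwards to the
# instance methods above.
defaultlink(::Type{T}) where {T<:Family} = defaultlink(T())

println(defaultlink(Poisson()))  # instance form
println(defaultlink(Binomial))   # type form
```

Both spellings reach the same method here. Whether the package should
accept the bare type name is a separate design question.
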
> On Sunday, August 31, 2014 12:56:19 PM UTC-5, Adam Kapor wrote:
>
>> This works for me:
>>
>> ```
>> julia> fit(GeneralizedLinearModel,Y~X,data,Binomial(),ProbitLink())
>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>> Coefficients:
>>                 Estimate Std.Error      z value Pr(>|z|)
>> (Intercept)     0.430727   1.98019     0.217518   0.8278
>> X            2.37745e-17   0.91665  2.59362e-17   1.0000
>>
>> julia> fit(GeneralizedLinearModel,Y~X,data,Binomial(),LogitLink())
>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>> Coefficients:
>>                 Estimate Std.Error      z value Pr(>|z|)
>> (Intercept)     0.693147   3.24037      0.21391   0.8306
>> X           -7.44332e-17       1.5 -4.96221e-17   1.0000
>> ```
>>
>> On Sunday, August 31, 2014 1:27:15 PM UTC-4, Bradley Setzler wrote:
>>>
>>> Has anyone successfully performed probit or logit regression in Julia?
>>> The GLM documentation <https://github.com/JuliaStats/GLM.jl> does not
>>> provide a generalizable example of how to use glm(). It gives a Poisson
>>> example without any suggestion of how to switch from Poisson to some other
>>> type.
>>>
>>> *Using the Poisson example from GLM documentation works:*
>>>
>>> julia> X = [1;2;3.]
>>> julia> Y = [1;0;1.]
>>> julia> data = DataFrame(X=X,Y=Y)
>>> julia> fit(GeneralizedLinearModel, Y ~ X,data, Poisson())
>>> DataFrameRegressionModel{GeneralizedLinearModel,Float64}:
>>> Coefficients:
>>>                 Estimate Std.Error      z value Pr(>|z|)
>>> (Intercept)    -0.405465   1.87034    -0.216787   0.8284
>>> X           -3.91448e-17    0.8658 -4.52123e-17   1.0000
>>>
>>> *But does not generalize:*
>>>
>>> julia> fit(GeneralizedLinearModel, Y ~ X ,data, Logit())
>>> ERROR: Logit not defined
>>>
>>> julia> fit(GeneralizedLinearModel, Y ~ X, data, link=:ProbitLink)
>>> ERROR: `fit` has no method matching fit(::Type{GeneralizedLinearModel},
>>> ::Array{Float64,2}, ::Array{Float64,1})
>>>
>>> julia> fit(GeneralizedLinearModel, Y ~ X, data,
>>> family="binomial",link="probit")
>>> ERROR: `fit` has no method matching fit(::Type{GeneralizedLinearModel},
>>> ::Array{Float64,2}, ::Array{Float64,1})
>>>
>>> ...and a dozen other similar attempts fail.
>>>
>>>
>>> Thanks,
>>> Bradley
>>>
>>>