Agreed regarding Stata formulas.
We could always go the way of SAS and use = to separate the response
from predictors, though that could get tricky and/or confusing due to
the similarity with how keyword arguments are specified. Not to mention
problematic behavior if used incorrectly...
Milan Bouchet-Valat <mailto:[email protected]>
August 17, 2016 at 7:32 AM
I don't find it particularly clear that in Stata the response isn't
visually separated from the independent variables. ~ is really useful
IMHO.
As regards +, it's needed so that the formula is a valid Julia
expression, which is good for consistency (even if formulas end up
being written as strings). That convention also follows the Wilkinson &
Rogers notation, so there's a precedent in the literature other than
R.
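To illustrate the point about `+` and valid Julia expressions, here is a minimal sketch, assuming Julia 1.x (where `~` is parsed as an ordinary binary call rather than as a macro). The helper `parses_as_julia` is hypothetical; it just asks `Meta.parse` whether each spelling is a single complete expression.

```julia
# Check whether a formula string parses as one complete Julia expression.
function parses_as_julia(src::AbstractString)
    try
        Meta.parse(src)   # throws if the text is not a single valid expression
        return true
    catch
        return false
    end
end

for src in ("y ~ x1 + x2", "y => x1 + x2", "y x1 x2")
    println(rpad(repr(src), 16), parses_as_julia(src))
end
```

The R-style and `=>` spellings should both report `true`, while the Stata-style `y x1 x2` reports `false`, so dropping `~` and `+` would force formulas to be plain strings with their own parser.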
Regards
Matthieu <mailto:[email protected]>
August 15, 2016 at 8:33 PM
In stata one specifies a formula without ~ or +
y x1 x2
It works pretty well in my experience. How about dropping ~ and +?
Michael Krabbe Borregaard <mailto:[email protected]>
May 18, 2016 at 6:04 AM
One might argue that the mathematical symbol ⇒ means something
entirely different from what is implied by the formula operator: that
the left side leads to the right by material implication
<https://en.wikipedia.org/wiki/Material_conditional>. Also, the
intuitive interpretation of => (that the left side leads to the right)
is wrong.
--
You received this message because you are subscribed to a topic in the
Google Groups "julia-stats" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/julia-stats/LdozV7o4zuM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
[email protected]
<mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.
Milan Bouchet-Valat <mailto:[email protected]>
May 18, 2016 at 4:24 AM
On Tuesday, May 17, 2016 at 22:22 -0700, Alex Arslan wrote:
I definitely agree with Stefan on this.
Milan, you mentioned that we should have a strong reason to break the
convention if we choose to do so. I've always found R's use of `~` to
be a bit unfortunate. It's a carryover from S, which introduced `~`
for formulas before R was even a thing. But S is also the language
that brought us `<-` for assignment, so I tend to be wary of its
gifts. ;) At this point I think other languages that offer
statistical modeling facilities are using `~` just because R does it.
Julia has the opportunity to potentially set a new precedent as it
gains traction for stats, so I think we should think carefully about
the choices.
Stefan's suggestion of `@model` opens the possibility for just about
any kind of separator because it just becomes the `head` in the
`Expr` that goes into the macro. But I think the syntax for pairs,
i.e. `=>`, would make the most sense in terms of consistency with
existing Julia structures because a model is essentially a pair; it's
some combination of responses paired with some combination of
predictors.
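A small sketch of why a macro can accept almost any separator, assuming Julia 1.x parsing: both `y ~ 1 + x + z` and `y => 1 + x + z` arrive as `:call` expressions, with the separator stored as the first argument (strictly speaking `args[1]`, since the `head` itself is `:call`). The helper `formula_sides` is hypothetical.

```julia
# Split a two-sided formula expression into (separator, lhs, rhs).
# Works for any binary operator used as the separator, e.g. ~ or =>.
function formula_sides(ex::Expr)
    ex.head === :call && length(ex.args) == 3 ||
        throw(ArgumentError("expected a binary call like `y ~ rhs`, got $ex"))
    op, lhs, rhs = ex.args
    return (op, lhs, rhs)
end

sides_tilde = formula_sides(:(y ~ 1 + x + z))
sides_pair  = formula_sides(:(y => 1 + x + z))

# Only the separator differs; both sides are identical expressions.
println(sides_tilde[1], "  ", sides_pair[1])
println(sides_tilde[2:3] == sides_pair[2:3])
```

Since the two sides come out identical, the choice of separator is mostly a question of readability and consistency rather than of implementation effort.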
As I said, I find => a good idea too. @model sounds a bit verbose to me
(it should really be called @formula anyway), but maybe that's OK if in
practice we can use @fit as a shorthand.
Anyway, I don't really have strong feelings about this either.
Regards
Anyway, just thinking aloud.
-Alex
I don't have a strong feeling about `~` versus `=>` – although the
R tradition of using `~` seems to at least give a hint about what's
going on, which is kind of nice. But I do think that it would be
good to get rid of the macro business for ~ and start spelling
model specifications as `@model y ~ 1 + x + z` or
`@model y => 1 + x + z` and returning some kind of Model type instead of using bare
expression objects for this kind of thing. Expression objects
already have a meaning in Julia code and it is not to specify
statistical models – it is to represent Julia expression trees. The
fact that those two meanings can usually be disambiguated easily
doesn't mean they should be represented the same way.
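A minimal sketch of the `@model` idea, assuming a hypothetical `ModelSpec` type and a hypothetical `@model` macro (neither is an existing package API). The macro quotes both sides at expansion time and wraps them in a dedicated type, so a model specification is no longer a bare `Expr`.

```julia
# Hypothetical type: a model specification distinct from a bare Expr.
struct ModelSpec
    separator::Symbol   # the operator the user wrote: ~ or =>
    lhs::Any            # response(s): a Symbol or an Expr
    rhs::Any            # predictors: a Symbol, a number, or an Expr
end

# Hypothetical macro: capture `lhs <sep> rhs` unevaluated and wrap it.
macro model(ex)
    (ex isa Expr && ex.head === :call && length(ex.args) == 3) ||
        throw(ArgumentError("@model expects `lhs ~ rhs` or `lhs => rhs`"))
    sep, lhs, rhs = ex.args
    sep in (Symbol("~"), Symbol("=>")) ||
        throw(ArgumentError("unsupported separator $sep"))
    # QuoteNode keeps each side as syntax instead of evaluating it.
    return :(ModelSpec($(QuoteNode(sep)), $(QuoteNode(lhs)), $(QuoteNode(rhs))))
end

m = @model(y ~ 1 + x + z)
println(typeof(m))            # ModelSpec
println(m.lhs, " | ", m.rhs)
```

Fitting functions could then dispatch on `ModelSpec` instead of inspecting arbitrary `Expr` objects.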
On Tue, Feb 2, 2016 at 10:15 AM, Milan Bouchet-Valat wrote:
On Tuesday, February 2, 2016 at 15:30 +0100, Milan Bouchet-Valat wrote:
Using Pairs (and therefore =>) sounds like a good idea to me, as it
conveys exactly the meaning of associating two parts of a formula
together, in a structure designed for that. (Well, the direction of the
arrow isn't very natural, but...)
But maybe to make it nicer to read we could make the whole formula an
expression, i.e.:
:(y => 1 + x + z)
instead of:
:y => :(1 + x + z)
That would make the syntax very close to what a macro for fitting a
model would allow:
@fit(LinearModel, y => 1 + x + z, data)
(Such a macro, while not strictly necessary, could also save the full
call expression and the name of the dataset used when fitting the
model, so that they can be printed back to the user as R does.)
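A sketch of the bookkeeping such a macro could do, assuming a hypothetical `@fit` macro and a hypothetical `FitRecord` type. It only records the call, the unevaluated formula and the dataset name; it does not fit anything, and `LinearModel` is kept as an unevaluated symbol so the sketch runs without any modeling package.

```julia
# Hypothetical record of how a fit was requested (no fitting is done here).
struct FitRecord
    call::Expr        # equivalent function-call form of what the user wrote
    formula::Expr     # the formula, kept unevaluated
    dataname::String  # name of the dataset variable, for display
    data::Any         # the dataset itself, evaluated in the caller's scope
end

# Hypothetical macro: quote the model type and formula, evaluate only the data.
macro fit(modeltype, formula, data)
    call = Expr(:call, :fit, modeltype, formula, data)
    return :(FitRecord($(QuoteNode(call)), $(QuoteNode(formula)),
                       $(string(data)), $(esc(data))))
end

mydata = (y = [1.0, 2.0, 3.0], x = [0.5, 1.5, 2.5], z = [1.0, 0.0, 1.0])
rec = @fit(LinearModel, y => 1 + x + z, mydata)
println("Call: ", rec.call)
println("Data: ", rec.dataname)
```

A real implementation would pass the evaluated model type and data on to a fitting function and keep `rec.call` and `rec.dataname` only for printing.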
Maybe more importantly, it would remove the requirement for the left-hand
side of the formula to be a symbol. Indeed, some models (like PLS
regression) accept several dependent variables, which could be written
like this:
:(y + z => 1 + x)
Actually, scratch that, as the two features are orthogonal.
:(y => 1 + x + z) and :y => :(1 + x + z) both have Symbol as the type
for the LHS. The only difference is whether we have a Pair of
expressions/symbols or a => call with two expression/symbol arguments.
Yet it might be a bit nicer to write
:(y + z => 1 + x)
rather than
:(y + z) => :(1 + x)
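The equivalence between the two spellings can be checked with a short sketch, assuming Julia 1.x, where `a => b` inside `:( )` parses as a call to `=>` and `=>` binds more loosely than `+`. The helpers `as_pair` and `responses` are hypothetical.

```julia
# Turn a quoted `lhs => rhs` call into a Pair of its (unevaluated) sides.
function as_pair(ex::Expr)
    (ex.head === :call && ex.args[1] === :(=>) && length(ex.args) == 3) ||
        throw(ArgumentError("expected `lhs => rhs`, got $ex"))
    return ex.args[2] => ex.args[3]
end

# List the response variables on a left-hand side such as `y` or `y + z`.
responses(lhs::Symbol) = [lhs]
function responses(lhs::Expr)
    (lhs.head === :call && lhs.args[1] === :+) ||
        throw(ArgumentError("expected a sum of variables, got $lhs"))
    return Symbol[a for a in lhs.args[2:end]]
end

single = as_pair(:(y => 1 + x + z))   # same content as :y => :(1 + x + z)
multi  = as_pair(:(y + z => 1 + x))   # same content as :(y + z) => :(1 + x)
println(single == (:y => :(1 + x + z)))
println(responses(first(single)), " ", responses(first(multi)))
```

Because the conversion is mechanical, a package could accept either form from users and normalize to one internally.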
My two cents
On Monday, February 1, 2016 at 13:18 -0800, Douglas Bates wrote:
The current formula interface for packages like GLM and MixedModels
emulates that of R in that a formula is written like
y ~ 1 + x + z
The difficulty with this form is that the ~ character is used
elsewhere in Julia, so somewhat nasty tricks need to be used to
parse such an expression as a formula.
One way to break away from this R-centric approach is to use a Pair
to represent a formula. Because we don't want to evaluate the
expressions in a formula at function-call time, it would be necessary to
use Pair{Symbol,Expr} or Pair{Expr,Expr} to represent a two-sided
formula. The translation of the previous formula would be
:y => :(1 + x + z)
This requires a few extra keystrokes but is not a terrible burden and
it would use a native Julia construct. It also serves to visually
distinguish a formula in Julia from a formula in R so that we can
make other changes in the formula language (e.g. require an explicit
1 for the intercept term) with less confusion for users. Because a
formula in Julia looks different from a formula in R it is less
confusing that other aspects of the formula syntax are different in
Julia and in R.
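A sketch of what working with the `Pair` representation might look like. The functions `rhs_terms` and `has_intercept` are hypothetical, and treating only a literal `1` as the intercept reflects the "explicit 1" rule floated above, not the behavior of any existing package.

```julia
# A two-sided formula as a native Pair: response symbol => unevaluated RHS.
f = :y => :(1 + x + z)
println(typeof(f))   # Pair{Symbol, Expr}

# Collect the additive terms of a right-hand side such as `1 + x + z`.
function rhs_terms(rhs)
    if rhs isa Expr && rhs.head === :call && rhs.args[1] === :+
        terms = Any[]
        for a in rhs.args[2:end]
            append!(terms, rhs_terms(a))   # flatten nested sums
        end
        return terms
    end
    return Any[rhs]   # a symbol, a literal such as 1, or a non-sum like x & z
end

# Under an "explicit intercept" rule, the model has an intercept only
# if a literal 1 appears among the terms.
has_intercept(rhs) = any(t -> t == 1, rhs_terms(rhs))

println(rhs_terms(last(f)))
println(has_intercept(last(f)))      # 1 is written out
println(has_intercept(:(x + z)))     # no implicit intercept
```

Here `first(f)` and `last(f)` give the two sides directly, which is part of the appeal of reusing a native Julia construct.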