On Sat, Mar 15, 2014 at 11:30 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:

>
>
>
> On Sat, Mar 15, 2014 at 7:20 PM, <josef.p...@gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>>> Hi all,
>>>
>>> Here's the main blocker for adding a matrix multiply operator '@' to
>>> Python: we need to decide what we think its precedence and associativity
>>> should be. I'll explain what that means so we're on the same page, and what
>>> the choices are, and then we can all argue about it. But even better would
>>> be if we could get some data to guide our decision, and this would be a lot
>>> easier if some of you all can help; I'll suggest some ways you might be
>>> able to do that.
>>>
>>> So! Precedence and left- versus right-associativity. If you already know
>>> what these are you can skim down until you see CAPITAL LETTERS.
>>>
>>> We all know what precedence is. Code like this:
>>>   a + b * c
>>> gets evaluated as:
>>>   a + (b * c)
>>> because * has higher precedence than +. It "binds more tightly", as they
>>> say. Python's complete precedence table is here:
>>>
>>> http://docs.python.org/3/reference/expressions.html#operator-precedence
>>>
>>> Associativity, in the parsing sense, is less well known, though it's
>>> just as important. It's about deciding how to evaluate code like this:
>>>   a * b * c
>>> Do we use
>>>   a * (b * c)    # * is "right associative"
>>> or
>>>   (a * b) * c    # * is "left associative"
>>> ? Here all the operators have the same precedence (because, uh...
>>> they're the same operator), so precedence doesn't help. And mostly we can
>>> ignore this in day-to-day life, because both versions give the same answer,
>>> so who cares. But a programming language has to pick one (consider what
>>> happens if one of those objects has a non-default __mul__ implementation).
>>> And of course it matters a lot for non-associative operations like
>>>   a - b - c
>>> or
>>>   a / b / c
>>> So when figuring out the order of evaluation, what you do first is check
>>> the precedence, and then if you have multiple operators next to each other
>>> with the same precedence, you check their associativity. Notice that this
>>> means that if you have different operators that share the same precedence
>>> level (like + and -, or * and /), then they have to all have the same
>>> associativity. All else being equal, it's generally considered nice to have
>>> fewer precedence levels, because these have to be memorized by users.
>>>
>>> Right now in Python, every precedence level is left-associative, except
>>> for '**'. If you write these formulas without any parentheses, then what
>>> the interpreter will actually execute is:
>>>   (a * b) * c
>>>   (a - b) - c
>>>   (a / b) / c
>>> but
>>>   a ** (b ** c)
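>>>
>>> (To watch this happen, here's a toy sketch -- just an illustration --
>>> a class whose __mul__ reports how Python groups things:
>>>
>>>   class Chatty(object):
>>>       def __init__(self, name):
>>>           self.name = name
>>>       def __mul__(self, other):
>>>           print("multiplying %s by %s" % (self.name, other.name))
>>>           return Chatty("(%s * %s)" % (self.name, other.name))
>>>
>>>   a, b, c = Chatty("a"), Chatty("b"), Chatty("c")
>>>   a * b * c
>>>   # multiplying a by b
>>>   # multiplying (a * b) by c
>>>
>>> i.e., Python groups * left-to-right.)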
>>>
>>> Okay, that's the background. Here's the question. We need to decide on
>>> precedence and associativity for '@'. In particular, there are three
>>> different options that are interesting:
>>>
>>> OPTION 1 FOR @:
>>> Precedence: same as *
>>> Associativity: left
>>> My shorthand name for it: "same-left" (yes, very creative)
>>>
>>> This means that if you don't use parentheses, you get:
>>>    a @ b @ c  ->  (a @ b) @ c
>>>    a * b @ c  ->  (a * b) @ c
>>>    a @ b * c  ->  (a @ b) * c
>>>
>>> OPTION 2 FOR @:
>>> Precedence: more-weakly-binding than *
>>> Associativity: right
>>> My shorthand name for it: "weak-right"
>>>
>>> This means that if you don't use parentheses, you get:
>>>    a @ b @ c  ->  a @ (b @ c)
>>>    a * b @ c  ->  (a * b) @ c
>>>    a @ b * c  ->  a @ (b * c)
>>>
>>> OPTION 3 FOR @:
>>> Precedence: more-tightly-binding than *
>>> Associativity: right
>>> My shorthand name for it: "tight-right"
>>>
>>> This means that if you don't use parentheses, you get:
>>>    a @ b @ c  ->  a @ (b @ c)
>>>    a * b @ c  ->  a * (b @ c)
>>>    a @ b * c  ->  (a @ b) * c
>>>
>>> We need to pick which of these options we think is best, based on
>>> whatever reasons we can think of, ideally more than "hmm, weak-right gives
>>> me warm fuzzy feelings" ;-). (In principle the other 2 possible options are
>>> tight-left and weak-left, but there doesn't seem to be any argument in
>>> favor of either, so we'll leave them out of the discussion.)
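>>>
>>> For easy reference, here are all three side by side:
>>>
>>>                   a @ b @ c      a * b @ c      a @ b * c
>>>   same-left       (a @ b) @ c    (a * b) @ c    (a @ b) * c
>>>   weak-right      a @ (b @ c)    (a * b) @ c    a @ (b * c)
>>>   tight-right     a @ (b @ c)    a * (b @ c)    (a @ b) * c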
>>>
>>> Some things to consider:
>>>
>>> * and @ are actually not associative (in the math sense) with respect to
>>> each other, i.e., (a * b) @ c and a * (b @ c) in general give different
>>> results when 'a' is not a scalar. So considering the two expressions 'a * b
>>> @ c' and 'a @ b * c', we can see that each of these three options
>>> produces different results in some cases.
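>>>
>>> (A tiny concrete example, with np.dot standing in for @:
>>>
>>>   import numpy as np
>>>   a = np.array([[1., 2.], [3., 4.]])
>>>   b = np.ones((2, 2))
>>>   c = np.ones((2, 2))
>>>   print(np.dot(a * b, c))   # (a * b) @ c  ->  [[3., 3.], [7., 7.]]
>>>   print(a * np.dot(b, c))   # a * (b @ c)  ->  [[2., 4.], [6., 8.]]
>>>
>>> so the grouping genuinely changes the answer.)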
>>>
>>> "Same-left" is the easiest to explain and remember, because it's just,
>>> "@ acts like * and /". So we already have to know the rule in order to
>>> understand other non-associative expressions like a / b / c or a - b - c,
>>> and it'd be nice if the same rule applied to things like a * b @ c so we
>>> only had to memorize *one* rule. (Of course there's ** which uses the
>>> opposite rule, but I guess everyone internalized that one in secondary
>>> school; that's not true for * versus @.) This is definitely the default we
>>> should choose unless we have a good reason to do otherwise.
>>>
>>> BUT: there might indeed be a good reason to do otherwise, which is the
>>> whole reason this has come up. Consider:
>>>     Mat1 @ Mat2 @ vec
>>> Obviously this will execute much more quickly if we do
>>>     Mat1 @ (Mat2 @ vec)
>>> because that results in two cheap matrix-vector multiplies, while
>>>     (Mat1 @ Mat2) @ vec
>>> starts out by doing an expensive matrix-matrix multiply. So: maybe @
>>> should be right associative, so that we get the fast behaviour without
>>> having to use explicit parentheses! /If/ these kinds of expressions are
>>> common enough that having to remember to put explicit parentheses in all
>>> the time is more of a programmer burden than having to memorize a special
>>> associativity rule for @. Obviously Mat @ Mat @ vec is more common than vec
>>> @ Mat @ Mat, but maybe they're both so rare that it doesn't matter in
>>> practice -- I don't know.
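>>>
>>> (Rough arithmetic, just to quantify: with two n x n matrices and a
>>> length-n vector, (Mat1 @ Mat2) @ vec costs about 2*n**3 + 2*n**2
>>> flops, while Mat1 @ (Mat2 @ vec) costs about 4*n**2 -- a factor of
>>> roughly n/2. Spelled with np.dot, since @ doesn't exist yet:
>>>
>>>   import numpy as np
>>>   n = 1000
>>>   Mat1, Mat2 = np.random.rand(n, n), np.random.rand(n, n)
>>>   vec = np.random.rand(n)
>>>   left = np.dot(np.dot(Mat1, Mat2), vec)   # ~2e9 flops
>>>   right = np.dot(Mat1, np.dot(Mat2, vec))  # ~4e6 flops
>>>
>>> Timing either one should make the gap obvious.)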
>>>
>>> Also, if we do want @ to be right associative, then I can't think of any
>>> clever reasons to prefer weak-right over tight-right, or vice-versa. For
>>> the scalar multiplication case, I believe both options produce the same
>>> result in the same amount of time. For the non-scalar case, they give
>>> different answers. Do people have strong intuitions about what expressions
>>> like
>>>   a * b @ c
>>>   a @ b * c
>>> should actually do? (I'm guessing not, but hey, you never know.)
>>>
>>> And, while intuition is useful, it would be really *really* nice to be
>>> basing these decisions on more than *just* intuition, since whatever we
>>> decide will be subtly influencing the experience of writing linear algebra
>>> code in Python for the rest of time. So here's where I could use some help.
>>> First, of course, if you have any other reasons why one or the other of
>>> these options is better, then please share! But second, I think we need to
>>> know something about how often the Mat @ Mat @ vec type cases arise in
>>> practice. How often do non-scalar * and np.dot show up in the same
>>> expression? How often does it look like a * np.dot(b, c), and how often
>>> does it look like np.dot(a * b, c)? How often do we see expressions like
>>> np.dot(np.dot(a, b), c), and how often do we see expressions like np.dot(a,
>>> np.dot(b, c))? This would really help guide the debate. I don't have this
>>> data, and I'm not sure the best way to get it. A super-fancy approach would
>>> be to write a little script that uses the 'ast' module to count things
>>> automatically. A less fancy approach would be to just pick some code you've
>>> written, or a well-known package, grep through for calls to 'dot', and make
>>> notes on what you see. (An advantage of the less-fancy approach is that as
>>> a human you might be able to tell the difference between scalar and
>>> non-scalar *, or check whether it actually matters what order the 'dot'
>>> calls are done in.)
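>>>
>>> For the 'ast' route, an untested sketch along these lines might be a
>>> starting point -- it only counts two-argument calls whose attribute
>>> name is 'dot', like np.dot(a, b), and ignores everything else:
>>>
>>>   import ast, sys
>>>
>>>   def is_dot(node):
>>>       return (isinstance(node, ast.Call)
>>>               and isinstance(node.func, ast.Attribute)
>>>               and node.func.attr == "dot")
>>>
>>>   class DotCounter(ast.NodeVisitor):
>>>       def __init__(self):
>>>           self.left = self.right = 0
>>>       def visit_Call(self, node):
>>>           if is_dot(node) and len(node.args) == 2:
>>>               if is_dot(node.args[0]):   # dot(dot(a, b), c)
>>>                   self.left += 1
>>>               if is_dot(node.args[1]):   # dot(a, dot(b, c))
>>>                   self.right += 1
>>>           self.generic_visit(node)
>>>
>>>   counter = DotCounter()
>>>   for path in sys.argv[1:]:
>>>       counter.visit(ast.parse(open(path).read()))
>>>   print("dot(dot(a, b), c): %d   dot(a, dot(b, c)): %d"
>>>         % (counter.left, counter.right))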
>>>
>>> -n
>>>
>>> --
>>> Nathaniel J. Smith
>>> Postdoctoral researcher - Informatics - University of Edinburgh
>>> http://vorpus.org
>>>
>>>
>>
>> I'm in favor of same-left because it's the easiest to remember.
>> With scalar factors, that's how I read formulas.
>>
>
> Note that if there are no (interior) vectors involved then the two methods
> of association give theoretically identical results. But when there is a
> vector on the right and no vector on the left, then right association is
> more efficient and likely more numerically accurate.
>

What's so special about a vector on the right? What if I have a vector on
the left, or, as is pretty common, a quadratic form?

Having a different associativity rule for
a * b * c
than for
A @ B @ C
looks confusing to me.

There is something special about the last array (in numpy):

np.arange(5).dot(np.diag(np.ones(5))).dot(np.arange(10).reshape(5, 2, order="F"))
np.arange(5).dot(np.diag(np.ones(5))).dot(np.arange(5))
np.arange(5).dot(np.diag(np.ones(5))).dot(np.arange(5).reshape(5, 1))

Chains go left to right.

Josef




>
>
>> Both calculating the dot product @ first and calculating elementwise *
>> first sound logical, but I wouldn't know which should go first. (My
>> "feeling" would be @ first.)
>>
>>
>> Two cases I remember from statsmodels:
>>
>> H = np.dot(results.model.pinv_wexog,
>>            scale[:, None] * results.model.pinv_wexog.T)
>> se = (exog * np.dot(covb, exog.T).T).sum(1)
>>
>> We are mixing * and dot pretty freely in all combinations, AFAIR.
>>
>> My guess is that I wouldn't trust any sequence without parentheses for a
>> long time.
>> (And I don't trust a sequence of dots @ without parentheses either, in
>> our applications.)
>>
>> x @ (W.T @ W) @ x      ( W.shape = (10000, 5) )
>> or
>> x * (W.T @ W) * x
>>
>>
> Judicious use of parentheses is definitely recommended no matter what is
> decided.
>
>
>>  (w * x) @ x    weighted sum of squares
>>
>>
> Chuck
>
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
