Re: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)

Hameer Abbasi Mon, 09 Apr 2018 04:38:19 -0700

I've renamed the kwarg to `initial`. I'm willing to make the object dtype
changes as well, if someone pointed me to relevant bits of code.


Unfortunately, currently, the identity is only used for object dtypes if
the reduction is empty. I think this is to prevent things like `0` being
passed in the sum of objects (and similar cases), which makes sense.

However, with the kwarg, it makes sense to include it in the reduction. I
think the change will be somewhere along the lines of: Detect if `initial`
was passed, if so, include for object, otherwise exclude.

I personally feel `initial` renders `default` redundant. It can be used for
both purposes. I can't think of a reasonable use case where you would want
the default to be different from the initial value. However, I do agree
that fixing the object case is important, we don't want users to get used
to this behaviour and then rely on it later.

Hameer

On Mon, Mar 26, 2018 at 8:09 PM, Sebastian Berg <[email protected]>
wrote:

> On Mon, 2018-03-26 at 17:40 +0000, Eric Wieser wrote:
> > The difficulty in supporting object arrays is that func.reduce(arr,
> > initial=func.identity) and func.reduce(arr) have different meanings -
> > whereas with the current patch, they are equivalent.
> >
>
> True, but the current meaning is:
>
> func.reduce(arr, intial=<NoValue>, default=func.identity)
>
> in the case for object dtype. Luckily for normal dtypes, func.identity
> is both the correct default "default" and a no-op for initial. Thus the
> name "identity" kinda works there. I am also not really sure that both
> kwargs would make real sense (plus initial probably disallows
> default...), but I got some feeling that the "default" meaning may be
> even more useful to simplify special casing the empty case.
>
> Anyway, still just pointing out that I it gives me some headaches to
> see such a special case for objects :(.
>
> - Sebastian
>
>
> >
> > On Mon, 26 Mar 2018 at 10:10 Sebastian Berg <[email protected]
> > et> wrote:
> > > On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote:
> > > > That may be complicated. Currently, the identity isn't used in
> > > object
> > > > dtype reductions. We may need to change that, which could cause a
> > > > whole lot of other backwards incompatible changes. For example,
> > > sum
> > > > actually including zero in object reductions. Or we could pass in
> > > a
> > > > flag saying an initializer was passed in to change that
> > > behaviour. If
> > > > this is agreed upon and someone is kind enough to point me to the
> > > > code, I'd be willing to make this change.
> > >
> > > I realize the implication, I am not suggesting to change the
> > > default
> > > behaviour (when no initial=... is passed), I would think about
> > > deprecating it, but probably only if we also have the `default`
> > > argument, since otherwise you cannot replicate the old behaviour.
> > >
> > > What I think I would like to see is to change how it works if (and
> > > only
> > > if) the initializer is passed in. Yes, this will require holding on
> > > to
> > > some extra information since you will have to know/remember whether
> > > the
> > > "identity" was passed in or defined otherwise.
> > >
> > > I did not check the code, but I would hope that it is not awfully
> > > tricky to do that.
> > >
> > > - Sebastian
> > >
> > >
> > > PS: A side note, but I see your emails as a single block of text
> > > with
> > > no/broken new-lines.
> > >
> > >
> > > >  On 26/03/2018 at 18:54,
> > > > Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian
> > > Berg
> > > > wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote:
> > > It'll
> > > > need to be thought out for object arrays and subclasses. But for
> > > > Regular numeric stuff, Numpy uses fmin and this would have the
> > > > desired
> > > > effect. I do not want to block this, but I would like a clearer
> > > > opinion about this issue, `np.nansum` as Benjamin noted would
> > > require
> > > > something like: np.nansum([np.nan], default=np.nan) because
> > > > np.sum([1], initializer=np.nan) np.nansum([1],
> > > initializer=np.nan)
> > > > would both give NaN if the logic is the same as the current
> > > `np.sum`.
> > > > And yes, I guess for fmin/fmax NaN happens to work. And then
> > > there
> > > > are
> > > > many nonsense reduces which could make sense with `initializer`.
> > > Now
> > > > nansum is not implemented in a way that could make use of the new
> > > > kwarg anyway, so maybe it does not matter in some sense. We can
> > > in
> > > > principle use `default` in nansum and at some point possibly add
> > > > `default` to the normal ufuncs. If we argue like that, the only
> > > > annoying thing is the `object` dtype which confuses the two use
> > > cases
> > > > currently. This confusion IMO is not harmless, because I might
> > > want
> > > > to
> > > > use it (e.g. sum with initializer=5), and I would expect things
> > > like
> > > > dropping in `decimal.Decimal` to work most of the time, while
> > > here it
> > > > would give silently bad results. In other words: I am very very
> > > much
> > > > in favor if you get rid that object dtype special case. I frankly
> > > not
> > > > see why not (except that it needs a bit more code change). If
> > > given
> > > > explicitly, we might as well force the use and not do the funny
> > > stuff
> > > > which is designed to be more type agnostic! If it happens to fail
> > > due
> > > > to not being type agnostic, it will at least fail loudly. If you
> > > > leave
> > > > that object special case I am *very* hesitant about it. That I
> > > think
> > > > I
> > > > would like a `default` argument as well, is another issue and it
> > > can
> > > > wait to another day. - Sebastian - Sebastian On 26/03/2018 at
> > > 17:45,
> > > > Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi
> > > > wrote: That is the idea, but NaN functions are in a separate
> > > branch
> > > > for another PR to be discussed later. You can see it on my fork,
> > > if
> > > > you're interested. Except that as far as I understand I am not
> > > sure
> > > > it
> > > > will help much with it, since it is not a default, but an
> > > > initializer.
> > > > Initializing to NaN would just make all results NaN. - Sebastian
> > > On
> > > > 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I imagine
> > > it
> > > > would finally give some people a choice on what
> > > np.nansum([np.nan])
> > > > should return? It caused a huge hullabeloo a few years ago when
> > > we
> > > > changed it from returning NaN to returning zero. Ben Root On Mon,
> > > Mar
> > > > 26, 2018 at 11:16 AM, Sebastian Berg <[email protected]>
> > > > wrote: OK, the new documentation is actually clear: initializer :
> > > > scalar, optional The value with which to start the reduction.
> > > > Defaults
> > > > to the `~numpy.ufunc.identity` of the ufunc. If ``None`` is
> > > given,
> > > > the
> > > > first element of the reduction is used, and an error is thrown if
> > > the
> > > > reduction is empty. If ``a.dtype`` is ``object``, then the
> > > > initializer
> > > > is _only_ used if reduction is empty. I would actually like to
> > > say
> > > > that I do not like the object special case much (and it is
> > > probably
> > > > the reason why I was confused), nor am I quite sure this is what
> > > > helps
> > > > a lot? Logically, I would argue there are two things: 1.
> > > > initializer/start (always used) 2. default (oly used for empty
> > > > reductions) For example, I might like to give `np.nan` as the
> > > default
> > > > for some empty reductions, this will not work. I understand that
> > > this
> > > > is a minimal invasive PR and I am not sure I find the solution
> > > bad
> > > > enough to really dislike it, but what do other think? My first
> > > > expectation was the default behaviour (in all cases, not just
> > > object
> > > > case) for some reason. To be honest, for now I just wonder a bit:
> > > How
> > > > hard would it be to do both, or is that too annoying? It would at
> > > > least get rid of that annoying thing with object ufuncs (which
> > > > currently have a default, but not really an
> > > identity/initializer).
> > > > Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi
> > > > wrote: > Actually, the behavior right now isn’t that of `default`
> > > but
> > > > that of > `initializer` or `start`. > > This was discussed
> > > further
> > > > down in the PR but to reiterate: > `np.sum([10], initializer=5)`
> > > > becomes `15`. > > Also, `np.min([5], initializer=0)` becomes `0`,
> > > so
> > > > it isn’t really > the default value, it’s the initial value among
> > > > which the reduction > is performed. > > This was the reason to
> > > call
> > > > it
> > > > initializer in the first place. I like > `initial` and
> > > > `initial_value`
> > > > as well, and `start` also makes sense > but isn’t descriptive
> > > enough.
> > > > > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at
> > > 12:06,
> > > >
> > > > Sebastian Berg <[email protected] > > t> wrote: > > > >
> > > > Initializer or this sounds fine to me. As an other data point
> > > which >
> > > > > I > > think has been mentioned before, `sum` uses start and
> > > min/max
> > > >
> > > > use > > default. `start` does not work, unless we also change the
> > > > code
> > > > to > > always use the identity if given (currently that is not
> > > the
> > > > case), > > in > > which case it might be nice. However, "start"
> > > seems
> > > > a bit like > > solving > > a different issue in any case. > > > >
> > > > Anyway, mostly noise. I really like adding this, the only thing >
> > > >
> > > > worth > > discussing a bit is the name :). - Sebastian > > > > >
> > > > On
> > > > Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It
> > > calls
> > > > it
> > > > `initializer` - See https://docs.python.org/3.5/libra > > > ry/f
> > > > >
> > > > >
> > > > unctools.html#functools.reduce > > > > > > Sent from Astro for
> > > Mac On
> > > > Mar 26, 2018 at 09:54, Eric Wieser <wieser.eric+numpy@gmail. > >
> > > com>
> > > > > > > > wrote: > > > > > > > > It turns out I mispoke -
> > > >
> > > > functools.reduce calls the argument > > > > `initial` > > > > > >
> > > >
> > > > On
> > > > Mon, 26 Mar 2018 at 00:17 Stephan Hoyer <[email protected]> > > >
> > > > wrote: > > > > > This looks like a very logical addition to the
> > > > reduce
> > > > interface. > > > > > It has my support! > > > > > > > > > I would
> > > > have
> > > > preferred the more descriptive name > > > > > "initial_value", >
> > > > >
> > > > >
> > > > > but consistency with functools.reduce makes a compelling case >
> > > > >
> > > > > > for > > > > > "initializer". > > > > > On Sun, Mar 25, 2018
> > > at
> > > >
> > > > 1:15 PM Eric Wieser <wieser.eric+nump y@gm > > > > > ail.com>
> > > wrote:
> > > > To reiterate my comments in the issue - I'm in favor of > this. >
> > > > >
> > > > > > > > > > > > > It seems seem especially valuable for identity-
> > > less
> > > > > > > > > >
> > > > > > > > > > functions > > > > (`min`, `max`, `lcm`), and the
> > > argument
> > > >
> > > > name is consistent > > > with > `functools.reduce`. too. > > > >
> > > > >
> > > > >
> > > > > > > > > The only argument I can see against merging this would
> > > be >
> > > > > > > > > `kwarg`-creep of `reduce`, and I think this has enough
> > > use
> > > > > > > > > >
> > > > > > >
> > > > > > > cases to justify that. > > > > > > > > > > > > I'd like to
> > > > > > > merge
> > > >
> > > > in a few days, if no one else has any > > > > > > opinions. > > >
> > > > >
> > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13
> > > Hameer
> > > >
> > > > Abbasi <einstein.edison > > > > > > @gma > > > > > > il.com>
> > > wrote:
> > > > Hello, everyone. I’ve submitted a PR to add a initializer kwarg
> > > to
> > > > ufunc.reduce. This is useful in a few cases, e.g., > > > > > > >
> > > it
> > > > allows one to supply a “default” value for identity- > > > > > >
> > > >
> > > > less > > > > > > > ufunc reductions, and specify an initial value
> > > for
> > > > > > > > > > > reductions such as sum (other than zero.) > > > > >
> > > > >
> > > > > > > > >
> > > > > > > > > Please feel free to review or leave feedback, (although
> > > I >
> > > > > > > >
> > > > > > > > think Eric and Marten have picked it apart pretty well).
> > > > >
> > > > > > > > >
> > > > >
> > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > >
> > > > >
> > > >
> > > > Hameer > > > > Sent from Astro for Mac > > > > >
> > > > _______________________________________________ > > > NumPy-
> > > > Discussion
> > > > mailing list > > > > > > > [email protected] > > > > >
> > > > >
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > > >
> > > > _______________________________________________ > > > > >
> > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion@python
> > > .o
> > > > rg
> > > > https://mail.python.org/mailman/listinfo/numpy-discussi on
> > > > _______________________________________________ > > > >
> > > > NumPy-Discussion mailing list > > > > > [email protected]
> > > rg
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion > >
> > > > _______________________________________________ > > > >
> > > > NumPy-Discussion mailing list > > > > [email protected]
> > >  >
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > > > _______________________________________________ > > > NumPy-
> > > > Discussion mailing list > > > [email protected] > > >
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > >
> > > > _______________________________________________ > > NumPy-
> > > Discussion
> > > > mailing list > > [email protected] > >
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion > >
> > > > _______________________________________________ > NumPy-
> > > Discussion
> > > > mailing list > [email protected] >
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > _______________________________________________ NumPy-Discussion
> > > > mailing list [email protected]
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > _______________________________________________ NumPy-Discussion
> > > > mailing list [email protected]
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > _______________________________________________ NumPy-Discussion
> > > > mailing list [email protected]
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > _______________________________________________ NumPy-Discussion
> > > > mailing list [email protected]
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > [email protected]
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion________
> > > _______________________________________
> > > NumPy-Discussion mailing list
> > > [email protected]
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > [email protected]
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>

_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)

Reply via email to