On Thu, 2022-10-20 at 23:26 +0000, ntess...@pm.me wrote:
> Hello,
> 

Hi,

I don't have a strong opinion yet.  It does seem potentially useful, but
I think there would be some details to hash out in the proposal.

Some thoughts:

* `np.add.at` should be able to do what you want (but of course is very
  slow right now, and maybe hard to get as fast as bincount even if
  improved).
* `out=` by itself usually does not mean to use the values in `out`.
  So I think you would need either a different name or another flag
  to indicate use of `out` (rather than overwriting it).
* bincount resizes its output dynamically; however, if you provide an
  output, then resizing is not really feasible.
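
To make the last two points concrete, here is a small sketch of existing
NumPy behavior (the array values are made up for illustration): `out=`
on a ufunc discards whatever was in the array, and the length of the
`bincount` result depends on the data.

```python
import numpy as np

# `out=` overwrites: the previous contents of `out` are not used.
out = np.full(3, 100.0)
np.add([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], out=out)
overwritten = out.copy()  # holds the sums; the initial 100s are gone

# bincount sizes its result from the data:
# largest index + 1, or `minlength` if that is larger.
short = np.bincount(np.array([0, 1, 1]))
long = np.bincount(np.array([0, 1, 7]))
padded = np.bincount(np.array([0, 1, 1]), minlength=5)

print(overwritten, len(short), len(long), len(padded))
```

With a fixed-size `out`, any index beyond its length would have to be an
error rather than a resize.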

Probably you can find a design that solves this.  If you always add
only a moderate number of points, `np.add.at(tally, indices, weights)`
may just be a good solution?  (It is "very" slow, but if the problem is
having the giant `tally` array, then a factor of 10 "too slow" probably
doesn't matter.)
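
A minimal sketch of that approach, compared against the `bincount`
loop from the proposal.  `read_sensor_data` is only a hypothetical
stand-in that yields random batches, and `size` is kept small here so
the example runs quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
size = 1000  # stand-in for the giant tally (10000**2 in the proposal)

def read_sensor_data(n_batches=3, batch=50):
    """Hypothetical data source yielding (indices, weights) batches."""
    for _ in range(n_batches):
        yield rng.integers(0, size, batch), rng.random(batch)

batches = list(read_sensor_data())

# In-place accumulation: np.add.at is unbuffered, so repeated indices
# within a batch are all added (unlike `tally[indices] += weights`).
tally = np.zeros(size)
for indices, weights in batches:
    np.add.at(tally, indices, weights)

# Reference: the loop from the proposal, which allocates and adds a
# full-size array for every batch.
reference = np.zeros(size)
for indices, weights in batches:
    reference += np.bincount(indices, weights, minlength=size)

print(np.allclose(tally, reference))
```

The cost of `np.add.at` scales with the batch size rather than with the
size of `tally`, which is why it may be acceptable when batches are
moderate.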

- Sebastian


> I would like to propose adding the `out` array as an optional
> parameter to `bincount`.  This makes `bincount` very useful when
> iteratively tallying data with large indices.
> 
> Consider this example tallying batches of values from some fictional
> source of data:
> 
> >>> tally = np.zeros(10000**2)
> >>> for indices, weights in read_sensor_data():
> ...     tally += np.bincount(indices, weights, 10000**2)  # slow: repeatedly adding large arrays
> 
> This could be trivially sped up:
> 
> >>> tally = np.zeros(10000**2)
> >>> for indices, weights in read_sensor_data():
> ...     np.bincount(indices, weights, out=tally)  # fast: plain sum-loop in C
> 
> As far as I can see, there is no equivalent numpy functionality. In
> fact, as far as I'm aware, there isn't any fast alternative outside
> of C/Cython/numba/... It also fits the purpose of `bincount` nicely,
> and does not change existing functionality. One might argue about the
> exact semantics if both `minlength` and `out` are given, but I think
> that a sensible answer exists in requiring `len(out) >=
> max(list.max() + 1, minlength)`.
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net
> 

