On Thu, 2022-10-20 at 23:26 +0000, ntess...@pm.me wrote:
> Hello,
>

Hi,

I don't have a strong opinion yet. It does seem potentially useful, but
I think there would be some details to hash out in the proposal. Some
thoughts:

* `np.add.at` should be able to do what you want (but of course it is
  very slow right now, and maybe hard to get as fast as bincount even
  if improved).
* `out=` by itself usually does not mean to use the values in `out`.
  So I think you would need either a different name or another flag
  to indicate use of `out` (rather than overwriting it).
* bincount resizes its output dynamically. However, if you provide an
  output, then resizing is not really feasible.

Probably you can find a design that solves this. If you always add only
a moderate number of points, `np.add.at(tally, indices, weights)` may
just be a good solution? (It is "very" slow, but if the problem is
having the giant `tally` array, then a factor of 10 "too slow" probably
doesn't matter.)

- Sebastian


> I would like to propose adding the `out` array as an optional
> parameter to `bincount`. This makes `bincount` very useful when
> iteratively tallying data with large indices.
>
> Consider this example tallying batches of values from some fictional
> source of data:
>
> >>> tally = np.zeros(10000**2)
> >>> for indices, weights in read_sensor_data():
> ...     tally += np.bincount(indices, weights, 10000**2)  # slow: repeatedly adding large arrays
>
> This could be trivially sped up:
>
> >>> tally = np.zeros(10000**2)
> >>> for indices, weights in read_sensor_data():
> ...     np.bincount(indices, weights, out=tally)  # fast: plain sum-loop in C
>
> As far as I can see, there is no equivalent numpy functionality. In
> fact, as far as I'm aware, there isn't any fast alternative outside
> of C/Cython/numba/... It also fits the purpose of `bincount` nicely,
> and does not change existing functionality.
>
> One might argue about the exact semantics if both `minlength` and
> `out` are given, but I think that a sensible answer exists in
> requiring `len(out) >= max(list.max(), minlength)`.
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net
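
For reference, the `np.add.at` route suggested in the reply can be sketched as below. This is only an illustration under stated assumptions: `read_sensor_data` is a hypothetical stand-in for the fictional data source in the quoted example, and the number of bins is kept tiny so the two variants are easy to compare. Every index must be strictly smaller than `len(tally)`.

```python
import numpy as np


def read_sensor_data(n_batches=3, batch_size=5, n_bins=20, seed=0):
    """Hypothetical stand-in for the fictional data source in the thread."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        indices = rng.integers(0, n_bins, size=batch_size)  # values in [0, n_bins)
        weights = rng.random(batch_size)
        yield indices, weights


n_bins = 20  # small here; the thread is about much larger tallies

# Variant 1: build a full-length bincount result per batch, then add it.
tally_bincount = np.zeros(n_bins)
for indices, weights in read_sensor_data(n_bins=n_bins):
    tally_bincount += np.bincount(indices, weights, minlength=n_bins)

# Variant 2: accumulate in place. np.add.at is unbuffered, so repeated
# indices within a batch are all counted (unlike `tally[indices] += weights`).
tally_add_at = np.zeros(n_bins)
for indices, weights in read_sensor_data(n_bins=n_bins):
    np.add.at(tally_add_at, indices, weights)

print(np.allclose(tally_bincount, tally_add_at))
```

Variant 2 never allocates a second full-length array per batch, which is the cost the proposal wants to avoid. Whether its per-element speed is acceptable depends on how many points each batch adds.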