On Wed, May 31, 2023 at 12:37 PM Devulapalli, Raghuveer <
raghuveer.devulapa...@intel.com> wrote:

> I wouldn’t discount the performance impact on real-world benchmarks
> for these functions. Just to name a couple of examples:
>
>
>    - A 7x speedup of np.exp and np.log results in a 2x speedup of
>    training neural networks like logistic regression [1]. I would expect
>    np.tanh to show similar results for neural networks.
>    - Vectorizing even simple functions like np.maximum results in a 1.3x
>    speedup of sklearn’s KMeans algorithm [2]
>
> Raghuveer
>
> [1] https://github.com/numpy/numpy/pull/13134
>
> [2] https://github.com/numpy/numpy/pull/14867
>

Perfect, those are precisely the concrete use cases I would want to see so
we can talk about the actual ramifications of the changes.

These particular examples suggest to me that a module or package providing
fast-inaccurate functions would be a good idea, but not across-the-board
fast-inaccurate implementations (though it's worth noting that the
exp/log/maximum replacements that you cite don't seem to be particularly
inaccurate). The performance improvements show up in situational use cases.
Logistic regression is not really a neural network (unless you squint real
hard), so the loss function does account for a significant fraction of the
whole runtime; the activation and loss functions of real neural networks
take up a rather small amount of time compared to the matmuls.
Nonetheless, people do optimize activation functions, but often by avoiding
special functions entirely with ReLUs (which have other benefits in terms
of nice gradients). Not sure anyone really uses tanh for serious work.
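
To make the ReLU point concrete (a toy sketch, not a benchmark; the 1.3x
figure in [2] is about np.maximum's SIMD path, not this comparison):

    import numpy as np

    x = np.random.default_rng(0).standard_normal(1_000_000)

    # ReLU: no transcendental functions at all, just an elementwise
    # maximum, i.e. exactly the kind of simple ufunc that the SIMD work
    # cited in [2] already accelerates.
    relu = np.maximum(x, 0.0)

    # tanh: one transcendental evaluation per element; this is the call
    # whose speed/accuracy trade-off is being debated.
    tanh = np.tanh(x)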

ML is a perfect use case for *opt-in* fast-inaccurate implementations. The
whole endeavor is to replace complicated computing logic with a smaller
number of primitives that you can optimize the hell out of, and let the
model size and training data size handle the complications. And a few
careful choices by people implementing the marquee packages can have a
large effect. In the case of transcendental activation functions in NNs, if
you really want to optimize them, it's a good idea to trade *a lot* of
accuracy (measured in %, not ULPs) for performance, in addition to doing it
on GPUs. And that makes changes to the `np.*` implementations mostly
irrelevant for them, and you can get that performance without making anyone
else pay for it.
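
To illustrate the kind of trade I mean, a rough user-level sketch (the
clipped rational approximation below is a standard trick, but fast_tanh
itself is hypothetical, not anything NumPy provides or should ship):

    import numpy as np

    def fast_tanh(x):
        # Pade-style rational approximation of tanh, clipped to [-1, 1].
        # It matches the Taylor series through the x**5 term; worst-case
        # error is about 2% (percent, not ULPs) near |x| ~ 2.3. That is
        # the sort of trade an opt-in ML-oriented package can make, and
        # np.tanh itself never should.
        x = np.asarray(x, dtype=np.float64)
        x2 = x * x
        return np.clip(x * (15.0 + x2) / (15.0 + 6.0 * x2), -1.0, 1.0)

    x = np.linspace(-6.0, 6.0, 1_000_001)
    print(np.abs(fast_tanh(x) - np.tanh(x)).max())   # roughly 0.02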

Does anyone have compelling concrete use cases for accelerated trig
functions, per se, rather than exp/log and friends? I'm more on board with
accelerating those than trig functions because of their role in ML and
statistics (I'd still *prefer* to opt in, though). They don't have many
special values to worry about, and the cases that do need extra precision
have alternates like expm1 and log1p in any case. But for the trig
functions, I'm much more likely to be doing geometry, where I'm with
Archimedes: do not disturb my circles!
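
For anyone who hasn't bumped into them, the difference those alternates
make near zero is easy to see:

    import numpy as np

    x = 1e-12
    # The naive forms lose most of their significant digits near zero...
    print(np.exp(x) - 1.0)   # ~1.00009e-12, only about 4 correct digits
    print(np.log(1.0 + x))   # similarly degraded
    # ...while the dedicated alternates stay accurate to full precision.
    print(np.expm1(x))       # ~1e-12, good to double precision
    print(np.log1p(x))       # ~1e-12, good to double precision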

-- 
Robert Kern
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/