[
https://issues.apache.org/jira/browse/ARROW-11758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293497#comment-17293497
]
Yibo Cai commented on ARROW-11758:
----------------------------------
Current sum kernel code is after several optimizations (manual unroll, simd,
bitblockcounter, etc).
There will be radical changes to adopt pairwise summation, most existing code
may be dropped or refactored.
cc [~apitrou] [~frank.du]
> [C++][Compute] Summation kernel round-off error
> -----------------------------------------------
>
> Key: ARROW-11758
> URL: https://issues.apache.org/jira/browse/ARROW-11758
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Yibo Cai
> Assignee: Yibo Cai
> Priority: Major
>
> From below test, summation kernel is of lower precision than numpy.sum.
> Numpy implements pairwise summation [1] with O(logn) round-off error, better
> than O(n) error from naive summation.
> *sum.py*
> {code:python}
> import numpy as np
> import pyarrow.compute as pc
> t = np.arange(321000, dtype='float64')
> t2 = t - np.mean(t)
> t2 *= t2
> print('numpy sum:', np.sum(t2))
> print('arrow sum:', pc.sum(t2))
> {code}
> *test result*
> {noformat}
> # Verified with wolfram alpha (arbitrary precision), Numpy's result is
> correct.
> $ ARROW_USER_SIMD_LEVEL=SSE4_2 python sum.py
> numpy sum: 2756346749973250.0
> arrow sum: 2756346749973248.0
> $ ARROW_USER_SIMD_LEVEL=AVX2 python sum.py
> numpy sum: 2756346749973250.0
> arrow sum: 2756346749973249.0
> {noformat}
> [1] https://en.wikipedia.org/wiki/Pairwise_summation
--
This message was sent by Atlassian Jira
(v8.3.4#803005)