cyb70289 commented on a change in pull request #9604: URL: https://github.com/apache/arrow/pull/9604#discussion_r585316069
##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -1205,6 +1205,21 @@ TEST_F(TestVarStdKernelMergeStability, Basics) {
 #endif
 }
 
+// Test round-off error
+class TestVarStdKernelRoundOff : public TestPrimitiveVarStdKernel<DoubleType> {};
+
+TEST_F(TestVarStdKernelRoundOff, Basics) {
+  // build array: np.arange(321000, dtype='float64')
+  double value = 0;
+  ASSERT_OK_AND_ASSIGN(
+      auto array, ArrayFromBuilderVisitor(float64(), 321000, [&](DoubleBuilder* builder) {
+        builder->UnsafeAppend(value++);
+      }));
+
+  // reference value from numpy.var()
+  this->AssertVarStdIs(*array, VarianceOptions{0}, 8586749999.916667);
+}

Review comment:
Hmm... there is one difference from numpy here.
Our variance kernel always returns `double`, whatever the input type.
Numpy lets the caller select the output dtype. By default it keeps the input's floating-point type for float input (so `float32` input gives a `float32` result) and returns `float64` (`double`) for integer input.
https://numpy.org/doc/stable/reference/generated/numpy.var.html
I would prefer to always return `double`. It is simpler and looks more reasonable.
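
To make the "always return `double`" preference concrete, here is a minimal standalone sketch. It is not Arrow's kernel: `PopulationVariance` and `Arange` are hypothetical helpers written for illustration, and the two-pass algorithm is an assumption chosen for brevity. It only shows a variance routine whose accumulator and result type are `double` regardless of the element type, with ddof = 0 as in `VarianceOptions{0}`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical helper, not part of Arrow: builds 0, 1, ..., n-1,
// similar in spirit to np.arange(n).
template <typename T>
std::vector<T> Arange(std::size_t n) {
  std::vector<T> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<T>(i));
  return out;
}

// Hypothetical helper, not Arrow's implementation: two-pass population
// variance (ddof = 0). Every element is widened to double before use, so
// the accumulators and the result are double for any element type T.
template <typename T>
double PopulationVariance(const std::vector<T>& values) {
  assert(!values.empty());  // precondition: at least one element
  const double n = static_cast<double>(values.size());

  // First pass: mean.
  double sum = 0.0;
  for (T v : values) sum += static_cast<double>(v);
  const double mean = sum / n;

  // Second pass: sum of squared deviations from the mean.
  double m2 = 0.0;
  for (T v : values) {
    const double d = static_cast<double>(v) - mean;
    m2 += d * d;
  }
  return m2 / n;
}

// The result type does not follow the input type: float in, double out.
static_assert(
    std::is_same<decltype(PopulationVariance(std::vector<float>{})), double>::value,
    "variance of float input is reported as double");
```

For the sequence 0, 1, ..., n-1 the population variance has the closed form (n^2 - 1) / 12, which for n = 321000 agrees with the reference constant used in the test. Plain summation as in this sketch carries some round-off of its own at that size, so its result is best compared against the reference with a tolerance rather than for exact equality.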