[GitHub] [arrow] cyb70289 commented on a change in pull request #9604: ARROW-11567: [C++][Compute] Improve variance kernel precision

GitBox Mon, 01 Mar 2021 23:23:23 -0800


cyb70289 commented on a change in pull request #9604:
URL: https://github.com/apache/arrow/pull/9604#discussion_r585316069




##########
File path: cpp/src/arrow/compute/kernels/aggregate_test.cc
##########
@@ -1205,6 +1205,21 @@ TEST_F(TestVarStdKernelMergeStability, Basics) {
 #endif
 }
 
+// Test round-off error
+class TestVarStdKernelRoundOff : public TestPrimitiveVarStdKernel<DoubleType> {};
+
+TEST_F(TestVarStdKernelRoundOff, Basics) {
+  // build array: np.arange(321000, dtype='float64')
+  double value = 0;
+  ASSERT_OK_AND_ASSIGN(
+      auto array, ArrayFromBuilderVisitor(float64(), 321000, [&](DoubleBuilder* builder) {
+        builder->UnsafeAppend(value++);
+      }));
+
+  // reference value from numpy.var()
+  this->AssertVarStdIs(*array, VarianceOptions{0}, 8586749999.916667);
+}
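
For context on the reference value in the test above: the population variance (ddof = 0) of 0, 1, ..., n-1 has the closed form (n^2 - 1) / 12, which for n = 321000 gives 8586749999.916667. The sketch below checks that value with a plain two-pass computation in `double`. The function names are hypothetical and this is not necessarily how the Arrow kernel computes variance; it only illustrates where the expected number comes from.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Population variance (ddof = 0) of 0, 1, ..., n-1 using a two-pass algorithm.
// Hypothetical helper for illustration; not part of the Arrow API.
double TwoPassVarianceOfArange(std::int64_t n) {
  assert(n > 0);
  // Pass 1: compute the mean.
  double sum = 0.0;
  for (std::int64_t i = 0; i < n; ++i) {
    sum += static_cast<double>(i);
  }
  const double mean = sum / static_cast<double>(n);
  // Pass 2: accumulate squared deviations from the mean.
  double m2 = 0.0;
  for (std::int64_t i = 0; i < n; ++i) {
    const double d = static_cast<double>(i) - mean;
    m2 += d * d;
  }
  return m2 / static_cast<double>(n);
}

// Closed form for the same quantity: (n^2 - 1) / 12.
double ClosedFormVarianceOfArange(std::int64_t n) {
  const double dn = static_cast<double>(n);
  return (dn * dn - 1.0) / 12.0;
}

// Relative difference between a computed value and a nonzero reference.
double RelativeError(double actual, double expected) {
  return std::fabs(actual - expected) / std::fabs(expected);
}
```

Calling `TwoPassVarianceOfArange(321000)` and `ClosedFormVarianceOfArange(321000)` should both agree with the NumPy reference to within a small relative error.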

Review comment:
       Hmm, this differs from NumPy. Our variance kernel always returns 
`double`, whatever the input type. NumPy lets the caller choose the output 
dtype; by default it returns the input's own type for floating-point input 
(`float32` for `float32`) and `float64` for integer input. 
https://numpy.org/doc/stable/reference/generated/numpy.var.html
   
   I would prefer always returning `double`. It's simpler and looks more 
reasonable.
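
   The "always return `double`" behaviour can be sketched as a single template whose return type does not depend on the element type. This is a minimal illustration, assuming a plain `std::vector` input and a two-pass algorithm; `VarianceAsDouble` is a hypothetical name and is not the Arrow kernel or its API.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Population variance (ddof = 0) that returns double for every arithmetic
// element type. Hypothetical helper for illustration only.
template <typename T>
double VarianceAsDouble(const std::vector<T>& values) {
  static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
  assert(!values.empty());
  const double n = static_cast<double>(values.size());
  // Convert each element to double before accumulating, so float32 and
  // integer inputs are summed at double precision.
  double sum = 0.0;
  for (const T& v : values) {
    sum += static_cast<double>(v);
  }
  const double mean = sum / n;
  double m2 = 0.0;
  for (const T& v : values) {
    const double d = static_cast<double>(v) - mean;
    m2 += d * d;
  }
  return m2 / n;
}
```

   With this shape, `VarianceAsDouble(std::vector<float>{...})` and `VarianceAsDouble(std::vector<int>{...})` have the same return type, so callers never need to branch on the input type.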




