Ben Kietzman created ARROW-13013:
------------------------------------
Summary: [C++][Compute][Python] Move (majority of) kernel unit tests to python
Key: ARROW-13013
URL: https://issues.apache.org/jira/browse/ARROW-13013
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Reporter: Ben Kietzman
Mailing list discussion:
https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E
Writing unit tests for compute functions in C++ is laborious, entails a lot of
boilerplate, and slows iteration because adding new tests requires
recompilation. The majority of these test cases need not be written in C++ at
all and could instead be made part of the pyarrow test suite.
To keep the kernels' C++ implementations easy to debug from Python unit tests,
we'll have to expose a C++ function named {{AssertCallFunction}} or similar.
{{AssertCallFunction}} will invoke the named compute::Function and compare the
actual result to the expected one without crossing the C++/Python boundary,
allowing a developer to step through all relevant code with a single breakpoint
in GDB.
Construction of scalars/arrays/function options and any other inputs to the
function is amply supported by {{pyarrow}}, and will happen outside the scope
of {{AssertCallFunction}}.
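As a rough illustration of this division of labour, here is a stdlib-only Python sketch. {{assert_call_function}} is a hypothetical stand-in for a Python binding of the proposed {{AssertCallFunction}} (no such binding exists yet), and plain lists with {{None}} for nulls stand in for pyarrow arrays. A real test would build its inputs with pyarrow and the comparison would happen in C++.

```python
# Hypothetical stand-in for the proposed C++ AssertCallFunction. In the real
# design the call would dispatch to the named compute::Function in C++; here a
# small dict of pure-Python kernels plays that role.
_FUNCTIONS = {
    # Element-wise addition with null propagation (None stands in for null).
    "add": lambda left, right: [
        None if a is None or b is None else a + b
        for a, b in zip(left, right)
    ],
}


def assert_call_function(name, args, expected):
    """Call the named function on `args` and assert the result equals `expected`.

    The helper only calls and compares; it derives no extra assertions
    (no slicing, no type variations) from its arguments.
    """
    actual = _FUNCTIONS[name](*args)
    if actual != expected:
        raise AssertionError(
            f"{name}{tuple(args)!r}: expected {expected!r}, got {actual!r}"
        )


def test_add_int():
    # Inputs and expected output are constructed by the test itself,
    # outside the scope of the helper.
    left = [1, 2, None, 4]
    right = [10, 20, 30, None]
    assert_call_function("add", [left, right], [11, 22, None, None])
```

Keeping input construction in the test means the helper's interface stays a plain (name, args, expected) triple; with real pyarrow inputs the same test would pass {{pa.array(...)}} values instead of lists.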
{{AssertCallFunction}} should not try to derive additional assertions from its
arguments. For example, the C++ test helper
{{CheckScalar("add", {left, right}, expected)}} first asserts that
{{left + right == expected}} and then that
{{left.slice(1) + right.slice(1) == expected.slice(1)}}, to ensure that offsets
are handled correctly. This has value, but it is easy to express in Python, and
making such behavior configurable would overcomplicate the interface of
{{AssertCallFunction}}.
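A minimal sketch of how the sliced-input check could be layered on top in Python. {{check_with_offsets}} and {{assert_add}} are hypothetical names used only for illustration; {{check}} can be any callable with the signature {{check(name, args, expected)}}, such as a future binding of {{AssertCallFunction}}. Lists are used so the sketch runs without pyarrow; with pyarrow arrays the slice would be {{arr.slice(offset)}}, which is zero-copy and yields an array with a non-zero offset.

```python
def check_with_offsets(check, name, args, expected, offset=1):
    """Run `check` on the full inputs, then on every input sliced by `offset`.

    Mirrors the extra assertion that CheckScalar performs in C++: slicing all
    arguments and the expected output by the same offset must still agree.
    """
    check(name, args, expected)
    # With pyarrow this would be arg.slice(offset) / expected.slice(offset).
    check(name, [arg[offset:] for arg in args], expected[offset:])


def assert_add(name, args, expected):
    # Hypothetical stand-in for AssertCallFunction that only knows "add".
    assert name == "add"
    left, right = args
    actual = [a + b for a, b in zip(left, right)]
    assert actual == expected, (actual, expected)


def test_add_with_offsets():
    check_with_offsets(assert_add, "add", [[1, 2, 3], [10, 20, 30]], [11, 22, 33])
```

Because the slicing lives in the test layer, individual tests can opt in or out, or vary the offset, without any change to the C++ helper.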
NB: Some unit tests will probably still reside in C++, since we'll need to test
things we don't wish to expose in a user-facing API, such as "whether a boolean
kernel avoids clobbering bits when outputting into a slice". These should be
far more manageable, since they won't need to assert correct logic across all
possible input types.