[
https://issues.apache.org/jira/browse/ARROW-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359495#comment-17359495
]
David Li commented on ARROW-13013:
----------------------------------
Quick notes from a discussion: this would mean a pure-C++ build wouldn't be
able to run all the tests. But note some kernels are already in this position;
the bulk of their tests are in Python for convenience.
Also, while I was initially concerned about the impact on the local dev
workflow, I think this will be a net improvement. For one, you don't have to
rebuild PyArrow itself, only libarrow, to get the updated tests. For another,
the current C++ kernel tests squash all the various tests into one build
target (or, well, one target per kernel type), so rebuilding (and especially
linking) that target takes a long time, and if you touch a common header
file, you're rebuilding all the tests for all the kernels. Being able to
avoid that would be nice. (Of course, you could imagine splitting the C++
test targets further as well.)
> [C++][Compute][Python] Move (majority of) kernel unit tests to python
> ---------------------------------------------------------------------
>
> Key: ARROW-13013
> URL: https://issues.apache.org/jira/browse/ARROW-13013
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Ben Kietzman
> Priority: Major
>
> mailing list discussion:
> https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E
> Writing unit tests for compute functions in C++ is laborious, entails a lot
> of boilerplate, and slows iteration since it requires recompilation when
> adding new tests. The majority of these test cases need not be written in C++
> at all and could instead be made part of the pyarrow test suite.
> In order to make the kernels' C++ implementations easily debuggable from unit
> tests, we'll have to expose a C++ function named {{AssertCallFunction}} or
> so. {{AssertCallFunction}} will invoke the named {{compute::Function}} and
> compare actual results to expected without crossing the C++/Python boundary,
> allowing a developer to step through all relevant code with a single
> breakpoint in GDB. Construction of scalars/arrays/function options and any
> other inputs to the function is amply supported by {{pyarrow}}, and will
> happen outside the scope of {{AssertCallFunction}}.
> {{AssertCallFunction}} should not try to derive additional assertions from
> its arguments. For example, {{CheckScalar("add", {left, right}, expected)}}
> will first assert that {{left + right == expected}}, then that
> {{left.slice(1) + right.slice(1) == expected.slice(1)}}, to ensure that
> offsets are handled correctly. This has value, but it can be easily expressed
> in Python, and making such behavior configurable would overcomplicate the
> interface of {{AssertCallFunction}}.
> NB: Some unit tests will probably still reside in C++ since we'll need to
> test things we don't wish to expose in a user-facing API, such as "whether a
> boolean kernel avoids clobbering bits when outputting into a slice". These
> should be far more manageable since they won't need to assert correct logic
> across all possible input types.
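As a rough illustration of how the offset check mentioned in the description could be expressed on the Python side rather than inside {{AssertCallFunction}}, here is a minimal sketch. The names check_with_offsets and elementwise_add are hypothetical and not part of Arrow, and plain Python lists stand in for arrays so the snippet runs without pyarrow. In a real test, the callable would presumably wrap the proposed {{AssertCallFunction}} binding or an existing pyarrow compute call.

```python
def check_with_offsets(call, args, expected):
    """Assert call(*args) == expected, then repeat with every input sliced by one.

    The second assertion mirrors the extra offset check that the description
    attributes to CheckScalar; here it is an explicit, opt-in Python helper.
    """
    assert call(*args) == expected
    # Drop the first element of each input and of the expected output, so the
    # function under test has to cope with a non-zero offset.
    sliced_args = [arg[1:] for arg in args]
    assert call(*sliced_args) == expected[1:]


def elementwise_add(left, right):
    # Hypothetical stand-in for the "add" kernel, operating on plain lists.
    return [a + b for a, b in zip(left, right)]


check_with_offsets(elementwise_add, [[1, 2, 3], [10, 20, 30]], [11, 22, 33])
```

Keeping the slicing in a separate helper like this means a test author opts into the extra assertion, which is in line with the description's point that the C++ entry point should stay a single call-and-compare.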