[
https://issues.apache.org/jira/browse/ARROW-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392537#comment-17392537
]
David Li commented on ARROW-13013:
----------------------------------
Another consideration here (from [~edponce]) is that it would be nice if we
make it easy & consistent to test all the different argument types for a
function. Right now, with Googletest, we have some parameterization, but it is
not done very consistently, and the parameterization methods we use make it
hard to support some types. For instance, we might template a set of tests on
all numeric types, but we can't extend those tests to cover decimals, because
1) the literal syntax is different and 2) the type is parameterized (and so
just templating on the type's class, as we most commonly do, is insufficient).
We can solve this in C++, but if we're going to rewrite all the tests anyways,
we should keep this in mind (either if we reorganize the C++ tests or port them
elsewhere).
On top of that, it would be nice if the solution we had made it easier to track
completeness of type support for kernels so we could easily see where support
might be lacking. Right now this is sort of encoded into the tests and API docs
but these are either hard to read through or get out of sync easily. Ideally
every kernel would support every (applicable) type but while we are working
towards that goal it would be nice to have an easy way to know how close we are
to it.
> [C++][Compute][Python] Move (majority of) kernel unit tests to python
> ---------------------------------------------------------------------
>
> Key: ARROW-13013
> URL: https://issues.apache.org/jira/browse/ARROW-13013
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Ben Kietzman
> Priority: Major
>
> mailing list discussion:
> [https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E]
> Writing unit tests for compute functions in c++ is laborious, entails a lot
> of boilerplate, and slows iteration since it requires recompilation when
> adding new tests. The majority of these test cases need not be written in C++
> at all and could instead be made part of the pyarrow test suite.
> In order to make the kernels' C++ implementations easily debuggable from unit
> tests, we'll have to expose a c++ function named {{AssertCallFunction}} or
> so. {{AssertCallFunction}} will invoke the named compute::Function and
> compare actual results to expected without crossing the C++/python boundary,
> allowing a developer to step through all relevant code with a single
> breakpoint in GDB. Construction of scalars/arrays/function options and any
> other inputs to the function is amply supported by {{pyarrow}}, and will
> happen outside the scope of {{AssertCallFunction}}.
> {{AssertCallFunction}} should not try to derive additional assertions from
> its arguments - for example {{CheckScalar("add",
> {left, right}
> , expected)}} will first assert that {{left + right == expected}} then
> {{left.slice(1) + right.slice(1) == expected.slice(1)}} to ensure that
> offsets are handled correctly. This has value but can be easily expressed in
> Python and configuration of such behavior would overcomplicate the interface
> of {{AssertCallFunction}}.
> Unit tests for kernels would then be written in
> {{arrow/python/pyarrow/test/kernels/test_*.py}}. The C++ unit test for
> [addition with implicit
> casts|https://github.com/apache/arrow/blob/b38ab81cb96e393a026d05a22e5a2f62ff6c23d7/cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc#L897-L918]
> could then be rewritten as
> {code:python}
> def test_addition_implicit_casts():
> AssertCallFunction("add", [[ 0, 1, 2, None],
> [ 0.25, 1.5, 2.75, None]],
> expected=[0.25, 1.5, 2.75, None])
> # ...
> {code}
> NB: Some unit tests will probably still reside in C++ since we'll need to
> test things we don't wish to expose in a user facing API, such as "whether a
> boolean kernel avoids clobbering bits when outputting into a slice". These
> should be far more manageable since they won't need to assert correct logic
> across all possible input types
--
This message was sent by Atlassian Jira
(v8.3.4#803005)