[jira] [Commented] (ARROW-13013) [C++][Compute][Python] Move (majority of) kernel unit tests to python

David Li (Jira) Tue, 03 Aug 2021 13:53:07 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-13013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392537#comment-17392537
 ]


David Li commented on ARROW-13013:
----------------------------------

Another consideration here (from [~edponce]) is that it would be nice if we 
make it easy & consistent to test all the different argument types for a 
function. Right now, with Googletest, we have some parameterization, but it is 
not done very consistently, and the parameterization methods we use make it 
hard to support some types. For instance, we might template a set of tests on 
all numeric types, but we can't extend those tests to cover decimals, because 
1) the literal syntax is different and 2) the type is parameterized (and so 
just templating on the type's class, as we most commonly do, is insufficient). 
We can solve this in C++, but if we're going to rewrite all the tests anyways, 
we should keep this in mind (either if we reorganize the C++ tests or port them 
elsewhere).

On top of that, it would be nice if the solution we had made it easier to track 
completeness of type support for kernels so we could easily see where support 
might be lacking. Right now this is sort of encoded into the tests and API docs 
but these are either hard to read through or get out of sync easily. Ideally 
every kernel would support every (applicable) type but while we are working 
towards that goal it would be nice to have an easy way to know how close we are 
to it.

> [C++][Compute][Python] Move (majority of) kernel unit tests to python
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13013
>                 URL: https://issues.apache.org/jira/browse/ARROW-13013
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ben Kietzman
>            Priority: Major
>
> mailing list discussion: 
> [https://lists.apache.org/thread.html/r09e0e0fbb8b655bbec8cf5662d224f3dfc4fba894a312900f73ae3bf%40%3Cdev.arrow.apache.org%3E]
> Writing unit tests for compute functions in c++ is laborious, entails a lot 
> of boilerplate, and slows iteration since it requires recompilation when 
> adding new tests. The majority of these test cases need not be written in C++ 
> at all and could instead be made part of the pyarrow test suite.
> In order to make the kernels' C++ implementations easily debuggable from unit 
> tests, we'll have to expose a c++ function named {{AssertCallFunction}} or 
> so. {{AssertCallFunction}} will invoke the named compute::Function and 
> compare actual results to expected without crossing the C++/python boundary, 
> allowing a developer to step through all relevant code with a single 
> breakpoint in GDB. Construction of scalars/arrays/function options and any 
> other inputs to the function is amply supported by {{pyarrow}}, and will 
> happen outside the scope of {{AssertCallFunction}}.
> {{AssertCallFunction}} should not try to derive additional assertions from 
> its arguments - for example {{CheckScalar("add",
> {left, right}
> , expected)}} will first assert that {{left + right == expected}} then 
> {{left.slice(1) + right.slice(1) == expected.slice(1)}} to ensure that 
> offsets are handled correctly. This has value but can be easily expressed in 
> Python and configuration of such behavior would overcomplicate the interface 
> of {{AssertCallFunction}}.
> Unit tests for kernels would then be written in 
> {{arrow/python/pyarrow/test/kernels/test_*.py}}. The C++ unit test for 
> [addition with implicit 
> casts|https://github.com/apache/arrow/blob/b38ab81cb96e393a026d05a22e5a2f62ff6c23d7/cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc#L897-L918]
>  could then be rewritten as
> {code:python}
> def test_addition_implicit_casts():
>     AssertCallFunction("add", [[ 0,    1,   2,    None],
>                                [ 0.25, 1.5, 2.75, None]],
>                        expected=[0.25, 1.5, 2.75, None])
>     # ...
> {code}
> NB: Some unit tests will probably still reside in C++ since we'll need to 
> test things we don't wish to expose in a user facing API, such as "whether a 
> boolean kernel avoids clobbering bits when outputting into a slice". These 
> should be far more manageable since they won't need to assert correct logic 
> across all possible input types



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (ARROW-13013) [C++][Compute][Python] Move (majority of) kernel unit tests to python

Reply via email to