[ 
https://issues.apache.org/jira/browse/ARROW-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-6923.
-------------------------------
    Fix Version/s:     (was: 1.0.0)
       Resolution: Duplicate

There is currently an option to either drop nulls or emit a null when 
there is a null in the filter vector. If more options are desired, let's 
open a new JIRA issue.

> [C++] Option for Filter kernel how to handle nulls in the selection vector
> --------------------------------------------------------------------------
>
>                 Key: ARROW-6923
>                 URL: https://issues.apache.org/jira/browse/ARROW-6923
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> How nulls are handled in the boolean mask (selection vector) in a filter 
> kernel varies between languages / data analytics systems (e.g. base R 
> propagates nulls, dplyr in R skips them (treats them as False), SQL 
> generally skips them as well, I think, and Julia raises an error).
> Currently, in Arrow C++ we "propagate" nulls (null in the selection vector 
> gives a null in the output):
> {code}
> In [7]: arr = pa.array([1, 2, 3]) 
> In [8]: mask = pa.array([True, False, None]) 
> In [9]: arr.filter(mask) 
> Out[9]: 
> <pyarrow.lib.Int64Array object at 0x7fefe44b3048>
> [
>   1,
>   null
> ]
> {code}
> Given the different ways this could be done (propagate, skip, error), should 
> we provide an option to control this behaviour?
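
To make the three behaviours named in the issue concrete (propagate, skip, error), here is a minimal pure-Python sketch over plain lists. It is not Arrow's implementation: the function name `filter_with_nulls`, the parameter `null_behavior`, and the mode strings "emit_null", "drop", and "error" are hypothetical names chosen for illustration only.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical mode names, chosen for this sketch only.
_MODES = ("emit_null", "drop", "error")


def filter_with_nulls(
    values: Sequence[T],
    mask: Sequence[Optional[bool]],
    null_behavior: str = "emit_null",
) -> List[Optional[T]]:
    """Filter `values` by a boolean `mask` that may contain None (null)."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    if null_behavior not in _MODES:
        raise ValueError(f"unknown null_behavior: {null_behavior!r}")

    out: List[Optional[T]] = []
    for value, keep in zip(values, mask):
        if keep is None:
            if null_behavior == "emit_null":
                # Propagate: a null in the mask yields a null in the output.
                out.append(None)
            elif null_behavior == "error":
                # Refuse to filter with a null in the selection vector.
                raise ValueError("null in selection vector")
            # "drop": treat the null like False and skip the value.
        elif keep:
            out.append(value)
    return out


print(filter_with_nulls([1, 2, 3], [True, False, None], "emit_null"))
print(filter_with_nulls([1, 2, 3], [True, False, None], "drop"))
```

With the inputs from the issue, the "emit_null" mode corresponds to the propagating behaviour shown in the session above, while "drop" treats the null mask entry as False.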



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
