[ https://issues.apache.org/jira/browse/ARROW-12960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-12960:
-----------------------------
Description:
(This is the flip side of ARROW-12959.)
Currently the Arrow compute kernel {{is_nan}} always treats {{null}} as a
missing value, returning {{null}} at positions of the input datum with {{null}}
(missing) values.
It would be helpful to be able to control this behavior with an option. The
option could be named {{value_for_null}} or something similar. It would
default to {{null}}, consistent with current behavior. When set to {{false}} or
{{true}}, it would return {{false}} or {{true}} at positions of the input datum
with {{null}} values.
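A minimal sketch of the proposed semantics, modeled in base R rather than in
the C++ kernel. It assumes {{NA_real_}} stands in for an Arrow {{null}} and
{{NaN}} for a NaN value. {{is_nan_with_option}} is a hypothetical name, and its
{{value_for_null}} argument only mirrors the option proposed above; it is not
an existing Arrow API:
{code:r}
# Hypothetical model of the proposed is_nan option (not an Arrow API).
# NA_real_ plays the role of an Arrow null; NaN is an ordinary NaN value.
is_nan_with_option <- function(x, value_for_null = NA) {
  stopifnot(is.numeric(x),
            is.logical(value_for_null), length(value_for_null) == 1)
  out <- is.nan(x)                  # TRUE for NaN, FALSE for numbers and NA
  is_null <- is.na(x) & !is.nan(x)  # NA but not NaN: the "null" slots
  out[is_null] <- value_for_null    # default NA keeps the current behavior
  out
}

x <- c(3.14, NA, NaN)
is_nan_with_option(x)                          # c(FALSE, NA, TRUE)
is_nan_with_option(x, value_for_null = FALSE)  # c(FALSE, FALSE, TRUE)
is_nan_with_option(x, value_for_null = TRUE)   # c(FALSE, TRUE, TRUE)
{code}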
Among other things, this would enable the {{arrow}} R package to evaluate
{{is.nan()}} consistently with the way base R does. In base R, {{is.nan()}}
returns {{FALSE}} on {{NA}}. But in the {{arrow}} R package, it returns {{NA}}:
{code:r}
library(arrow)
is.nan(c(3.14, NA, NaN))
## [1] FALSE FALSE  TRUE
as.vector(is.nan(Array$create(c(3.14, NA, NaN))))
## [1] FALSE    NA  TRUE
{code}
I think solving this with an option in the C++ kernel is the best solution,
because I suspect there are other cases in which users would want {{is_nan}}
to return an output with no missing values, without needing to call a second
kernel to fill in the nulls. However, it would also be possible to solve this
solely in the R package, by changing how the R package maps {{is.nan}} onto
Arrow compute kernels. If we choose to go that route, we should change this
Jira issue summary to "[R] Make is.nan(NA) consistent with base R".
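For the R-package-only route, one possible mapping (an assumption, not how the
package currently works) is to combine the kernel's output with a validity mask
using Kleene AND: under three-valued logic, {{null AND false}} is {{false}}, so
null positions become {{false}} while valid positions keep the {{is_nan}}
answer. Base R's {{&}} already follows this logic, so the sketch below models it
without Arrow. {{base_r_is_nan_from_kernel}} is a hypothetical helper; in Arrow
C++ the corresponding kernels would be {{is_valid}} and {{and_kleene}}:
{code:r}
# Hypothetical R-side remapping of is.nan(). arrow_is_nan stands in for the
# current is_nan kernel output, which is NA at null positions.
base_r_is_nan_from_kernel <- function(arrow_is_nan, is_valid) {
  # Kleene AND: NA & FALSE is FALSE, so null slots become FALSE,
  # while valid slots keep the kernel's TRUE/FALSE answer.
  arrow_is_nan & is_valid
}

x <- c(3.14, NA, NaN)
is_null <- is.na(x) & !is.nan(x)              # c(FALSE, TRUE, FALSE)
kernel_out <- ifelse(is_null, NA, is.nan(x))  # c(FALSE, NA, TRUE), like is_nan today
base_r_is_nan_from_kernel(kernel_out, !is_null)
# c(FALSE, FALSE, TRUE), identical to base R is.nan(x)
{code}
Unlike the C++ option, this costs an extra kernel call each time {{is.nan}} is
evaluated, which is the overhead the option is meant to avoid.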
was:
(This is the flip side of ARROW-12959.)
Currently the Arrow compute kernel {{is_nan}} always treats {{null}} as a
missing value, returning {{null}} at positions of the input datum with {{null}}
(missing) values.
It would be helpful to be able to control this behavior with an option. The
option could be named {{value_when_null}} or something similar. It would
default to {{null}}, consistent with current behavior. When set to {{false}} or
{{true}}, it would return {{false}} or {{true}} at positions of the input datum
with {{null}} values.
Among other things, this would enable the {{arrow}} R package to evaluate
{{is.nan()}} consistently with the way base R does. In base R, {{is.nan()}}
returns {{FALSE}} on {{NA}}. But in the {{arrow}} R package, it returns {{NA}}:
{code:r}
library(arrow)
is.nan(c(3.14, NA, NaN))
## [1] FALSE FALSE  TRUE
as.vector(is.nan(Array$create(c(3.14, NA, NaN))))
## [1] FALSE    NA  TRUE
{code}
I think solving this with an option in the C++ kernel is the best solution,
because I suspect there are other cases in which users would want {{is_nan}}
to return an output with no missing values, without needing to call a second
kernel to fill in the nulls. However, it would also be possible to solve this
solely in the R package, by changing how the R package maps {{is.nan}} onto
Arrow compute kernels. If we choose to go that route, we should change this
Jira issue summary to "[R] Make is.nan(NA) consistent with base R".
> [C++][R] Option for is_nan(null) to evaluate to false
> -----------------------------------------------------
>
> Key: ARROW-12960
> URL: https://issues.apache.org/jira/browse/ARROW-12960
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Reporter: Ian Cook
> Priority: Major