[
https://issues.apache.org/jira/browse/ARROW-17490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nicola Crane updated ARROW-17490:
---------------------------------
Description:
dplyr and Acero give different results when {{log()}} is called on a column
that contains 0, e.g.
{code:r}
library(arrow)
library(dplyr)
df <- tibble(x = 0:10)
df %>%
  mutate(y = log(x)) %>%
  collect()
#> # A tibble: 11 × 2
#> x y
#> <int> <dbl>
#> 1 0 -Inf
#> 2 1 0
#> 3 2 0.693
#> 4 3 1.10
#> 5 4 1.39
#> 6 5 1.61
#> 7 6 1.79
#> 8 7 1.95
#> 9 8 2.08
#> 10 9 2.20
#> 11 10 2.30
df %>%
  arrow_table() %>%
  mutate(y = log(x)) %>%
  collect()
#> Error in `collect()`:
#> ! Invalid: logarithm of zero
{code}
This is because R defines {{log(0)}} as {{-Inf}}, whereas Acero treats it as an
error. I'm not sure what the right solution is here; should we request an Acero
option that controls this behaviour?
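For comparison, here is a minimal sketch of the two behaviours at the compute-function level. It assumes the installed arrow build registers both an unchecked {{ln}} kernel and a checked {{ln_checked}} kernel, and that the checked one is what produces the error above; neither point is confirmed in this ticket. The variable names are illustrative only.
{code:r}
library(arrow)

# Assumption: both "ln" and "ln_checked" are registered compute functions
# in this arrow build.
x <- Array$create(c(0, 1, 10))

# Unchecked kernel: zero is expected to map to -Inf, matching base R's log(0).
unchecked <- as.vector(call_function("ln", x))

# Checked kernel: expected to fail on zero. Capture the message rather than
# stopping, so both outcomes can be inspected side by side.
checked <- tryCatch(
  as.vector(call_function("ln_checked", x)),
  error = function(e) conditionMessage(e)
)

print(unchecked)
print(checked)
{code}
If this holds, one option that needs no new Acero setting would be to point the R {{log}} binding at the unchecked kernel, at the cost of no longer surfacing domain errors.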
was:
We get different results for dplyr versus Acero if we call log on a column that
contains 0, i.e.
{code:r}
``` r
library(arrow)
library(dplyr)
df <- tibble(x = 0:10)
df %>%
mutate(y = log(x)) %>%
collect()
#> # A tibble: 11 × 2
#> x y
#> <int> <dbl>
#> 1 0 -Inf
#> 2 1 0
#> 3 2 0.693
#> 4 3 1.10
#> 5 4 1.39
#> 6 5 1.61
#> 7 6 1.79
#> 8 7 1.95
#> 9 8 2.08
#> 10 9 2.20
#> 11 10 2.30
df %>%
arrow_table() %>%
mutate(y = log(x)) %>%
collect()
#> Error in `collect()`:
#> ! Invalid: logarithm of zero
```
{code}
This is because R defines {{log(0)}} as {{-Inf}} whereas Acero defines it as an
error. Not sure what the solution is here; do we want to request the addition
of an Acero option to define behaviour for this?
> [R] Differing results in log bindings
> -------------------------------------
>
> Key: ARROW-17490
> URL: https://issues.apache.org/jira/browse/ARROW-17490
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Nicola Crane
> Priority: Major