[
https://issues.apache.org/jira/browse/ARROW-16172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521891#comment-17521891
]
Jonathan Keane commented on ARROW-16172:
----------------------------------------
This is super helpful, thanks. When you say "a rough approximation of our
current rules", do you mean these are what should already be implemented for
coercing join keys, or are they a proposal of what we should add?
Those rules all look reasonable to me. One that I'm slightly confused by (and
that might just be a typo?) is:
"int + int / uint + uint => widest type (e.g. int32 + int16 => int16)"
Should that be:
"int + int / uint + uint => widest type (e.g. int32 + int16 => int32)"
> [C++] cast when reasonable for join keys
> ----------------------------------------
>
> Key: ARROW-16172
> URL: https://issues.apache.org/jira/browse/ARROW-16172
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Jonathan Keane
> Priority: Major
>
> Joining an integer column to a float column that happens to contain only
> whole numbers raises an error. Kernels autocast in this circumstance, so it
> is surprising UX that the join fails and that I need to coerce the types
> myself.
> {code}
> library(arrow, warn.conflicts = FALSE)
> #> See arrow_info() for available features
> library(dplyr, warn.conflicts = FALSE)
> tab_int <- arrow_table(data.frame(let = letters, num = 1L:26L))
> tab_float <- arrow_table(data.frame(let = letters, num = as.double(1:26)))
> left_join(tab_int, tab_float) %>% collect()
> #> Error in `handle_csv_read_error()`:
> #> ! Invalid: Incompatible data types for corresponding join field keys: FieldRef.Name(num) of type int32 and FieldRef.Name(num) of type double
> {code}
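
As a rough illustration of the key coercion discussed in this thread, the integer rule quoted in the comment ("int + int / uint + uint => widest type") can be sketched in plain Python. This is not Arrow's implementation or API: `widest_int_type` is a hypothetical helper, type names are plain strings, and the example reads the rule as int32 + int16 => int32, which is the interpretation the comment suggests. Mixed signed/unsigned pairs are not covered by the quoted rule, so the sketch rejects them.

```python
import re

# Matches integer type names such as "int16" or "uint64".
_INT_TYPE = re.compile(r"^(u?int)(8|16|32|64)$")


def widest_int_type(left: str, right: str) -> str:
    """Hypothetical helper: pick the wider of two same-signedness integer types."""
    l = _INT_TYPE.match(left)
    r = _INT_TYPE.match(right)
    if l is None or r is None:
        raise ValueError(f"not integer type names: {left!r}, {right!r}")
    if l.group(1) != r.group(1):
        # Assumption: signed + unsigned is outside the quoted rule, so refuse it.
        raise ValueError("mixed signedness is not covered by this sketch")
    # Same signedness: the common key type is the one with the larger bit width.
    width = max(int(l.group(2)), int(r.group(2)))
    return f"{l.group(1)}{width}"


print(widest_int_type("int32", "int16"))   # int32
print(widest_int_type("uint8", "uint64"))  # uint64
```

Under this reading the narrower key column would be cast up to the wider type before the join, so no values are lost.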