TPDeramus commented on issue #39038:
URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836931108
Apologies, but I am not well versed in using `browser()`.
It's doubly problematic because this is not always thrown as a typical
error.
Occasionally (but not always), if the result is assigned to a variable, it's
saved as a list item containing the error:
```
> Out_table
Error: NotImplemented: Function 'coalesce' has no kernel matching input
types (numeric(0)
attr(,"class")
[1] NA, numeric(0)
attr(,"class")
[1] NA)
> typeof(Out_table)
[1] "list"
```
But the example I have that works is also a list:
```
> Dummytable
Table (query)
ID: string (coalesce(ID.x, ID.y))
String: string (coalesce(String.x, String.y))
Value_A: int32
Value_F: int32
Value_G: int32
Value_K: int32
Value_B: int32
Value_C: int32
Value_L: int32
Value_H: int32
Value_M: int32
Value_D: int32
Value_I: int32
Value_N: int32 (coalesce(Value_N.x, Value_N.y))
Value_E: int32
Value_J: int32
See $.data for the source Arrow object
> typeof(Dummytable)
[1] "list"
```
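For reference, `typeof()` reports only the underlying storage, so it cannot
separate the two cases. A minimal base-R sketch, assuming the arrow query
object is an S3 object built on a list (`fake_query` below is a hypothetical
stand-in, not the real arrow constructor); `class()` or `inherits()` is the
more telling check:

```r
# Hypothetical stand-in for a lazy arrow query: a plain list with an S3 class.
fake_query <- structure(list(.data = NULL), class = "arrow_dplyr_query")

storage_type <- typeof(fake_query)  # storage mode only: "list"
s3_class     <- class(fake_query)   # the S3 class: "arrow_dplyr_query"
is_query     <- inherits(fake_query, "arrow_dplyr_query")  # TRUE

print(storage_type)
print(s3_class)
print(is_query)
```

On the real objects, `class(Out_table)` and `class(Dummytable)` should be more
informative than `typeof()`.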
All of this makes it hard to debug via `rlang::last_trace()` and the like,
because the failure is not registered as something that can be traced in the
terminal. In `browser()` it frequently is not reported as an error either: the
session simply continues or exits as if nothing had gone wrong.
However, from what I was able to gather within at least one `browser()`
session, from a call to `Out_table %>% full_join(out)`, this was the call
stack:
```
Error: NotImplemented: Function 'coalesce' has no kernel matching input
types (numeric(0)
attr(,"class")
[1] NA, numeric(0)
attr(,"class")
[1] NA)
10. compute___expr__type(self, schema)
9. .$type(old_schm)
8. .f(.x[[i]], ...)
7. map(.data$selected_columns, ~.$type(old_schm))
6. implicit_schema(x)
5. collapse.arrow_dplyr_query(x)
4. do_join(x, y, by, copy, suffix, ..., keep = keep, join_type =
"FULL_OUTER")
3. full_join.arrow_dplyr_query(., out)
2. full_join(., out)
1. Out_table %>% full_join(out)
```
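One way to keep the failure from being swallowed is to force evaluation inside
`tryCatch()` and hold on to the condition object. A minimal sketch in base R;
`capture_error()` is a hypothetical helper, and the `stop()` call merely stands
in for a failing expression such as the join above followed by `compute()`:

```r
# Hypothetical helper: evaluate an expression and return either its value
# or the error condition, so the failure can be inspected afterwards.
capture_error <- function(expr) {
  tryCatch(expr, error = function(e) e)
}

# Stand-in for a failing call; with arrow this could be something like
# capture_error(Out_table %>% full_join(out) %>% compute()).
res <- capture_error(stop("NotImplemented: demo failure"))

failed <- inherits(res, "error")  # TRUE when the expression raised an error
msg    <- conditionMessage(res)   # the error text, without the "Error:" prefix

print(failed)
print(msg)
```

Checking `inherits(res, "error")` after each iteration would show in which
iteration the query first becomes invalid.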
Interestingly enough, when I made the following changes to the code:
```
library(arrow)
library(tidyverse)
library(fastDummies)

temp <- open_csv_dataset(sources = cohort_csvs) %>% compute()
Subs <- data.frame(temp %>% distinct(key) %>% collect())

for (Subnum in 1:dim(Subs)[1]) {
  # Pull the rows for one key into a regular data.frame
  out <-
    data.frame(temp %>% filter(key == Subs[Subnum, ]) %>% collect())
  out[is.na(out)] <- 'NA'
  out$tags <- 'NA'
  out <-
    dummy_cols(
      out,
      select_columns = "terms",
      remove_selected_columns = FALSE,
      omit_colname_prefix = TRUE
    )
  out <-
    dummy_cols(
      out,
      select_columns = "tags",
      remove_selected_columns = FALSE,
      omit_colname_prefix = TRUE
    )
  if (Subnum == 1) {
    Out_table <- arrow_table(out)
  } else {
    # Previously: Out_table <- Out_table %>% full_join(out)
    Out_table %>% full_join(out)
  }
}
```
and just didn't assign the join result to a variable at all, it ran just fine.
With the assignment in place, the error seems to appear when `Subnum` reaches
3, which gives me the impression that it's not quite sure what to do with the
`NA` values once it hits the third table to be joined.
Do you think this can be addressed with some call to `fill_null` or similar?
https://arrow.apache.org/docs/python/generated/pyarrow.compute.fill_null.html
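A possibly related detail on the `NA` handling: in base R,
`out[is.na(out)] <- 'NA'` turns any column that contains a missing value into a
character column, while columns without missing values keep their type. A
minimal sketch with made-up data (the column names are hypothetical), which
might explain why the same column could end up with different types in
different subsets:

```r
# Made-up data: one subset with a missing value, one without.
with_na    <- data.frame(ID = c("a", "b"), Value_A = c(1L, NA))
without_na <- data.frame(ID = c("c", "d"), Value_A = c(3L, 4L))

# Same replacement as in the loop above.
with_na[is.na(with_na)]       <- "NA"
without_na[is.na(without_na)] <- "NA"

type_with    <- class(with_na$Value_A)     # "character": coerced by the string
type_without <- class(without_na$Value_A)  # "integer": column left untouched

print(type_with)
print(type_without)
```

If that is what happens here, a type-preserving replacement per column may be
worth trying before reaching for a null-filling kernel.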