TPDeramus commented on issue #39038:
URL: https://github.com/apache/arrow/issues/39038#issuecomment-1836931108

   Apologies, but I am not well versed in using `browser()`.
   
   And it's doubly problematic because this is not always thrown as a typical 
error.
   
   Occasionally (but not always), if the result is assigned to a variable, it is saved as a list item containing the error:
   ```
   > Out_table
   Error: NotImplemented: Function 'coalesce' has no kernel matching input 
types (numeric(0)
   attr(,"class")
   [1] NA, numeric(0)
   attr(,"class")
   [1] NA)
   > typeof(Out_table)
   [1] "list"
   ```
   
   But the example I have that works is also a list:
   ```
   > Dummytable
   Table (query)
   ID: string (coalesce(ID.x, ID.y))
   String: string (coalesce(String.x, String.y))
   Value_A: int32
   Value_F: int32
   Value_G: int32
   Value_K: int32
   Value_B: int32
   Value_C: int32
   Value_L: int32
   Value_H: int32
   Value_M: int32
   Value_D: int32
   Value_I: int32
   Value_N: int32 (coalesce(Value_N.x, Value_N.y))
   Value_E: int32
   Value_J: int32
   
   See $.data for the source Arrow object
   > typeof(Dummytable)
   [1] "list"
   ```
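
One possible reading of the two outputs above, offered as an assumption rather than a confirmed diagnosis: both objects are S3 objects stored as lists, so `typeof()` cannot tell them apart, and the error only surfaces when the object is printed. The sketch below illustrates this in base R. The `lazy_query` class, its fields, and `make_lazy_query()` are hypothetical stand-ins, not arrow's actual internals.

```r
# Hypothetical stand-in for a lazy query object: a list with an S3 class.
# The class name and fields are illustrative, not arrow's real structure.
make_lazy_query <- function(ok = TRUE) {
  structure(list(.data = NULL, ok = ok), class = "lazy_query")
}

# Printing forces evaluation, so a broken query only fails when displayed.
print.lazy_query <- function(x, ...) {
  if (!x$ok) stop("NotImplemented: no kernel matching input types")
  cat("Table (query)\n")
  invisible(x)
}

good <- make_lazy_query(ok = TRUE)
bad  <- make_lazy_query(ok = FALSE)

# typeof() reports the underlying storage, so both are "list".
typeof(good)
typeof(bad)

# class() and inherits() identify the object without calling print().
class(bad)
inherits(bad, "lazy_query")

# Wrapping the print in tryCatch() turns the failure into a plain message.
print_result <- tryCatch(
  {
    print(bad)
    "printed"
  },
  error = function(e) conditionMessage(e)
)
print_result
```

If this reading holds, `class(Out_table)` or `inherits(Out_table, "arrow_dplyr_query")` would be more informative checks than `typeof(Out_table)`.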
   
   As such, it's hard to debug via `rlang::last_trace()` and the like, because the failure is not registered as a traceable error in the terminal. In `browser()` it frequently is not reported as an error at all: the session simply continues or exits as if nothing had gone wrong.
   
   However, from what I was able to gather within at least one `browser()` session, from a call to `Out_table %>% full_join(out)`, this was the call stack (innermost call first):
   
   ```
   Error: NotImplemented: Function 'coalesce' has no kernel matching input 
types (numeric(0)
   attr(,"class")
   [1] NA, numeric(0)
   attr(,"class")
   [1] NA)
   10. compute___expr__type(self, schema)
   9. .$type(old_schm)
   8. .f(.x[[i]], ...)
   7. map(.data$selected_columns, ~.$type(old_schm))
   6. implicit_schema(x)
   5. collapse.arrow_dplyr_query(x)
   4. do_join(x, y, by, copy, suffix, ..., keep = keep, join_type = 
"FULL_OUTER")
   3. full_join.arrow_dplyr_query(., out)
   2. full_join(., out)
   1. Out_table %>% full_join(out)
   ```
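
When `browser()` and `rlang::last_trace()` do not stop on the failure, one option is to record the call stack at the moment the error is signalled. The following is a minimal base R sketch. `capture_trace()` is a hypothetical helper, and `inner()` / `outer()` are toy functions standing in for the nested calls in the traceback above.

```r
# Hypothetical helper: run an expression and, if it errors, return the
# condition plus the call stack as it was when the error was signalled.
capture_trace <- function(expr) {
  trace <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # The stack has not been unwound yet inside a calling handler.
        calls <- as.list(sys.calls())
        trace <<- vapply(calls, function(cl) deparse(cl)[1], character(1))
      }
    ),
    # After recording the stack, return the condition instead of aborting.
    error = function(e) e
  )
  list(result = result, trace = trace)
}

# Toy functions standing in for the nested calls in the traceback.
inner <- function() stop("no kernel matching input types")
outer <- function() inner()

res <- capture_trace(outer())
inherits(res$result, "error")
conditionMessage(res$result)
res$trace
```

Under the same assumptions, the join could be wrapped as `capture_trace(Out_table %>% full_join(out))` inside the loop, so that each iteration reports whether it failed and where.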
   
   Interestingly enough, when I made the following changes to the code:
   ```
   library(arrow)
   library(tidyverse)
   library(fastDummies)
   
   # cohort_csvs holds the paths to the cohort CSV files (defined elsewhere)
   temp <- open_csv_dataset(sources = cohort_csvs) %>% compute()

   # One row per distinct key
   Subs <- data.frame(temp %>% distinct(key) %>% collect())

   for (Subnum in 1:dim(Subs)[1]) {
     out <-
       data.frame(temp %>% filter(key == Subs[Subnum, ]) %>% collect())
     out[is.na(out)] <- 'NA'
     out$tags <- 'NA'
     out <-
       dummy_cols(
         out,
         select_columns = "terms",
         remove_selected_columns = FALSE,
         omit_colname_prefix = TRUE
       )
     out <-
       dummy_cols(
         out,
         select_columns = "tags",
         remove_selected_columns = FALSE,
         omit_colname_prefix = TRUE
       )
     if (Subnum == 1) {
       Out_table <- arrow_table(out)
     } else {
       # Previously: Out_table <- Out_table %>% full_join(out)
       Out_table %>% full_join(out)
     }
   }
   ```
   and simply didn't assign the join result to a variable at all, it ran just fine.
   
   This seems to happen when `Subnum` reaches 3, which gives me the impression that the join isn't sure what to do with the `NA` values once it reaches the third table to be joined.
   
   Do you think this can be addressed with a call to `fill_null` or similar?
   https://arrow.apache.org/docs/python/generated/pyarrow.compute.fill_null.html
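
Related to the `NA` question, and offered only as a possibility to check rather than a confirmed cause: `out[is.na(out)] <- 'NA'` writes a string into every column that contains a missing value, which coerces those columns to character while leaving columns without missing values unchanged. The same column could therefore end up with different types in different subsets. The base R sketch below shows the effect on toy data. The column name is borrowed from the output above, and `fill_na_typed()` is a hypothetical helper.

```r
# Two toy subsets of the same integer column; only the second has an NA.
a <- data.frame(Value_A = c(1L, 2L))
b <- data.frame(Value_A = c(3L, NA))

# The replacement used in the loop: write the string 'NA' wherever is.na().
a_str <- a
a_str[is.na(a_str)] <- 'NA'
b_str <- b
b_str[is.na(b_str)] <- 'NA'

class(a_str$Value_A)  # no NA was replaced, so the type is unchanged
class(b_str$Value_A)  # a string was written in, so the column is coerced

# Hypothetical alternative: fill integer columns with an integer value.
fill_na_typed <- function(df, fill = 0L) {
  for (col in names(df)) {
    if (is.integer(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- fill
    }
  }
  df
}

b_int <- fill_na_typed(b)
class(b_int$Value_A)
```

Comparing `sapply(out, class)` across iterations would show whether the column types actually drift between subsets in the real data.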

