paleolimbot commented on issue #40711:
URL: https://github.com/apache/arrow/issues/40711#issuecomment-2022637845
Thank you for reporting! Here is a slightly more minimal reprex with a workaround.
Basically, there is a nested column that is "all missing" (`orcid = c(NA, NA)`),
so R has no type information for it and falls back to logical. A workaround is
to ensure that all items in a list column have an identical structure (below).
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
df <- data.frame(
  authors = I(list(
    data.frame(name = c("Tim", "Steve"), orcid = c("123456", "7890")),
    data.frame(name = c("Rhonda", "Merv"), orcid = c(NA, NA))
  ))
)
arrow_table(df)
#> Error: Invalid: Problem with column 2 (orcid): Invalid: Expecting a character vector
# Workaround: ensure your authors column has the same schema for each item
authors_common <- vctrs::vec_ptype_common(!!!df$authors)
df$authors <- lapply(df$authors, vctrs::vec_cast, authors_common)
arrow_table(df)
#> Table
#> 2 rows x 1 columns
#> $authors <list<item: struct<name: string, orcid: string>>>
#>
#> See $metadata for additional Schema metadata
```
<sup>Created on 2024-03-27 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>
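To see why the second data frame trips the converter, it can help to look at the column types in base R. The snippet below is a small sketch that does not need arrow. It assumes R 4.0 or later (where `data.frame()` keeps strings as character) and that writing the missing values as `NA_character_` at construction time is an option; that is an alternative to the `vctrs` cast above, not something taken from the original report.

``` r
# A bare NA is logical, so a column made only of NA is stored as logical
with_bare_na <- data.frame(name = c("Rhonda", "Merv"), orcid = c(NA, NA))
class(with_bare_na$orcid)

# NA_character_ is a missing value that already has character type
with_typed_na <- data.frame(
  name = c("Rhonda", "Merv"),
  orcid = c(NA_character_, NA_character_)
)
class(with_typed_na$orcid)
```

When the data come from elsewhere (for example parsed JSON) and the types cannot be controlled up front, casting every item to a common prototype as shown in the reprex is likely the more general fix.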
From Arrow/R's end, this happens because our "string" converter errors when
it sees a `logical()`. It probably could check to see if all items are `NA`
(maybe via `anyNA()`) before erroring:
https://github.com/apache/arrow/blob/a407a6b45e6121051966d699017333ce9653e958/r/src/r_to_arrow.cpp#L867-L871
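As a rough R-level illustration of the kind of check described above (the real fix would live in the C++ converter, and this is not arrow's code), the helper below is hypothetical. Note that `anyNA()` only reports whether at least one value is missing, so this sketch uses `all(is.na())` to require that every value is missing.

``` r
# Hypothetical helper, not part of arrow: TRUE when `x` is a logical vector
# that contains nothing but NA, i.e. a column with no usable type information
is_all_na_logical <- function(x) {
  is.logical(x) && all(is.na(x))
}

is_all_na_logical(c(NA, NA))        # all-missing logical column
is_all_na_logical(c(NA, TRUE))      # a genuine logical column
is_all_na_logical(c(NA, "7890"))    # a character column with one missing value
```

Only the first call returns `TRUE`; a converter applying this idea could treat such a column as nulls of the requested type and keep erroring for the other two cases.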