Will Jones created ARROW-16085:
----------------------------------
Summary: [R] Support unifying schemas for InMemoryDatasets
Key: ARROW-16085
URL: https://issues.apache.org/jira/browse/ARROW-16085
Project: Apache Arrow
Issue Type: Improvement
Components: R
Affects Versions: 7.0.0
Reporter: Will Jones
Fix For: 8.0.0
Combining two InMemoryDatasets whose schemas differ currently fails when the result is collected:
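{code:R}
library(arrow)
library(dplyr)

# Two Tables that share column x but differ in their second column (y vs. z)
sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)

ds1 <- InMemoryDataset$create(sub_df1)
ds2 <- InMemoryDataset$create(sub_df2)

# The error below is raised when the combined dataset is scanned
ds <- c(ds1, ds2)
actual <- ds %>% collect()
{code}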
{code}
Type error: yielded batch had schema x: double
y: string which did not match InMemorySource's: x: double
y: string
z: string
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541
child_.Next()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152
value_.status()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180
maybe_element
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
fragments_it.ToVector()
{code}
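For reference, the schema named in the error as InMemorySource's (x, y, z) is what one would expect from merging the two input schemas. A minimal sketch of that merge, assuming {{unify_schemas()}} from the arrow R package and two hypothetical stand-in schemas that mirror the Tables above:

{code:R}
library(arrow)

# Stand-ins for sub_df1$schema and sub_df2$schema (illustrative only)
schema1 <- schema(x = float64(), y = utf8())
schema2 <- schema(x = float64(), z = utf8())

# Merge the fields of both schemas into one
unified <- unify_schemas(schema1, schema2)
unified$names
{code}

The failure therefore appears to be about batches that still carry their original schema, not about computing the merged schema itself.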
If this were fixed, we could implement a function that does for Tables what
{{dplyr::bind_rows}} does for tibbles:
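{code:R}
concat_tables <- function(..., schema = NULL) {
  # Gather the Tables passed via ... into a list
  tables <- rlang::list2(...)
  # Wrap each Table in an InMemoryDataset and combine them under one schema
  dataset <- open_dataset(purrr::map(tables, InMemoryDataset$create), schema = schema)
  # Return an Arrow Table rather than a data frame
  dplyr::collect(dataset, as_data_frame = FALSE)
}
{code}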
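To make the target behavior concrete, here is what {{dplyr::bind_rows}} does with plain tibbles holding the same data as the Tables above. This is only an illustration of the intended semantics; the names {{df1}}, {{df2}} and {{expected}} are made up for the example:

{code:R}
library(dplyr)

# Same data as sub_df1 and sub_df2, as tibbles
df1 <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- tibble(x = c(4, 5), z = c("d", "e"))

# bind_rows() matches columns by name and fills the gaps with NA
expected <- bind_rows(df1, df2)
expected
{code}

The result has five rows and the columns x, y and z, with y missing for the rows that came from {{df2}} and z missing for the rows that came from {{df1}}. A Table-based equivalent would presumably fill those gaps with nulls.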