[
https://issues.apache.org/jira/browse/ARROW-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Will Jones updated ARROW-16085:
-------------------------------
Summary: [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output
(was: [R] Support unifying schemas for InMemoryDatasets)
> [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output
> ------------------------------------------------------------------
>
> Key: ARROW-16085
> URL: https://issues.apache.org/jira/browse/ARROW-16085
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 7.0.0
> Reporter: Will Jones
> Assignee: Will Jones
> Priority: Major
> Fix For: 9.0.0
>
>
>
> Combining two InMemoryDatasets whose schemas differ and then collecting the result fails:
> {code:R}
> library(arrow)
> library(dplyr)  # provides %>% and collect()
>
> sub_df1 <- Table$create(
>   x = Array$create(c(1, 2, 3)),
>   y = Array$create(c("a", "b", "c"))
> )
> sub_df2 <- Table$create(
>   x = Array$create(c(4, 5)),
>   z = Array$create(c("d", "e"))
> )
> ds1 <- InMemoryDataset$create(sub_df1)
> ds2 <- InMemoryDataset$create(sub_df2)
> # Combine the two datasets, then scan; collect() raises the error below
> ds <- c(ds1, ds2)
> actual <- ds %>% collect()
> {code}
> {code}
> Type error: yielded batch had schema x: double
> y: string which did not match InMemorySource's: x: double
> y: string
> z: string
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541
> child_.Next()
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152
> value_.status()
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180
> maybe_element
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840
> fragments_it.ToVector()
> {code}
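> The schema that the error attributes to InMemorySource looks like the union of the two
> tables' schemas, while each yielded batch keeps its own table's schema. As a minimal
> sketch (assuming {{unify_schemas()}} from the arrow R package mirrors the merge applied
> here, which this report does not confirm), the merged schema can be inspected directly:
> {code:R}
> library(arrow)
>
> # Same two tables as in the reproducer above
> sub_df1 <- Table$create(x = c(1, 2, 3), y = c("a", "b", "c"))
> sub_df2 <- Table$create(x = c(4, 5), z = c("d", "e"))
>
> # Merge the two schemas field by field
> unified <- unify_schemas(sub_df1$schema, sub_df2$schema)
> print(unified)
>
> # Fields of the merged schema that the first table lacks
> setdiff(unified$names, sub_df1$schema$names)
> {code}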
> If we fixed this, we could implement a function that does for Tables what
> {{dplyr::bind_rows}} does for Tibbles:
> {code:R}
> concat_tables <- function(..., schema = NULL) {
>   # list2() comes from rlang and map() from purrr
>   tables <- rlang::list2(...)
>   dataset <- open_dataset(
>     purrr::map(tables, InMemoryDataset$create),
>     schema = schema
>   )
>   dplyr::collect(dataset, as_data_frame = FALSE)
> }
> {code}
>