Kirill Müller created ARROW-17886:
-------------------------------------
Summary: [R] Convert schema to the corresponding ptype (zero-row data frame)?
Key: ARROW-17886
URL: https://issues.apache.org/jira/browse/ARROW-17886
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Kirill Müller
When fetching data, e.g. from a RecordBatchReader, I would like to know ahead
of time what the data will look like after it is converted to a data frame. I
have found a way using utils::head(0), but I'm not sure it is efficient in
all scenarios.
My use case is the Arrow extension to DBI, in particular the default
implementation for drivers that don't speak Arrow yet. I'd like to know which
types the columns should have in the database. I can already infer this from
the corresponding R types, but those existing drivers don't know about Arrow
types.
Should we support as.data.frame() for schema objects? The semantics would be to
return a zero-row data frame with the correct column names and types.
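If such a method existed, usage might look like the following sketch. This is
hypothetical: as.data.frame() is not currently defined for Schema objects, so
the call below does not run against today's arrow; the schema constructors
shown (arrow::schema(), arrow::int32(), etc.) are the existing API.

# Hypothetical: as.data.frame() on a Schema returning its zero-row ptype
schema <- arrow::schema(
  a = arrow::int32(),
  b = arrow::float64(),
  c = arrow::string()
)

as.data.frame(schema)
# would return a zero-row data frame with columns a <int>, b <dbl>, c <chr>,
# equivalent to what head(reader, 0) produces today without consuming a reader

The reprex below shows the current workaround via head(rbr, 0).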
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
data <- data.frame(
  a = 1:3,
  b = 2.5,
  c = "three",
  stringsAsFactors = FALSE
)
data$d <- blob::blob(as.raw(1:10))
tbl <- arrow::as_arrow_table(data)
rbr <- arrow::as_record_batch_reader(tbl)
tibble::as_tibble(head(rbr, 0))
#> # A tibble: 0 × 4
#> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>
rbr$read_table()
#> Table
#> 3 rows x 4 columns
#> $a <int32>
#> $b <double>
#> $c <string>
#> $d <<blob[0]>>
#>
#> See $metadata for additional Schema metadata
--
This message was sent by Atlassian Jira
(v8.20.10#820010)