[ https://issues.apache.org/jira/browse/ARROW-17886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611061#comment-17611061 ]

Dewey Dunnington commented on ARROW-17886:
------------------------------------------

It hasn't been implemented yet, but we're probably going to include this at 
least internally to support additional tidyselect helpers (see ARROW-12778). In 
the meantime, you may be able to use this workaround:

{code:R}
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

simulate_data_frame <- function(schema) {
  arrays <- lapply(schema$fields, function(field) concat_arrays(type = field$type))
  vectors <- lapply(
    arrays,
    function(array) tryCatch(
      as.vector(array), 
      error = function(...) vctrs::unspecified()
    )
  )
  
  names(vectors) <- names(schema)
  tibble::new_tibble(vectors, nrow = 0)
}

simulate_data_frame(schema(col1 = int32(), col2 = string()))
#> # A tibble: 0 × 2
#> # … with 2 variables: col1 <int>, col2 <chr>
{code}
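
If it helps with the DBI use case described in the issue below, the same helper could back an as.data.frame() method, and the resulting zero-row tibble could then be handed to DBI to infer column types. A rough sketch only (not something arrow provides; it assumes con is an already-open DBI connection):

{code:R}
# Hypothetical S3 method (not part of arrow): dispatch as.data.frame() on
# Schema objects via the simulate_data_frame() helper defined above.
as.data.frame.Schema <- function(x, ...) {
  simulate_data_frame(x)
}

ptype <- as.data.frame(schema(col1 = int32(), col2 = string()))

# For the DBI use case: ask the driver/connection for the SQL type of each
# column of the zero-row data frame (`con` is assumed to already exist).
DBI::dbDataType(con, ptype)
{code}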


> [R] Convert schema to the corresponding ptype (zero-row data frame)?
> --------------------------------------------------------------------
>
>                 Key: ARROW-17886
>                 URL: https://issues.apache.org/jira/browse/ARROW-17886
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Kirill Müller
>            Priority: Minor
>
> When fetching data e.g. from a RecordBatchReader, I would like to know, ahead 
> of time, what the data will look like after it's converted to a data frame. I 
> have found a way using utils::head(0), but I'm not sure if it's efficient in 
> all scenarios.
> My use case is the Arrow extension to DBI, in particular the default 
> implementation for drivers that don't speak Arrow yet. I'd like to know which 
> types the columns should have on the database. I can already infer this from 
> the corresponding R types, but those existing drivers don't know about Arrow 
> types.
> Should we support as.data.frame() for schema objects? The semantics would be 
> to return a zero-row data frame with correct column names and types.
> library(arrow)
> #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> data <- data.frame(
>   a = 1:3,
>   b = 2.5,
>   c = "three",
>   stringsAsFactors = FALSE
> )
> data$d <- blob::blob(as.raw(1:10))
> tbl <- arrow::as_arrow_table(data)
> rbr <- arrow::as_record_batch_reader(tbl)
> tibble::as_tibble(head(rbr, 0))
> #> # A tibble: 0 × 4
> #> # … with 4 variables: a <int>, b <dbl>, c <chr>, d <blob>
> rbr$read_table()
> #> Table
> #> 3 rows x 4 columns
> #> $a <int32>
> #> $b <double>
> #> $c <string>
> #> $d <<blob[0]>>
> #> 
> #> See $metadata for additional Schema metadata



--
This message was sent by Atlassian Jira
(v8.20.10#820010)