Michael Quinn created ARROW-9235:
------------------------------------
Summary: [R] Support for `connection` class when reading and
writing files
Key: ARROW-9235
URL: https://issues.apache.org/jira/browse/ARROW-9235
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Michael Quinn
We have an internal filesystem that we interact with through objects that
inherit from the connection class. These files aren't necessarily local, making
it slightly more complicated to read and write Parquet files, for example.
For now, we read the bytes into a raw vector (or write them out from one) and
hand that to arrow. For example, to read a file:
```
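ReadParquet <- function(filename, ...) {
  # Open the file as a binary connection and make sure it gets closed.
  con <- file(filename, "rb")
  on.exit(close(con))
  # Read the whole file in one call; FileInfo(filename)$size gives the byte count.
  bytes <- readBin(con, "raw", FileInfo(filename)$size)
  return(arrow::read_parquet(bytes, ...))
}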
```
And to write:
```
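WriteParquet <- function(df, filepath, ...) {
  # Serialize the data frame into an in-memory Arrow buffer.
  stream <- arrow::BufferOutputStream$create()
  arrow::write_parquet(df, stream, ...)
  bytes <- stream$finish()$data()
  # Write the raw bytes out through the connection.
  con <- file(filepath, "wb")
  on.exit(close(con))
  writeBin(bytes, con)
  return(invisible())
}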
```
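The read workaround above depends on knowing the file size up front. A minimal sketch of a variant that does not, assuming the connection is already open in binary mode and that `readBin()` returns a zero-length vector at end of stream (as it does for blocking connections). `ReadAllRaw` and `ReadParquetFromConnection` are hypothetical names used for illustration, not part of arrow:

```r
# Hypothetical helper: read every remaining byte from an open binary connection.
ReadAllRaw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break  # zero bytes returned: end of stream
    chunks[[length(chunks) + 1L]] <- chunk
  }
  # Concatenate the chunks into a single raw vector (empty if nothing was read).
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Hypothetical wrapper: pass the collected bytes to arrow, as ReadParquet does.
ReadParquetFromConnection <- function(con, ...) {
  arrow::read_parquet(ReadAllRaw(con), ...)
}
```

This still buffers the whole file in memory, so it is only a stopgap; direct `connection` support in arrow would avoid the extra copy.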
At the C++ level, we are interacting with `R_new_custom_connection`, defined
here:
https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h
I've been very impressed with how feature-rich arrow is. It would be nice to
see this API supported as well.