Michael Quinn created ARROW-9235:
------------------------------------

             Summary: [R] Support for `connection` class when reading and writing files
                 Key: ARROW-9235
                 URL: https://issues.apache.org/jira/browse/ARROW-9235
             Project: Apache Arrow
          Issue Type: New Feature
          Components: R
            Reporter: Michael Quinn


We have an internal filesystem that we access through objects inheriting from 
R's `connection` class. Because these files aren't necessarily local, reading 
and writing Parquet files (for example) is slightly more complicated.

For now, we work around this by moving the file contents through raw vectors. 
For example, to read a file:

```
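ReadParquet <- function(filename, ...) {
  con <- file(filename, "rb")
  on.exit(close(con))
  # Read the whole file (FileInfo(filename)$size bytes) into a raw vector
  bytes <- readBin(con, "raw", FileInfo(filename)$size)
  # arrow can parse Parquet directly from an in-memory raw vector
  return(arrow::read_parquet(bytes, ...))
}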
```
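One limitation of the read workaround is that it needs the file size up front. A minimal sketch, assuming only base R's `readBin()` and that `arrow::read_parquet()` accepts a raw vector (which the workaround above already relies on): read from an already-open connection in fixed-size chunks until `readBin()` returns zero bytes, then concatenate the chunks. The name `read_parquet_from_connection()` and its `chunk_size` argument are hypothetical, not part of arrow.

```r
# Hypothetical helper (not part of arrow): read a Parquet file from any
# already-open binary connection without knowing its total size in advance.
read_parquet_from_connection <- function(con, ..., chunk_size = 1024L * 1024L) {
  chunks <- list()
  repeat {
    # readBin() may return fewer than chunk_size bytes; zero bytes means EOF
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    chunks[[length(chunks) + 1L]] <- chunk
  }
  # Concatenate all chunks into a single raw vector and let arrow parse it
  arrow::read_parquet(do.call(c, chunks), ...)
}
```

Because the caller opens the connection, the same function should work with `file()`, `url()` or a custom connection class, as long as it is opened in binary read mode. The whole file still ends up in memory, exactly as in `ReadParquet()`.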

And to write a file:

```
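WriteParquet <- function(df, filepath, ...) {
  # Serialize the data frame to Parquet in memory first
  stream <- arrow::BufferOutputStream$create()
  arrow::write_parquet(df, stream, ...)
  bytes <- stream$finish()$data()

  # Then copy the raw bytes out through the connection
  con <- file(filepath, "wb")
  on.exit(close(con))
  writeBin(bytes, con)
  return(invisible())
}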
```
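The write workaround can be generalized in the same way. A sketch, assuming the same arrow calls that `WriteParquet()` already uses (`BufferOutputStream$create()`, `write_parquet()` into a stream, `$finish()$data()`): the caller supplies an open connection, so the helper does not care what kind of connection it is. `write_parquet_to_connection()` is a hypothetical name, not part of arrow.

```r
# Hypothetical helper (not part of arrow): serialize a data frame to Parquet
# in memory, then write the bytes to any already-open binary connection.
write_parquet_to_connection <- function(df, con, ...) {
  stream <- arrow::BufferOutputStream$create()
  arrow::write_parquet(df, stream, ...)
  # finish() returns an arrow Buffer; $data() exposes it as a raw vector
  bytes <- stream$finish()$data()
  writeBin(bytes, con)
  invisible(con)
}
```

The trade-off is the same as in the original workaround: the full Parquet file is built in memory before any bytes reach the connection, which is what native connection support in arrow would avoid.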

At the C++ level, we are interacting with `R_new_custom_connection`, declared 
here:
https://github.com/wch/r-source/blob/trunk/src/include/R_ext/Connections.h

I've been very impressed with how feature-rich arrow is. It would be nice to 
see R connections supported as well.
