For many statistics-oriented Julia users there is a great advantage in
being able to piggy-back on R development and to use at least the data sets
from R packages. That is the motivation for the RDatasets package and for
the read_rda function in the DataFrames package, which reads saved R data.
Over the last couple of days I have been experimenting with running an
embedded R within Julia and calling R functions from Julia. This is similar
in scope to the Rif package except that this code is written in Julia and
not as a set of wrapper functions written in C. The R API is a C API and,
in some ways, very simple. Everything in R is represented as a "symbolic
expression" or SEXPREC and passed around as pointers to such expressions
(of type SEXP). Most functions take one or more SEXP values as
arguments and return an SEXP.
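Each SEXP carries an integer type code (its SEXPTYPE), which is what the TYPEOF calls in the transcript below return. The codes are defined in R's Rinternals.h header. A small lookup table makes them readable from Julia; the sketch covers only a subset of the codes, and the names SEXPTYPE_NAMES and sexptype_name are hypothetical, not part of any existing package.

```julia
# Hypothetical lookup table: a subset of the SEXPTYPE codes from R's Rinternals.h
const SEXPTYPE_NAMES = Dict{Int,Symbol}(
    0  => :NILSXP,    # NULL
    1  => :SYMSXP,    # symbols
    2  => :LISTSXP,   # pairlists
    3  => :CLOSXP,    # closures
    4  => :ENVSXP,    # environments
    6  => :LANGSXP,   # language objects (calls)
    9  => :CHARSXP,   # internal character strings
    10 => :LGLSXP,    # logical vectors
    13 => :INTSXP,    # integer vectors
    14 => :REALSXP,   # double-precision vectors
    15 => :CPLXSXP,   # complex vectors
    16 => :STRSXP,    # character vectors
    19 => :VECSXP,    # generic vectors (lists, including data frames)
    24 => :RAWSXP,    # raw bytes
)

# Map a code returned by TYPEOF to its name; codes not in the table give :UNKNOWN.
sexptype_name(code::Integer) = get(SEXPTYPE_NAMES, Int(code), :UNKNOWN)

println(sexptype_name(14))  # REALSXP
```

With such a table, the value 14 returned for both data frame columns below reads as REALSXP, i.e. a vector of doubles, which is why REAL is the right accessor for them.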
I have avoided reading the code for Rif for two reasons:
1. It is GPL3 licensed
2. I already know a fair bit of the R API and where to find API function
signatures.
Here's a simple example:
julia> initR()
1
julia> globalEnv = unsafe_load(cglobal((:R_GlobalEnv,libR),SEXP),1)
Ptr{Void} @0x0000000008c1c388
julia> formaldehyde = tryEval(install(:Formaldehyde))
Ptr{Void} @0x0000000008fd1d18
julia> inherits(formaldehyde,"data.frame")
true
julia> printValue(formaldehyde)
carb optden
1 0.1 0.086
2 0.3 0.269
3 0.5 0.446
4 0.6 0.538
5 0.7 0.626
6 0.9 0.782
julia> length(formaldehyde)
2
julia> names(formaldehyde)
2-element Array{ASCIIString,1}:
"carb"
"optden"
julia> form1 = ccall((:VECTOR_ELT,libR),SEXP,(SEXP,Cint),formaldehyde,0)
Ptr{Void} @0x000000000a5baf58
julia> ccall((:TYPEOF,libR),Cint,(SEXP,),form1)
14
julia> carb = copy(pointer_to_array(ccall((:REAL,libR),Ptr{Cdouble},(SEXP,),form1),length(form1)))
6-element Array{Float64,1}:
0.1
0.3
0.5
0.6
0.7
0.9
julia> form2 = ccall((:VECTOR_ELT,libR),SEXP,(SEXP,Cint),formaldehyde,1)
Ptr{Void} @0x000000000a5baef0
julia> ccall((:TYPEOF,libR),Cint,(SEXP,),form2)
14
julia> optden = copy(pointer_to_array(ccall((:REAL,libR),Ptr{Cdouble},(SEXP,),form2),length(form2)))
6-element Array{Float64,1}:
0.086
0.269
0.446
0.538
0.626
0.782
A call to printValue uses the R printing mechanism.
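The copy(pointer_to_array(...)) idiom in the transcript wraps the memory behind the pointer that REAL returns and then copies it, so the resulting Julia array no longer depends on R keeping that vector alive. In current Julia versions pointer_to_array has been replaced by unsafe_wrap. The sketch below shows the same copy step with a Julia-owned buffer standing in for the R pointer, so it runs without R; the helper name copy_doubles is hypothetical.

```julia
# Hypothetical helper: copy n doubles starting at p into a Julia-owned Vector.
# unsafe_wrap views the foreign memory without copying; copy() then detaches
# the result from that memory.
function copy_doubles(p::Ptr{Cdouble}, n::Integer)
    return copy(unsafe_wrap(Array, p, n))
end

# Stand-in for the pointer REAL() would return: a Julia buffer holding the
# carb values from the session above. GC.@preserve keeps buf alive while
# its raw pointer is in use.
buf = [0.1, 0.3, 0.5, 0.6, 0.7, 0.9]
carb = GC.@preserve buf copy_doubles(pointer(buf), length(buf))

buf[1] = -1.0     # changing the source memory does not affect the copy
println(carb[1])  # 0.1
```

With R embedded, p would instead be the result of ccall((:REAL,libR),Ptr{Cdouble},(SEXP,),x) for a REALSXP x, and n would be its length.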
Questions:
- What would be a good name for such a package? In the spirit of PyCall
it could be RCall or Rcall perhaps.
- Right now I am defining several functions that emulate the names of
functions in R itself or in the R API. What is a good balance? Obviously
it would not be a good idea to bring in all the names in the R base
namespace. On the other hand, those who know names like "inherits", and
what they mean in R, will find it convenient to have such names in the
package.
- Should I move the discussion to the julia-stats list?