paleolimbot commented on issue #187:
URL: 
https://github.com/apache/arrow-nanoarrow/issues/187#issuecomment-1821193828

   >  setting a tag somehow upsets the Arrow nature of things
   
   Yes: the `tag` is where (perhaps inadvisably) the external pointer to the 
`ArrowSchema` is stored, and I'm not sure the class is checked everywhere, so 
perhaps the failing allocation of a bazillion bytes is because it's being 
misinterpreted somehow. The `protected` member is where you should set any SEXP 
dependency, with the caveat that if there is already one there you have to 
maintain a reference to it (e.g., `list(your_new_sexp_dep, old_sexp_dep)`). You 
can use `nanoarrow_pointer_export()` to wrap the array in another array that 
maintains the reference via the `release()` callback instead of via the 
`protected` member. All of that is undocumented, of course... I didn't expect 
this level of internal use quite yet, but obviously it should be clear 🙂 .
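   
   To make the `release()`-callback idea concrete, here is a rough, generic C 
sketch of wrapping one `ArrowArray` in another that keeps its dependency alive 
until released. This is not nanoarrow's actual implementation: 
`WrapperPrivate`, `wrap_array()` and `demo_release_count()` are hypothetical 
names, and where the R package would presumably keep an R object alive, the 
sketch simply moves the original struct into the wrapper's `private_data`. 
Only the `struct ArrowArray` layout is taken from the Arrow C Data Interface, 
and the sketch assumes an array with no children or dictionary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Array struct as laid out by the Arrow C Data Interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical holder: owns the original array for the wrapper's lifetime. */
struct WrapperPrivate {
  struct ArrowArray original;
};

/* Release callback of the wrapper: releases the original, then the holder. */
static void wrapper_release(struct ArrowArray* array) {
  struct WrapperPrivate* priv = (struct WrapperPrivate*)array->private_data;
  if (priv->original.release != NULL) {
    priv->original.release(&priv->original);
  }
  free(priv);
  array->release = NULL; /* mark the wrapper itself as released */
}

/* Hypothetical helper: move `src` into the wrapper `out`. Returns 0 on success. */
int wrap_array(struct ArrowArray* src, struct ArrowArray* out) {
  struct WrapperPrivate* priv = (struct WrapperPrivate*)malloc(sizeof(*priv));
  if (priv == NULL) return -1;
  priv->original = *src; /* take ownership of the original */
  src->release = NULL;   /* the caller's struct is now "moved from" */
  *out = priv->original; /* borrow the lengths and buffer pointers */
  out->private_data = priv;
  out->release = &wrapper_release;
  return 0;
}

/* Toy producer for the demo: counts how often its release callback runs. */
static int toy_release_count = 0;

static void toy_release(struct ArrowArray* array) {
  toy_release_count++;
  array->release = NULL;
}

/* Demo: returns how many times the original array was released. */
int demo_release_count(void) {
  static const int32_t values[] = {1, 2, 3};
  static const void* buffers[] = {NULL, values}; /* validity, data */
  struct ArrowArray original = {3, 0, 0, 2, 0, buffers,
                                NULL, NULL, &toy_release, NULL};
  struct ArrowArray wrapper;

  toy_release_count = 0;
  if (wrap_array(&original, &wrapper) != 0) return -1;
  assert(original.release == NULL); /* ownership moved into the wrapper */
  assert(wrapper.length == 3);      /* wrapper exposes the same data */
  wrapper.release(&wrapper);        /* consumer is done with the array */
  return toy_release_count;
}
```

   The point of the pattern is that the consumer only ever calls the wrapper's 
`release()`, so the dependency's lifetime no longer relies on anything stored 
in the external pointer's `tag` or `protected` slots.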
   
   > So would there be some appetite to extend, say, what is in nanoarrow.hpp 
in light of possible 'interface helpers' ?
   
   For R-specific helpers, perhaps a header like the one you mentioned in 
`r/inst/include/nanoarrow/nanoarrow_r.h|hpp`? Even if all they do is 
`Rf_eval()` to call into R in the first iteration (we can make them faster 
later if the call into R is limiting). Usually I just allocate from R and pass 
the `array_xptr` into the C/C++ function (e.g., 
https://github.com/apache/arrow-adbc/blob/main/r/adbcdrivermanager/R/adbc.R#L180-L191
 ). At the very least, a copy of the C Data/Stream structures would be helpful.
   
   For Python, we generate Cython definitions (`nanoarrow_c.pxd`), and I've 
wondered if it's worth putting that in `dist/`: Python extensions can just copy 
`nanoarrow.h`, `nanoarrow.c`, and `nanoarrow_c.pxd`, and it's reasonably easy to 
wrap from there (Cython is considerably easier as a glue language than anything 
we have in R). This is what I've done in 
https://github.com/geoarrow/geoarrow-c/tree/main/python . nanoarrow for Python 
will probably serve a similar purpose (help with the allocating). In Python 
there is also 
https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html , 
which is more or less what `as_nanoarrow_XXX()` is trying to do.
   
   For C++ (i.e., `nanoarrow.hpp`), there is almost certainly more that could 
be useful, although I am hesitant to increase the scope beyond not leaking 
memory. It might be worth drafting an internal set of helpers you developed and 
used somewhere and linking to it here? (Or maybe that's not what you had in 
mind)
   
   > 'how to work with nanoarrow for R extensions' 
   
   This should definitely be a vignette/article! As you noted, there is now ADBC 
and soon geoarrow (and whatever you are up to!), which are the first few test 
cases. Porting the linesplitter example would be a great place to start 
(i.e., here's how you'd wrap the function in an R package...).

