paleolimbot commented on code in PR #12817:
URL: https://github.com/apache/arrow/pull/12817#discussion_r855310102


##########
r/R/python.R:
##########
@@ -105,19 +105,25 @@ py_to_r.pyarrow.lib.ChunkedArray <- function(x, ...) {
 }
 
 r_to_py.Table <- function(x, convert = FALSE) {
-  # Import with convert = FALSE so that `_import_from_c` returns a Python object
-  pa <- reticulate::import("pyarrow", convert = FALSE)
-  out <- pa$Table$from_arrays(x$columns, schema = x$schema)
-  # But set the convert attribute on the return object to the requested value
+  # Going through RecordBatchReader maintains schema metadata (e.g.,

Review Comment:
   I made a JIRA (ARROW-16269) for the schema metadata issue... the gist of it
   is that the schema metadata round-trips fine, but you end up in a situation
   where `roundtripped_table$col1$type` isn't the same as
   `roundtripped_table$schema$col1$type`.
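   
   To make the symptom concrete, here's a minimal sketch (not from the PR, and
   it needs pyarrow available to reticulate) of how the mismatch could be
   checked; the round-trip itself is left commented out:
   
   ```r
   library(arrow)
   library(reticulate)
   
   tbl <- arrow::arrow_table(col1 = c(1L, 2L, 3L))
   
   # Round-trip R -> pyarrow -> R (requires pyarrow, so commented out here)
   # roundtripped <- py_to_r(r_to_py(tbl))
   
   # After the round trip, these two ways of asking for the column's type can
   # disagree, even though the schema metadata itself survives:
   # roundtripped$col1$type
   # roundtripped$schema$col1$type
   ```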
   
   Going through `RecordBatchReader` does re-chunk everything to line up into
   record batches, BUT the slicing is zero-copy (according to the comments:
   https://github.com/apache/arrow/blob/1157e677f9ba3e6d5b203adde4756a2e4d178713/cpp/src/arrow/table.h#L240-L244).
   It looks like it zero-copy slices everything to match the column with the
   smallest chunks (see details below). I don't think this will materially
   matter, although it would be nice to avoid the potential re-chunking
   (I made ARROW-16269 in part to see if we can reinstate column-wise
   conversion).
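   
   For reference, here's a rough sketch (not part of the PR) of how the
   re-chunking can be observed from R; it assumes `as_record_batch_reader()`
   and `read_next_batch()` from a recent arrow release:
   
   ```r
   library(arrow)
   
   # Two columns with deliberately mismatched chunk layouts
   tbl <- arrow::arrow_table(
     a = chunked_array(1:3, 4:6),  # chunk lengths 3, 3
     b = chunked_array(1:2, 3:6)   # chunk lengths 2, 4
   )
   
   reader <- as_record_batch_reader(tbl)
   
   # Collect the row count of each batch the reader produces; batch boundaries
   # should fall wherever either column has a chunk boundary, and each slice
   # is a zero-copy view of the original chunks
   batch_sizes <- integer()
   while (!is.null(batch <- reader$read_next_batch())) {
     batch_sizes <- c(batch_sizes, batch$num_rows)
   }
   batch_sizes
   # expected: something like 2 1 3 for this layout
   ```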


