paleolimbot opened a new pull request, #14929: URL: https://github.com/apache/arrow/pull/14929
This was a problem on both sides: the created `Array` really did have length 0 because of a cast to `int` on the way in, and on the way out we were returning `int` from `array->length()`, which would not have been correct either.

On the way in, we were using a C callable imported from the vctrs package: the C version of `vctrs::vec_size()`. This callable took care of returning the correct value for normal vectors (`length()`), for data frames (`nrow()`), and for other classed vectors whose C-level "length" is not the value returned at the R level (i.e., "record style vectors" like POSIXlt). At the C++ conversion level we don't handle record style vectors: they are handled via the `VctrsExtensionType`, and C++ only sees the `vec_data()` (i.e., data.frame). Because of this, implementing our own `vec_size()` that also supports long vectors was not hard. This allowed removing the link to vctrs for now (until we need to use more of its exported C API).

On the way out, we already had the concept of `r_vec_size` from an earlier PR; we had just forgotten to use it in `Array__length()`.

Before this PR:

``` r
library(arrow, warn.conflicts = FALSE)

too_big <- raw(.Machine$integer.max + 1)
too_big_array <- Array$create(too_big)
length(too_big)
#> [1] 2147483648
length(too_big_array)
#> [1] 0
```

After this PR:

``` r
library(arrow, warn.conflicts = FALSE)

too_big <- raw(.Machine$integer.max + 1)
too_big_array <- Array$create(too_big)
length(too_big)
#> [1] 2147483648
length(too_big_array)
#> [1] 2147483648
```
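For readers less familiar with the underlying issue, here is a minimal, standalone C++ sketch of why a 32-bit length cannot describe the vector used in the example. It is not code from this PR: `kLongLength` and `FitsInInt32` are hypothetical names introduced only for illustration, and the sketch deliberately does not use the R, vctrs, or Arrow APIs.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical names for illustration only; not part of arrow or vctrs.

// One element more than a 32-bit signed integer can count, i.e. the same
// size as raw(.Machine$integer.max + 1) in the R example.
constexpr std::int64_t kLongLength =
    static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max()) + 1;

// True when a 64-bit length can be stored in a 32-bit signed integer
// without losing information.
constexpr bool FitsInInt32(std::int64_t n) {
  return n >= std::numeric_limits<std::int32_t>::min() &&
         n <= std::numeric_limits<std::int32_t>::max();
}

// Checked at compile time: the long-vector length needs a 64-bit type,
// while the largest ordinary vector length still fits in 32 bits.
static_assert(!FitsInInt32(kLongLength),
              "a long vector length does not fit in 32 bits");
static_assert(FitsInInt32(kLongLength - 1),
              "the largest non-long length fits in 32 bits");
```

Any function in the chain that stores or returns the length as a 32-bit integer loses the real value, which is why both the inbound size computation and the outbound length accessor had to carry a wider type.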
