nealrichardson commented on code in PR #13514:
URL: https://github.com/apache/arrow/pull/13514#discussion_r918891006
##########
r/tests/testthat/test-Table.R:
##########
@@ -696,3 +696,18 @@ test_that("as_arrow_table() errors for invalid input", {
class = "arrow_no_method_as_arrow_table"
)
})
+
+test_that("num_rows method not susceptible to integer overflow", {
Review Comment:
On the format, see
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L155-L171 and
https://arrow.apache.org/docs/format/Columnar.html.
In R:
```
# Here's a simple integer array
> a <- Array$create(1:5)
> a$data()
ArrayData
> a$data()$buffers
[[1]]
NULL
[[2]]
Buffer
# the first buffer is for the null bitmask, but it's empty because there are no NAs
> a$data()$buffers[[2]]
Buffer
> a$data()$buffers[[2]]$size
[1] 20
> a$data()$buffers[[2]]$capacity
[1] 20
# here's a string array
> strings <- Array$create(letters)
# note it has 3 buffers: the null bitmask, then the offsets, then the data
> strings$data()$buffers
[[1]]
NULL
[[2]]
Buffer
[[3]]
Buffer
> strings$data()$buffers[[2]]$size
[1] 108
> strings$data()$buffers[[3]]$size
[1] 26
# let's make a LargeString array using the helper from the test suite
> make_big_string <- function() {
+ # This creates a character vector that would exceed the capacity of BinaryArray
+ rep(purrr::map_chr(2047:2050, ~ paste(sample(letters, ., replace = TRUE), collapse = "")), 2^18)
+ }
> big <- Array$create(make_big_string())
> big$type
LargeUtf8
large_string
> big$data()$buffers
[[1]]
NULL
[[2]]
Buffer
[[3]]
Buffer
> big$data()$buffers[[2]]$size
[1] 8388616
# the data buffer is larger than the int32 maximum (2^31 - 1 bytes), so its reported size overflows (on master)
> big$data()$buffers[[3]]$size
[1] -2146959360
```
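The sizes in the session above follow directly from the layout in the format docs. Here is a minimal sketch in base R (no arrow needed) that recomputes them. It assumes one byte per character, which holds for `letters`. `wrap_int32` is a hypothetical helper written for this comment, not part of arrow, and it only imitates how a 64-bit size looks once it is squeezed into a signed 32-bit integer; it does not claim to be where the overflow happens in the bindings.

```r
# Element widths in bytes
int32_bytes <- 4
int64_bytes <- 8

# Int32 array of 5 values: the data buffer holds 5 * 4 bytes
int_data_size <- 5 * int32_bytes                  # 20

# Utf8 array of 26 one-character strings:
# (n + 1) int32 offsets, then 1 byte per character of data
string_offsets_size <- (26 + 1) * int32_bytes     # 108
string_data_size <- 26 * 1                        # 26

# LargeUtf8 array from make_big_string():
# 4 strings of lengths 2047..2050, repeated 2^18 times, with int64 offsets
n_big <- 4 * 2^18
big_offsets_size <- (n_big + 1) * int64_bytes     # 8388616
big_data_size <- sum(2047:2050) * 2^18            # 2148007936

# Hypothetical helper: reinterpret a non-negative whole number
# as a signed 32-bit integer (two's complement wraparound)
wrap_int32 <- function(x) {
  x <- x %% 2^32
  if (x >= 2^31) x - 2^32 else x
}

wrapped_size <- wrap_int32(big_data_size)         # -2146959360
```

The true data buffer size (2148007936 bytes) is just past `.Machine$integer.max` (2147483647), which is why the reported value comes back negative.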