[
https://issues.apache.org/jira/browse/ARROW-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson updated ARROW-11211:
------------------------------------
Description:
ChunkedArray$create detects the type from the first chunk and uses it for all
chunks. Normally this works fine, but it can lead to unexpected behavior, such as:
{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}
returns:
{{Error: Invalid: Value is too large to fit in C integer type}}
There are a few things that might fix/change this:
* improved error message
* chunked arrays not assuming that every later chunk can be cast safely to the
first chunk's type
Note that in this case, setting type = int64() does "work", but the NaN is
stored as an overflowed value (-9223372036854775808):
{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}
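One possible workaround, sketched here under the assumption that every chunk
can be represented as a double, is to coerce the chunks to a common R type
before calling chunked_array(), so that the type inferred from the first chunk
also fits the rest. The lapply()/as.numeric() step is an illustration and is
not part of the original report:
{code:r}
library(arrow)

data <- list(1:10, NaN)
# Coerce every chunk to double so the type inferred from the first chunk
# (float64) can also hold the NaN in the second chunk.
data <- lapply(data, as.numeric)
x <- chunked_array(!!!data)
x$type
{code}
This sidesteps the integer inference rather than fixing it, so it only helps
when the caller already knows a common type for all chunks.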
was:
{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}
returns:
{{Error: Invalid: Value is too large to fit in C integer type}}
There are a few things that might fix/change this:
* improved error message
* chunked arrays not assuming the first chunk's types can be cast safely to all
others
Note that specifying the type to int64() does work with an overflowed NaN value
(-9223372036854775808)
{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}
> [R] ChunkedArray$create assumes all chunks are the same type
> ------------------------------------------------------------
>
> Key: ARROW-11211
> URL: https://issues.apache.org/jira/browse/ARROW-11211
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Jonathan Keane
> Priority: Minor
>
> ChunkedArray$create detects the type from the first chunk and uses it for all
> chunks. Normally this works fine, but it can lead to unexpected behavior, such as:
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data)
> {code}
> returns:
> {{Error: Invalid: Value is too large to fit in C integer type}}
> There are a few things that might fix/change this:
> * improved error message
> * chunked arrays not assuming that every later chunk can be cast safely to
> the first chunk's type
> Note that in this case, setting type = int64() does "work", but the NaN is
> stored as an overflowed value (-9223372036854775808):
> {code:r}
> data <- list(1:10, NaN)
> x <- chunked_array(!!!data, type = int64())
> {code}