[ 
https://issues.apache.org/jira/browse/ARROW-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-11211:
------------------------------------
    Description: 
{{ChunkedArray$create}} detects the type from the first chunk and uses it for 
all chunks. Normally this works fine, but it can lead to unexpected behavior, 
such as:

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}

returns:
{{Error: Invalid: Value is too large to fit in C integer type}}

There are a few things that might fix/change this: 
* improved error message
* chunked arrays not assuming that every other chunk can be cast safely to the 
first chunk's type

Note that in this case, setting the type to int64() does "work", but the NaN is 
stored as an overflowed value (-9223372036854775808):

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}
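
One possible workaround until this is addressed (a sketch, not part of the 
original report): check the R-level type of each chunk first and, if the types 
differ, coerce every chunk to double before building the chunked array. The 
helper name {{coerce_chunks_to_double}} is hypothetical, and the approach 
assumes every chunk is a plain integer or double vector.

{code:r}
# Hypothetical helper: if the chunks do not all share one R type,
# convert every chunk to double so that NaN can be represented.
coerce_chunks_to_double <- function(chunks) {
  types <- vapply(chunks, typeof, character(1))
  if (length(unique(types)) > 1) {
    chunks <- lapply(chunks, as.double)
  }
  chunks
}

data <- coerce_chunks_to_double(list(1:10, NaN))

# Only build the ChunkedArray when the arrow package is installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  x <- arrow::chunked_array(!!!data)
}
{code}

With both chunks stored as double, the type inferred from the first chunk 
should also be able to hold the NaN in the second chunk.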



  was:
{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data)
{code}

returns:
{{Error: Invalid: Value is too large to fit in C integer type}}

There are a few things that might fix/change this: 
* improved error message
* chunked arrays not assuming the first chunk's types can be cast safely to all 
others

Note that specifying the type to int64() does work with an overflowed NaN value 
(-9223372036854775808)

{code:r}
data <- list(1:10, NaN)
x <- chunked_array(!!!data, type = int64())
{code}




> [R] ChunkedArray$create assumes all chunks are the same type
> ------------------------------------------------------------
>
>                 Key: ARROW-11211
>                 URL: https://issues.apache.org/jira/browse/ARROW-11211
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Jonathan Keane
>            Priority: Minor



