[ 
https://issues.apache.org/jira/browse/ARROW-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619516#comment-16619516
 ] 

Wes McKinney commented on ARROW-3263:
-------------------------------------

I would suggest defining optional metadata to indicate that a field's null 
values use the R sentinel value conventions. That way an R consumer that 
sees the custom metadata does not have to examine the validity bits and can 
simply memcpy the values buffer for numeric types. R, for its part, could 
round-trip data to Arrow format with less serialization work.
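
To make the idea concrete, here is a minimal Python sketch of what a consumer 
could do. The metadata key "r_sentinel_nulls" and the helper names are 
hypothetical, not part of the Arrow format or of any library; buffers are 
modeled as plain bytes, and only 32-bit integers are handled.

```python
import struct

INT_MIN = -2**31  # R's NA_integer_ sentinel for 32-bit integers

# Hypothetical metadata key, used here for illustration only.
R_SENTINEL_KEY = "r_sentinel_nulls"


def is_valid(validity: bytes, i: int) -> bool:
    """Arrow validity bitmaps are LSB-ordered; a set bit means non-null."""
    return (validity[i // 8] >> (i % 8)) & 1 == 1


def to_r_int_buffer(values: bytes, validity: bytes, length: int,
                    metadata: dict) -> bytes:
    """Return little-endian int32 bytes with INT_MIN in every null slot."""
    if metadata.get(R_SENTINEL_KEY) == "true":
        # Fast path: the producer promises null slots already hold INT_MIN,
        # so a plain copy (the memcpy case) is enough.
        return bytes(values[: 4 * length])
    # Slow path: inspect every validity bit and write the sentinel.
    out = bytearray(values[: 4 * length])
    for i in range(length):
        if not is_valid(validity, i):
            struct.pack_into("<i", out, 4 * i, INT_MIN)
    return bytes(out)


# Three slots, the middle one null (bits 0b101), with junk in the null slot.
values = struct.pack("<3i", 10, 12345, 30)
validity = bytes([0b00000101])
fixed = to_r_int_buffer(values, validity, 3, metadata={})
```

Without the metadata the consumer pays for a pass over the bitmap; with it, 
the values buffer can be taken as-is.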

I don't think that requiring a specific value in null value slots is a good 
idea, since it would introduce brittleness into implementations: there are 
many ways that a value could end up null. If you had to make a pass over the 
memory to "sanitize" the null slots to use a particular value, then that would 
require extra computing work in many cases.
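
As an illustration of one such way, consider an element-wise add that sums 
the values buffers unconditionally and combines validity separately. This is 
a simplified sketch, not Arrow's actual kernel: arrays are Python lists, 
validity is a list of booleans (True = non-null), and the function names are 
made up for the example.

```python
INT_MIN = -2**31  # R's NA_integer_ sentinel for 32-bit integers


def wrap_int32(x: int) -> int:
    """Wrap a Python int to two's-complement 32-bit range."""
    return (x + 2**31) % 2**32 - 2**31


def add_int32(a_vals, a_valid, b_vals, b_valid):
    """Add values without looking at validity; a result is null if either input is."""
    out_vals = [wrap_int32(x + y) for x, y in zip(a_vals, b_vals)]
    out_valid = [p and q for p, q in zip(a_valid, b_valid)]
    return out_vals, out_valid


def sanitize(vals, valid):
    """The extra pass a mandated sentinel would require after such a kernel."""
    return [v if ok else INT_MIN for v, ok in zip(vals, valid)]


# The null slot of `a` holds the sentinel on input...
a_vals, a_valid = [1, INT_MIN, 3], [True, False, True]
b_vals, b_valid = [10, 20, 30], [True, True, True]

out_vals, out_valid = add_int32(a_vals, a_valid, b_vals, b_valid)
# ...but after the add it holds INT_MIN + 20, which is no longer the sentinel.
sanitized = sanitize(out_vals, out_valid)
```

The result is still correct under Arrow semantics because the validity bit 
is cleared; only a sentinel requirement would force the additional pass.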

> Use R sentinel values for missingness in addition to bitmask
> ------------------------------------------------------------
>
>                 Key: ARROW-3263
>                 URL: https://issues.apache.org/jira/browse/ARROW-3263
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Gabriel Becker
>            Priority: Major
>
> R uses sentinel values to indicate missingness within atomic vectors (read: 
> arrays in Arrow parlance, AFAIK). 
> According to [~wesmckinn], the value stored in the array's memory is 
> currently undefined when the validity bitmap marks the slot as null. 
> This will force R to copy and modify data whenever it adopts, as a native 
> vector, Arrow data that contains missing values.
> If the relevant sentinel values (INT_MIN for 32-bit integers, and NaN with 
> payload 1954 for double-precision floats) were written _in addition to_ the 
> bitmask, then R would be able to use Arrow as intended while not breaking 
> any other systems.
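
For reference, the two sentinels named in the description can be written out 
as bit patterns. The sketch below assumes IEEE 754 doubles and that R's NA 
for doubles is a NaN whose low 32-bit word equals 1954; the constant and 
function names are illustrative.

```python
import math
import struct

NA_INTEGER = -2**31                 # INT_MIN, R's NA for 32-bit integers
NA_REAL_BITS = 0x7FF00000000007A2   # exponent all ones, low word 0x7A2 = 1954


def is_r_na_real(raw8: bytes) -> bool:
    """Check 8 little-endian bytes: a NaN whose low 32-bit word is 1954."""
    (bits,) = struct.unpack("<Q", raw8)
    (value,) = struct.unpack("<d", raw8)
    return math.isnan(value) and (bits & 0xFFFFFFFF) == 1954


na_real_bytes = struct.pack("<Q", NA_REAL_BITS)
ordinary_nan_bytes = struct.pack("<Q", 0x7FF8000000000000)  # a plain quiet NaN
```

Both patterns are NaN to any IEEE 754 consumer; only the payload lets R tell 
its NA apart from an ordinary NaN.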



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
