[ 
https://issues.apache.org/jira/browse/ARROW-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619586#comment-16619586
 ] 

Wes McKinney commented on ARROW-3263:
-------------------------------------

> Will the core machinery either automate or offer tools to do this sanitizing 
> pass, or will people be forced to write their own?

I think it'd be reasonable to have code to prepare data for consumption with R 
in the common libraries, so a user of the common libraries (e.g. 
Java/Python/C++/Ruby) could emit the R metadata in IPC payloads so that the R 
receiver could do less work.

AFAICT this would only apply to numeric and integer/factor vectors, and 
possibly also boolean. Strings would have to be put into / looked up in R's 
global string hash table.
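
A minimal sketch of such a sanitizing pass, assuming raw little-endian value 
buffers and an Arrow-style validity bitmap (least-significant-bit ordering, 
where a set bit means the slot is valid). The function names are hypothetical 
and not part of any Arrow library. The sentinels are the ones named in the 
issue: INT_MIN for 32-bit integers, and a NaN whose low 32 bits are 1954 for 
doubles.

```python
import struct

# R's NA_integer_ is INT_MIN for 32-bit integers.
R_NA_INTEGER = -2**31
# R's NA_real_ is a NaN whose low 32-bit word is 1954 (0x7A2).
R_NA_REAL_BITS = 0x7FF00000000007A2


def is_valid(bitmap: bytes, i: int) -> bool:
    """Return True if slot i is valid (bit set, LSB-first within each byte)."""
    return (bitmap[i // 8] >> (i % 8)) & 1 == 1


def fill_int32_sentinels(values: bytearray, bitmap: bytes, length: int) -> None:
    """Overwrite null slots of a little-endian int32 buffer with INT_MIN, in place."""
    for i in range(length):
        if not is_valid(bitmap, i):
            struct.pack_into("<i", values, 4 * i, R_NA_INTEGER)


def fill_float64_sentinels(values: bytearray, bitmap: bytes, length: int) -> None:
    """Overwrite null slots of a little-endian float64 buffer with R's NA bits.

    The sentinel is written as a raw 64-bit integer so the NaN payload is
    preserved exactly rather than passing through a float conversion.
    """
    for i in range(length):
        if not is_valid(bitmap, i):
            struct.pack_into("<Q", values, 8 * i, R_NA_REAL_BITS)


# Example: four int32 slots, slot 1 is null (bitmap bits 0b1101).
ints = bytearray(struct.pack("<4i", 1, 999, 3, 4))
fill_int32_sentinels(ints, bytes([0b00001101]), 4)
```

The bitmap is left untouched, so readers that only consult the bitmap see no 
change, while a reader that relies on sentinels finds them in every null slot.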

cc [~romainfrancois]

> Use R sentinel values for missingness in addition to bitmask
> ------------------------------------------------------------
>
>                 Key: ARROW-3263
>                 URL: https://issues.apache.org/jira/browse/ARROW-3263
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Gabriel Becker
>            Priority: Major
>
> R uses sentinel values to indicate missingness within atomic vectors (read 
> arrays in Arrow parlance, AFAIK). 
> According to [~wesmckinn], the value stored in the array's memory is 
> currently undefined wherever the validity bitmap marks a slot as missing. 
> This will force R to copy and modify data whenever it adopts Arrow data 
> containing missing values as a native vector.
> If the relevant sentinel values (INT_MIN for 32-bit integers, and NaN with 
> payload 1954 for double-precision floats) were written to those slots _in 
> addition to_ the bit mask, then R would be able to use Arrow as intended 
> while not breaking any other systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
