thisisnic commented on issue #49149:
URL: https://github.com/apache/arrow/issues/49149#issuecomment-3923020181
I asked Claude to investigate this. Here's what it found:
**How `haven::tagged_na()` works:**
Tagged NAs encode a missing-value tag (a-z) directly in the IEEE 754 NaN
payload bits. Regular R NA has bytes `a2 07 00 00 00 00 f0 7f` (little-endian),
while `tagged_na('a')` has `a2 07 00 00 61 00 f0 7f`, with 0x61 = 'a' stored in
the fifth byte. R's `ISNA()` returns TRUE for both because it only checks that
the value is a NaN whose low 32 bits equal 1954 (0x7a2).
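To make the byte layout concrete, here is a minimal sketch that works purely on 64-bit patterns, without R or haven. The names `kRNaBits`, `tagged_na_bits`, `na_tag` and `is_r_na` are made up for illustration, and `is_r_na` only approximates what `ISNA()` tests:

```cpp
#include <cstdint>

// Bit pattern of R's NA_real_: a NaN whose low 32 bits are 1954 (0x7A2).
// In little-endian memory this is a2 07 00 00 00 00 f0 7f.
constexpr std::uint64_t kRNaBits = 0x7FF00000000007A2ULL;

// Hypothetical helper: place the tag character in bits 32-39, which is the
// fifth byte in little-endian memory order.
constexpr std::uint64_t tagged_na_bits(char tag) {
  return kRNaBits |
         (static_cast<std::uint64_t>(static_cast<unsigned char>(tag)) << 32);
}

// Hypothetical helper: read the tag back; '\0' means an untagged NA.
constexpr char na_tag(std::uint64_t bits) {
  return static_cast<char>((bits >> 32) & 0xFF);
}

// Approximation of ISNA(): exponent all ones, non-zero mantissa (a NaN),
// and low 32 bits equal to 1954. The tag byte is never inspected.
constexpr bool is_r_na(std::uint64_t bits) {
  return ((bits >> 52) & 0x7FF) == 0x7FF &&
         (bits & 0xFFFFFFFFFFFFFULL) != 0 &&
         (bits & 0xFFFFFFFFULL) == 1954;
}
```

Since the check ignores bits 32-39, any test of this kind cannot tell a tagged NA from a plain one.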
**R → Arrow conversion:**
Arrow's `is_NA<double>()` calls R's `ISNA()` macro
(`r/src/r_to_arrow.cpp:148`), so tagged NAs are marked as null in the validity
bitmap. However, the actual bytes ARE preserved in the data buffer.
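The behaviour described here can be modelled without Arrow at all. The sketch below is a toy stand-in, not Arrow's converter: `DoubleColumn`, `to_column` and `is_r_na` are hypothetical names, and values are held as raw 64-bit patterns so the payload is easy to inspect:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an Arrow double array: a data buffer plus a
// validity bitmap (true = valid, false = null).
struct DoubleColumn {
  std::vector<std::uint64_t> data;  // raw bit pattern of each slot
  std::vector<bool> valid;
};

// Approximation of ISNA(): a NaN whose low 32 bits equal 1954.
inline bool is_r_na(std::uint64_t bits) {
  return ((bits >> 52) & 0x7FF) == 0x7FF &&
         (bits & 0xFFFFFFFFFFFFFULL) != 0 &&
         (bits & 0xFFFFFFFFULL) == 1954;
}

// Model of the R -> Arrow step: every slot's bytes are copied unchanged,
// and the NA check alone decides the validity bit.
inline DoubleColumn to_column(const std::vector<std::uint64_t>& r_values) {
  DoubleColumn col;
  for (std::uint64_t bits : r_values) {
    col.data.push_back(bits);
    col.valid.push_back(!is_r_na(bits));
  }
  return col;
}
```

In this model a tagged NA ends up as a null slot whose data bytes still carry the tag, which matches the observation above.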
**Arrow → R conversion (where the loss happens):**
`Converter_Double::Ingest_some_nulls()`
(`r/src/array_to_vector.cpp:219-236`) iterates through values and checks the
validity bitmap. For null positions, it writes `NA_REAL` regardless of what's
in the buffer:
```cpp
auto null_one = [&](R_xlen_t i) {
  p_data[i] = NA_REAL;  // overwrites buffer bytes with standard NA
  return Status::OK();
};
```
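The effect of that lambda can be reproduced in a small standalone model. This is a sketch under assumptions, not the code in `array_to_vector.cpp`: `to_r_values` and `kRNaBits` are hypothetical names, and the buffers are plain vectors of 64-bit patterns:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit pattern of R's NA_real_ (low 32 bits are 1954, no tag byte set).
constexpr std::uint64_t kRNaBits = 0x7FF00000000007A2ULL;

// Model of the Arrow -> R step: valid slots are copied from the data
// buffer, null slots are replaced by the plain NA pattern, whatever the
// buffer held at that position.
inline std::vector<std::uint64_t> to_r_values(
    const std::vector<std::uint64_t>& data, const std::vector<bool>& valid) {
  std::vector<std::uint64_t> out(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    out[i] = valid[i] ? data[i] : kRNaBits;  // same effect as null_one
  }
  return out;
}
```

Passing a buffer that holds the `tagged_na('a')` pattern in a null slot returns the untagged pattern, so the tag is dropped at this step even though it was still present in the buffer.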