bknakker commented on issue #45956:
URL: https://github.com/apache/arrow/issues/45956#issuecomment-2790443728

   Thanks for the response! is_arrow_altrep is exactly what I was looking 
for; somehow I failed to check the unexported functions.
   
   test_arrow_altrep_force_materialize() seems to call 
test_arrow_altrep_is_materialized(). Looking at a column from 
df=collect(arrowobj), e.g. x=df$col1, is_arrow_altrep(x) returns TRUE, but 
test_arrow_altrep_is_materialized(x) also returns TRUE, and consequently 
test_arrow_altrep_force_materialize(x) throws an error that the array is 
already materialized. If I pull the same (integer) vector from the arrow object 
(xa=pull(arrowobj,col1)), is_arrow_altrep is also TRUE, but 
test_arrow_altrep_is_materialized() returns FALSE in this case. So 
something can be materialized but still be an ALTREP, which means that your 
proposed function doesn't un-altrep the variable.
   
   ```
   > .Internal(inspect(x))
   @0x000001eb6c9b6e30 14 REALSXP g1c0 [MARK,REF(65535)] materialized arrow::array_dbl_vector len=15812
   > .Internal(inspect(xa))
   @0x000001eb1d8cf390 14 REALSXP g0c0 [REF(65535)] arrow::array_dbl_vector<0x000001eb0881e510, double, 183 chunks, 0 nulls> len=3174882
   ```
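   
   For anyone who wants to check the two states on their own data, here is a 
minimal sketch of a helper that reports both flags side by side. It assumes the 
unexported arrow functions named above exist in the installed version and 
accept a single vector; `altrep_state` is a hypothetical name of mine, not part 
of arrow.
   
   ```r
   # Sketch only: relies on unexported arrow internals mentioned in this
   # thread, which may change or disappear between arrow versions.
   altrep_state <- function(x) {
     is_altrep <- arrow:::is_arrow_altrep(x)
     # Only ask about materialization when x is an arrow ALTREP vector;
     # for anything else the question does not apply, so report NA.
     materialized <- if (isTRUE(is_altrep)) {
       arrow:::test_arrow_altrep_is_materialized(x)
     } else {
       NA
     }
     c(altrep = is_altrep, materialized = materialized)
   }

   # Usage with the objects from this comment (not run here):
   # altrep_state(df$col1)                      # column from collect(arrowobj)
   # altrep_state(dplyr::pull(arrowobj, col1))  # vector pulled from the arrow object
   ```
   
   Going by the behaviour described above, the first call should report both 
flags as TRUE and the second should report altrep TRUE with materialized FALSE, 
but that may depend on what has already touched the vectors in the session.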
   
   I thought that if I raised this problem here, it might turn out that my idea 
is not the right solution. Tbh I'm not sure. It's a broader question at the 
level of the R ecosystem that I don't fully understand yet. I'm trying to learn 
the philosophy behind the whole ALTREP thing, how it is supposed to work and 
how it is supposed to be used, so I think I'll ask on R-help about what to read 
and maybe about its specific design principles. So I'm a bit hesitant to do a 
PR, though I would be happy to contribute (I'm also not a software engineer by 
training, but the idea of contributing even little things to great open source 
software is exciting and would be an honour to me).
   
   As for the RData file, the specific one I save consists of multiple data 
frames and variables, including data frames that result from quite a few 
transformations of the data frames loaded by arrow. Also, I needed to transfer 
the data to a junior colleague, so I didn't want to bother them with deep info 
on data storage formats. It might not be the best solution, but it is what I 
needed now, and I'm quite sure this happens to other R users as well. I might 
be wrong, but I feel that in general, besides arrow / IPC formats, even RData 
has its place and use case in a project or system. 
   
   Short sessionInfo:
   ```
   R version 4.4.3 (2025-02-28 ucrt)
   Platform: x86_64-w64-mingw32/x64
   Running under: Windows 11 x64 (build 26100)
   
   arrow_19.0.1      
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
