Re: [I] [R] Change the binary type mapping to `blob::blob` [arrow]

via GitHub Tue, 24 Sep 2024 06:22:18 -0700


nealrichardson commented on issue #43135:
URL: https://github.com/apache/arrow/issues/43135#issuecomment-2371263477


   I'm not familiar with what one does with a `blob` object. With something 
like `integer64`, it comes with methods that let you work with the data as an 
integer, so there's value in mapping the int64 type to that. int64 is also 
quite common when working with Arrow: for one, the CSV reader infers 
integer-type data as int64.
   
   One reason I do see for having a separate `arrow_binary` class, with or 
without `blob`, is that there is actually more than one binary type involved: 
`arrow_binary`, `arrow_large_binary`, and `arrow_fixed_size_binary`.
   
   Would a compromise, or possibly a transition path, be to assign `blob` 
attributes to the `arrow_binary` et al. objects? Like how we set tibble 
attributes on data.frames so that if you have `tibble` loaded, they act like 
tibbles, but you don't have to use tibble. A `blob` S3 object appears to be a 
list of raw vectors with these attributes: 
   
   ```
   $ptype
   raw(0)
   
   $class
   [1] "blob"          "vctrs_list_of" "vctrs_vctr"    "list"      
   ```
   
   The `arrow_binary` types "inherit" from `vctrs_vctr` too. So we could add 
the `arrow_binary` class name ahead of "blob" in that class vector, and then 
the arrow types would inherit from blob. If you don't have blob installed, 
they would still work as they do today with vctrs methods.
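   
   A minimal sketch of that class-vector idea, in base R so that neither arrow 
nor blob needs to be installed. The object is built by hand for illustration 
(the name `payload` and the sample bytes are made up), and this is not how 
arrow actually constructs its binary vectors:
   
   ```r
   # Hypothetical illustration: a list of raw vectors, as a blob is described above.
   payload <- list(as.raw(c(1, 2, 3)), as.raw(255))

   # Put the arrow class name ahead of the blob class vector quoted above.
   x <- structure(
     payload,
     ptype = raw(0),
     class = c("arrow_binary", "blob", "vctrs_list_of", "vctrs_vctr", "list")
   )

   # S3 dispatch walks the class vector left to right, so arrow_binary methods
   # would be tried first, then blob methods if that package is loaded.
   inherits(x, "arrow_binary")  # TRUE
   inherits(x, "blob")          # TRUE
   ```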
   
   Does that make sense?

