paleolimbot commented on pull request #12467:
URL: https://github.com/apache/arrow/pull/12467#issuecomment-1075362907


   I think so! This example is probably better than the example I have in there 
right now because the serializing/deserializing of the metadata is a big part 
of the picture and the current documentation example only implements the 
array-to-r conversion. Check to make sure it's what you meant though!
   
   I didn't implement this in quite the same way as the Python one. I think in 
Python the workflow (and correct me if I'm wrong) is along the lines of
   
   - Create an `ExtensionTypeSubclass(with, parameters, like, this)` instance
   - C++ calls the `__arrow_ext_serialize__` method of the Python instance when 
the serialized metadata is needed
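
To make that ordering concrete, here is a rough sketch of the Python-side workflow as I understand it. It is a toy stand-in that uses only the standard library, not pyarrow itself: a real subclass would inherit from `pyarrow.ExtensionType`, and `write_field_metadata()` is a hypothetical placeholder for the C++ code that asks for the metadata. The `QuantizedType` name and its JSON layout are borrowed from the R example in this comment, purely for illustration.

```python
import json


class QuantizedType:
    """Toy stand-in for a pyarrow.ExtensionType subclass (assumption: no pyarrow)."""

    def __init__(self, center=0.0, scale=1.0):
        # The parameters live on the Python instance; nothing is serialized yet.
        self.center = float(center)
        self.scale = float(scale)

    def __arrow_ext_serialize__(self):
        # Called on demand, whenever the serialized metadata is needed.
        payload = {"center": self.center, "scale": self.scale}
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        # Rebuild an instance from previously serialized metadata.
        parsed = json.loads(serialized.decode("utf-8"))
        return cls(parsed["center"], parsed["scale"])


def write_field_metadata(ext_type):
    """Hypothetical stand-in for the C++ side requesting the metadata."""
    return ext_type.__arrow_ext_serialize__()


# 1. Create the instance from its parameters.
quantized = QuantizedType(center=20, scale=32767)

# 2. The metadata is produced later, by calling back into the instance.
metadata = write_field_metadata(quantized)

# Round trip: the parameters survive serialization.
restored = QuantizedType.__arrow_ext_deserialize__(None, metadata)
```

The point of the sketch is only the ordering: the instance exists first, and the serialized metadata is derived from it by a callback when something asks for it.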
   
   In R it's totally bananas to call from C++ into R and we can't do it safely 
most of the time. So instead I wrote it like:
   
   - Create the extension metadata before either the R or C++ instance is 
created
   - Create a C++ instance of `RExtensionType()` that contains the definitive 
copy of the serialized extension metadata
   - Create the ExtensionTypeSubclass R6 instance and then call the R6 
instance's `.Deserialize()` method to populate data fields.
   
   It isn't all that straightforward to do (the way I've implemented it in R) 
and I'm not sure I *like* how it's implemented (but I'm also not sure how to 
make it better).
   
   <details>
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   QuantizedType <- R6::R6Class(
     "QuantizedType", 
     inherit = ExtensionType,
     public = list(
       center = function() private$.center,
       scale = function() private$.scale,
       
       # Invert the quantization applied in quantized_array()
       .array_as_vector = function(extension_array) {
         as.vector(extension_array$storage() / private$.scale + private$.center)
       },
       
       .Deserialize = function(storage_type, extension_name, extension_metadata) {
         parsed <- jsonlite::fromJSON(self$extension_metadata_utf8())
         private$.center <- as.double(parsed$center)
         private$.scale <- as.double(parsed$scale)
       }
     ),
     private = list(
       .center = NULL,
       .scale = NULL
     )
   )
   
   quantized <- function(center = 0, scale = 1, storage_type = int32()) {
     new_extension_type(
       storage_type = storage_type,
       extension_name = "arrow.example.quantized",
       extension_metadata = jsonlite::toJSON(
         list(
           center = jsonlite::unbox(as.double(center)),
           scale = jsonlite::unbox(as.double(scale))
         )
       ),
       type_class = QuantizedType
     )
   }
   
   quantized_array <- function(x, center = 0, scale = 1, 
                               storage_type = int32()) {
     type <- quantized(center, scale, storage_type)
     new_extension_array(
       Array$create((x - center) * scale, type = storage_type),
       type
     )
   }
   
   reregister_extension_type(quantized())
   
   (vals <- runif(5, min = 19, max = 21))
   #> [1] 19.33526 19.47467 19.14288 20.39798 19.04523
   
   (array <- quantized_array(
     vals,
     center = 20,
     scale = 2 ^ 15 - 1,
     storage_type = int16())
   )
   #> ExtensionArray
   #> <QuantizedType <{"center":20,"scale":32767}>>
   #> [
   #>   -21781,
   #>   -17213,
   #>   -28085,
   #>   13040,
   #>   -31284
   #> ]
   
   array$type$center()
   #> [1] 20
   array$type$scale()
   #> [1] 32767
   
   as.vector(array)
   #> [1] 19.33528 19.47468 19.14289 20.39796 19.04526
   ```
   
   </details>


