paleolimbot commented on pull request #12467:
URL: https://github.com/apache/arrow/pull/12467#issuecomment-1075362907
I think so! This example is probably better than the one I have in there
right now, because serializing/deserializing the metadata is a big part
of the picture and the current documentation example only implements the
array-to-R conversion. Check to make sure it's what you meant, though!
I didn't implement this in quite the same way as the Python one. I think the
Python workflow (correct me if I'm wrong) is along the lines of:
- Create an `ExtensionTypeSubclass(with, parameters, like, this)` instance
- C++ calls the `__arrow_ext_serialize__` method of the Python instance when
  the serialized metadata is needed
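A rough base-R model of that "callback on demand" workflow, to contrast it with the R one. Everything here is hypothetical: `make_quantized_type()` stands in for the Python subclass, its `serialize()` element plays the role of `__arrow_ext_serialize__`, and `host_write_schema()` stands in for C++ asking for the metadata. None of these are arrow or pyarrow APIs, and the `key=value` string (and the `|` separator) is just a dependency-free stand-in for the JSON metadata.

```r
# Hypothetical base-R model of the Python-style workflow; these names are
# illustrative only and are not part of arrow or pyarrow.
make_quantized_type <- function(center, scale) {
  list(
    center = center,
    scale = scale,
    # Plays the role of `__arrow_ext_serialize__`: the host calls it on demand,
    # so the instance (not the host) owns the parameters
    serialize = function() {
      paste0("center=", as.character(center), ";scale=", as.character(scale))
    }
  )
}

# Stand-in for the C++ side calling back into the instance whenever it
# needs the serialized metadata (e.g., while writing a schema)
host_write_schema <- function(type) {
  paste0("arrow.example.quantized|", type$serialize())
}

type <- make_quantized_type(center = 20, scale = 32767)
host_write_schema(type)
#> [1] "arrow.example.quantized|center=20;scale=32767"
```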
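In R it's totally bananas to call from C++ into R: the R API isn't
thread-safe, and Arrow's C++ code may be running on a thread other than the
main R thread, so most of the time we can't do it safely. Instead, I wrote it
like this: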
- Create the extension metadata before either the R or C++ instance is
created
- Create a C++ instance of `RExtensionType()` that contains the definitive
copy of the serialized extension metadata
- Create the ExtensionTypeSubclass R6 instance and then call the R6
instance's `.Deserialize()` method to populate data fields.
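The same three steps as a base-R sketch, again with hypothetical names (`serialize_params()`, `parse_params()`, `make_storage_holder()`, `new_quantized_sketch()`) that are not part of the arrow package. The holder only stands in for the C++ `RExtensionType`, and a plain environment stands in for the R6 instance. The point is the ordering: the metadata exists first, and the R object reads its fields from it after it is created, so nothing ever has to call back into R.

```r
# Hypothetical base-R sketch; none of these names are part of the arrow package.
# A key=value string stands in for the JSON metadata used in the real example.
serialize_params <- function(params) {
  paste0(names(params), "=", vapply(params, as.character, character(1)),
         collapse = ";")
}
parse_params <- function(metadata) {
  pairs <- strsplit(strsplit(metadata, ";", fixed = TRUE)[[1]], "=",
                    fixed = TRUE)
  values <- as.double(vapply(pairs, `[`, character(1), 2))
  names(values) <- vapply(pairs, `[`, character(1), 1)
  as.list(values)
}

# Step 1: build the serialized metadata before any type object exists
metadata <- serialize_params(list(center = 20, scale = 32767))

# Step 2: a holder keeps the definitive copy (standing in for the C++
# RExtensionType); it only hands the string out and never calls into the R object
make_storage_holder <- function(metadata) {
  force(metadata)
  list(extension_metadata = function() metadata)
}
holder <- make_storage_holder(metadata)

# Step 3: create the R-side object, then populate its fields from the holder
new_quantized_sketch <- function(holder) {
  self <- new.env()
  self$holder <- holder
  self$deserialize <- function() {
    parsed <- parse_params(self$holder$extension_metadata())
    self$center <- parsed$center
    self$scale <- parsed$scale
    invisible(self)
  }
  self
}
type <- new_quantized_sketch(holder)
type$deserialize()
type$center
#> [1] 20
type$scale
#> [1] 32767
```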
The way I've implemented it in R isn't all that straightforward, and I'm not
sure I *like* it (but I'm also not sure how to make it better).
<details>
``` r
library(arrow, warn.conflicts = FALSE)

# R6 class for the type; its fields are filled in by .Deserialize() from the
# metadata that the C++ RExtensionType already holds
QuantizedType <- R6::R6Class(
  "QuantizedType",
  inherit = ExtensionType,
  public = list(
    center = function() private$.center,
    scale = function() private$.scale,

    # Convert the integer storage back to doubles (the inverse of the
    # quantization done in quantized_array())
    .array_as_vector = function(extension_array) {
      as.vector(extension_array$storage() / private$.scale + private$.center)
    },

    # Parse the serialized JSON metadata into the private fields
    .Deserialize = function(storage_type, extension_name,
                            extension_metadata) {
      parsed <- jsonlite::fromJSON(self$extension_metadata_utf8())
      private$.center <- as.double(parsed$center)
      private$.scale <- as.double(parsed$scale)
    }
  ),
  private = list(
    .center = NULL,
    .scale = NULL
  )
)

# Type constructor: the metadata is serialized *before* either the R or the
# C++ instance exists
quantized <- function(center = 0, scale = 1, storage_type = int32()) {
  new_extension_type(
    storage_type = storage_type,
    extension_name = "arrow.example.quantized",
    extension_metadata = jsonlite::toJSON(
      list(
        center = jsonlite::unbox(as.double(center)),
        scale = jsonlite::unbox(as.double(scale))
      )
    ),
    type_class = QuantizedType
  )
}

# Quantize doubles into the integer storage type and wrap them
quantized_array <- function(x, center = 0, scale = 1,
                            storage_type = int32()) {
  type <- quantized(center, scale, storage_type)
  new_extension_array(
    Array$create((x - center) * scale, type = storage_type),
    type
  )
}

# Register the type so Arrow can reconstruct it from its extension name
reregister_extension_type(quantized())

(vals <- runif(5, min = 19, max = 21))
#> [1] 19.33526 19.47467 19.14288 20.39798 19.04523
(array <- quantized_array(
  vals,
  center = 20,
  scale = 2 ^ 15 - 1,
  storage_type = int16())
)
#> ExtensionArray
#> <QuantizedType <{"center":20,"scale":32767}>>
#> [
#> -21781,
#> -17213,
#> -28085,
#> 13040,
#> -31284
#> ]
array$type$center()
#> [1] 20
array$type$scale()
#> [1] 32767
as.vector(array)
#> [1] 19.33528 19.47468 19.14289 20.39796 19.04526
```
</details>
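The storage encoding in `quantized_array()` is `(x - center) * scale`, and `.array_as_vector()` inverts it with `storage / scale + center`. A base-R check of just that arithmetic, assuming the double-to-integer cast truncates toward zero (the round-tripped values above all moved toward `center`, which looks consistent with that, but it's an assumption), shows the round-trip error stays below `1 / scale`. `quantize()` and `dequantize()` are hypothetical helpers, not arrow functions.

```r
# Hypothetical helpers mirroring the arithmetic in quantized_array() and
# .array_as_vector(); trunc() is an assumption about how the cast rounds
quantize <- function(x, center, scale) trunc((x - center) * scale)
dequantize <- function(storage, center, scale) storage / scale + center

center <- 20
scale <- 2^15 - 1
x <- c(19, 19.5, 20, 20.25, 21)

storage <- quantize(x, center, scale)
# Values within 1 of `center` land inside the int16 range [-32767, 32767]
stopifnot(all(abs(storage) <= 32767))

recovered <- dequantize(storage, center, scale)
# Truncation loses less than one storage step, i.e. less than 1 / scale
stopifnot(all(abs(recovered - x) < 1 / scale))
```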