paleolimbot commented on PR #13397:
URL: https://github.com/apache/arrow/pull/13397#issuecomment-1177991132

   I *think* I've incorporated all the comments here. I've summarised the unresolved bits below, but feel free to add to that list.
   
   I agree that the "the whole entire plan must be completely evaluated in one call into C++ from R" constraint is not ideal, and I'm not offended if we want to bump this to the next release to see if we can do it better. It's a new feature, and I think it's OK to include it and let users give feedback on ways that user-defined functions can be improved (which may include support for the R-level record batch reader).
   
   I included improvements to `SafeCallIntoR<>()` / `RunWithCapturedR()` in this PR because the poor error messages and code complexity of using them were becoming particularly evident. I'm happy to remove those changes and put them in another PR, too, since they widen the scope of this PR beyond just UDFs.
   
   Here is a motivating example from the geospatial end of things that might be more fun to play with. It does highlight some of the complexities of matching extension types, which is not all that well supported yet.
   
   <details>
   
   ``` r
   # remotes::install_github("apache/arrow/r#13397")
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(dplyr, warn.conflicts = FALSE)
   # remotes::install_github("paleolimbot/geoarrow")
   library(geoarrow)
   library(sf)
   #> Linking to GEOS 3.9.1, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
   
   # (need a better generator for this in geoarrow)
   geoarrow_wkb_type_arrow <- arrow:::DataType$import_from_c(
     narrow::as_narrow_schema(geoarrow_wkb())
   )
   
   # scalar function wrapper
   st_perimeter_wrapper <- arrow_scalar_function(
     function(x) {
       sf::st_length(sf::st_boundary(sf::st_as_sfc(x)))
     },
     in_type = schema(x = geoarrow_wkb_type_arrow),
     out_type = float64()
   )
   
   # register!
   register_user_defined_function(st_perimeter_wrapper, "st_perimeter")
   
   # some example data
   nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
   # parameterized extension types (e.g., with crs) don't match the kernel signature
   sf::st_crs(nc) <- NA_crs_
   nc_table <- as_geoarrow_table(nc, schema = geoarrow_schema_wkb())
   
   # use in a pipeline
   nc_table |> 
     transmute(NAME, len = st_perimeter(geometry)) |> 
     collect()
   #> # A tibble: 100 × 2
   #>    NAME          len
   #>    <chr>       <dbl>
   #>  1 Ashe         1.44
   #>  2 Alleghany    1.23
   #>  3 Surry        1.63
   #>  4 Currituck    2.97
   #>  5 Northampton  2.21
   #>  6 Hertford     1.67
   #>  7 Camden       1.55
   #>  8 Gates        1.28
   #>  9 Warren       1.42
   #> 10 Stokes       1.43
   #> # … with 90 more rows
   
   # check answers
   nc |> 
     transmute(NAME, len = sf::st_length(sf::st_boundary(geometry)))
   #> Simple feature collection with 100 features and 2 fields
   #> Geometry type: MULTIPOLYGON
   #> Dimension:     XY
   #> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
   #> CRS:           NA
   #> # A tibble: 100 × 3
   #>    NAME          len                                                    geometry
   #>  * <chr>       <dbl>                                              <MULTIPOLYGON>
   #>  1 Ashe         1.44 (((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36.27…
   #>  2 Alleghany    1.23 (((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36.40…
   #>  3 Surry        1.63 (((-80.45634 36.24256, -80.47639 36.25473, -80.53688 36.25…
   #>  4 Currituck    2.97 (((-76.00897 36.3196, -76.01735 36.33773, -76.03288 36.335…
   #>  5 Northampton  2.21 (((-77.21767 36.24098, -77.23461 36.2146, -77.29861 36.211…
   #>  6 Hertford     1.67 (((-76.74506 36.23392, -76.98069 36.23024, -76.99475 36.23…
   #>  7 Camden       1.55 (((-76.00897 36.3196, -75.95718 36.19377, -75.98134 36.169…
   #>  8 Gates        1.28 (((-76.56251 36.34057, -76.60424 36.31498, -76.64822 36.31…
   #>  9 Warren       1.42 (((-78.30876 36.26004, -78.28293 36.29188, -78.32125 36.54…
   #> 10 Stokes       1.43 (((-80.02567 36.25023, -80.45301 36.25709, -80.43531 36.55…
   #> # … with 90 more rows
   ```
   
   <sup>Created on 2022-07-07 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   </details>

