[
https://issues.apache.org/jira/browse/SEDONA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690113#comment-17690113
]
Gregoire Leleu commented on SEDONA-244:
---------------------------------------
[~jiayu]
I did a bit more research on how other Sparklyr extensions are written, and
{*}extensions all keep the "spark_" prefix for their read/write functions{*}.
We could use this to distinguish:
* Functions that return a DF like the rest of the ecosystem:
spark_read_\{geoparquet|geojson|shapefile|geotiff|...}
* Functions that return an RDD: sedona_read_\{wkt|wkb|shapefile|geojson}
Technically, spark_read_* functions don't return a DF, they return a "tbl" that
makes them compatible with the rest of the data-wrangling functions. So {*}the
"_df" suffix won't mean anything to R users{*}. Aligning with spark_read_* will
make it easy for them to start using Sedona with their usual flow. And they can
use RDD sedona_read_* functions if they need to. In terms of work, that's just
renaming the two new functions (geoparquet, geotiff) and making two wrappers
(shapefile, geojson), no breaking changes.
On the memory and repartition arguments: R users get the "tbl" object wrapping
the DF, so if they want to "df.repartition(n)" they would need to unwrap,
repartition and re-wrap. Something similar happens for caching. I believe that's
why other spark_read_* functions have these arguments (and it is three lines of
code to add).
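As a rough illustration of the proposal above, here is a minimal sketch of what a renamed reader could look like. It assumes the function is called spark_read_geoparquet, that Sedona's data source is registered under the name "geoparquet", and that delegating to sparklyr's generic spark_read_source() is acceptable; none of this is the actual implementation.

```r
# Hypothetical wrapper following the sparklyr spark_read_* convention.
# The function name and the "geoparquet" source name are assumptions.
spark_read_geoparquet <- function(sc,
                                  name = NULL,
                                  path = name,
                                  options = list(),
                                  repartition = 0,
                                  memory = TRUE,
                                  overwrite = TRUE) {
  # Delegate to sparklyr's generic reader, which returns a tbl and
  # handles repartitioning and caching through its own arguments.
  sparklyr::spark_read_source(
    sc,
    name = name,
    path = path,
    source = "geoparquet",
    options = options,
    repartition = repartition,
    memory = memory,
    overwrite = overwrite
  )
}

# Usage (requires a Spark connection with Sedona loaded; path is made up):
# sc <- sparklyr::spark_connect(master = "local")
# roads <- spark_read_geoparquet(sc, name = "roads", path = "/tmp/roads.parquet",
#                                repartition = 8, memory = FALSE)
```

With this shape, the returned tbl plugs straight into the usual dplyr verbs, and the repartition and memory arguments spare the user the unwrap and re-wrap step.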
> Align R read/write functions with the Sparklyr framework
> --------------------------------------------------------
>
> Key: SEDONA-244
> URL: https://issues.apache.org/jira/browse/SEDONA-244
> Project: Apache Sedona
> Issue Type: Improvement
> Reporter: Gregoire Leleu
> Priority: Major
>
> Apache Sedona in R works as an extension of Sparklyr. Read/write functions
> for Sedona should follow the same overall format as the rest of the
> framework. E.g.:
> * Type of return value (I believe a tbl)
> * Standard arguments: name, path, memory, repartition...
> * Standard behavior: overwrite, default names etc.
> Currently some functions in R sedona return RDDs that need to be registered
> as sdf.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)