[
https://issues.apache.org/jira/browse/SEDONA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690070#comment-17690070
]
Jia Yu commented on SEDONA-244:
-------------------------------
The design philosophy behind the read/write functions in Sedona Scala / Java is
as follows:
# By default, a Sedona DataFrame does not need a special reader/writer. It
uses the Spark DF reader/writer and ST functions to create a Sedona DF, e.g.
SELECT ST_GeomFromWKT(wktCol)
# Sedona DF has special readers/writers for two formats, GeoTiff and
GeoParquet, and possibly more in the future.
# Sedona RDD has special readers/writers for WKT/WKB/GeoJSON/Shapefile. Among
these, the GeoJSON and Shapefile readers/writers are really useful; WKT/WKB are
not commonly used. Shapefile does not have a writer, and we have no plan to
support one.
# The Adapter class can convert an SRDD to an SDF and vice versa.
# All constructors of typed RDDs (PointRDD, PolygonRDD, LineStringRDD) are soft
deprecated.
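To illustrate the first point from the R side, here is a minimal sketch. It
assumes that loading apache.sedona before spark_connect() registers the Sedona
ST functions in the session, and that a local Spark installation is available.
The table name wkt_tbl and the columns id and wkt are made up for the example.

```r
library(sparklyr)
library(apache.sedona)  # assumption: attaching this registers ST_* functions on connect
library(dplyr)

sc <- spark_connect(master = "local")

# Hypothetical input: an ordinary Spark table with a WKT string column.
wkt_df <- data.frame(
  id = 1:2,
  wkt = c("POINT (1 2)", "POINT (3 4)"),
  stringsAsFactors = FALSE
)
copy_to(sc, wkt_df, name = "wkt_tbl", overwrite = TRUE)

# No special reader: a plain SQL query with an ST function builds the geometry.
# The geometry is converted back to text so that it can be collected into R.
geom_tbl <- tbl(sc, sql(
  "SELECT id, ST_AsText(ST_GeomFromWKT(wkt)) AS geom_wkt FROM wkt_tbl"
))

result <- collect(geom_tbl)
```

Any format that the regular Spark readers already handle (CSV, Parquet, JDBC
and so on) could feed the same kind of query.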
We currently have no plan to align all of these APIs to DataFrame on the
Scala/Java side, as aligning them would take a lot of effort. But you could do
it on the R side. Here is my proposal. This way, we can achieve a consistent R
API from the DataFrame perspective. The RDD reader/writer remains unchanged.
The df suffix means that the function takes an input location and produces an SDF:
sedona_read_geoparquet_df() (change from its current name to this name)
sedona_read_geojson_df()
sedona_read_geotiff_df()
sedona_read_shapefiles_df()
The df suffix means that the function takes an SDF and saves it somewhere:
sedona_write_geoparquet_df() (change from its current name to this name)
sedona_write_geojson_df()
sedona_write_geotiff_df()
In addition, I suggest that we do not add memory and partition arguments, to
simplify the logic of each function. The user can easily cache the DataFrame
and call df.repartition(num_partitions) on their own. We don't have to do this
for them.
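As a rough sketch of what the user would do on their own in sparklyr, the
snippet below repartitions and caches a Spark DataFrame with sdf_repartition()
and sdf_persist(). The sdf_len() call is only a stand-in for the output of one
of the proposed *_df readers, and the partition count of 8 is arbitrary.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Stand-in for the SDF that a *_df reader would return.
sdf <- sdf_len(sc, 1000)

# Repartitioning and caching are done by the caller, not by the reader.
sdf <- sdf %>%
  sdf_repartition(partitions = 8) %>%
  sdf_persist(storage.level = "MEMORY_ONLY")

n_parts <- sdf_num_partitions(sdf)
```

Leaving these two steps to the caller keeps each reader's signature down to
the connection, the location, and format-specific options.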
> Align R read/write functions with the Sparklyr framework
> --------------------------------------------------------
>
> Key: SEDONA-244
> URL: https://issues.apache.org/jira/browse/SEDONA-244
> Project: Apache Sedona
> Issue Type: Improvement
> Reporter: Gregoire Leleu
> Priority: Major
>
> Apache Sedona in R works as an extension of Sparklyr. Read/write functions
> for Sedona should follow the same overall format as the rest of the
> framework, e.g.:
> * Type of return value (I believe a tbl)
> * Standard arguments: name, path, memory, repartition...
> * Standard behavior: overwrite, default names etc.
> Currently some functions in R sedona return RDDs that need to be registered
> as sdf.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)