[jira] [Commented] (SEDONA-244) Align R read/write functions with the Sparklyr framework

Jia Yu (Jira) Thu, 16 Feb 2023 16:44:04 -0800


    [ 
https://issues.apache.org/jira/browse/SEDONA-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690070#comment-17690070
 ]


Jia Yu commented on SEDONA-244:
-------------------------------

The design philosophy behind the read/write functions in Sedona Scala / Java is 
as follows:

 
 # By default, a Sedona DataFrame does not need a special reader/writer. It 
uses the regular Spark DF reader/writer plus ST functions to create a Sedona 
DF, e.g. SELECT ST_GeomFromWKT(wktCol).
 # Sedona DF has special readers/writers for two formats, GeoTiff and 
GeoParquet, and possibly more in the future.
 # Sedona RDD has special readers/writers for WKT/WKB/GeoJSON/Shapefile. Among 
these, GeoJSON and Shapefile are the really useful ones; WKT/WKB are not 
commonly used. Shapefile has no writer, and we have no plan to add one.
 # The Adapter class can convert an SRDD to an SDF and vice versa.
 # All constructors of the typed RDDs (PointRDD, PolygonRDD, LineStringRDD) are 
soft deprecated.
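
As a rough illustration of point 1 from the R side, the sketch below reads a file with the regular sparklyr reader and builds the geometry column with an ST function. It is only a sketch under assumptions: it needs a working local Spark installation with the apache.sedona package attached before connecting, and the file path, the table name raw_wkt and the column name wkt are hypothetical.

```r
library(sparklyr)
library(apache.sedona)  # attached before connecting so the Sedona ST_* functions are available
library(dplyr)

sc <- spark_connect(master = "local")

# Plain sparklyr reader, no Sedona-specific reader involved.
# The path and the `wkt` column are hypothetical.
raw <- spark_read_csv(sc, name = "raw_wkt", path = "/path/to/points.csv")

# dplyr does not know ST_GeomFromWKT, so it is passed through to Spark SQL as-is.
sedona_df <- raw %>% mutate(geom = ST_GeomFromWKT(wkt))
```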

 

We currently have no plan to align all of these APIs to DataFrame on the 
Scala/Java side, as doing so would take a lot of effort. But you could do it on 
the R side.

 

Here is my proposal. It gives us a consistent R API from the DataFrame 
perspective, while the RDD readers/writers remain unchanged.

 

For readers, the _df suffix means that the function takes an input location 
and produces an SDF:

sedona_read_geoparquet_df()   (change from its current name to this name)

sedona_read_geojson_df()

sedona_read_geotiff_df()

sedona_read_shapefiles_df()

 

For writers, the _df suffix means that the function takes an SDF and saves it 
to an output location:

sedona_write_geoparquet_df() (change from its current name to this name)

sedona_write_geojson_df()

sedona_write_geotiff_df()

 

 

In addition, I suggest that we do not add memory and partition arguments, to 
keep the logic of each function simple. Users can easily cache the DataFrame 
and call df.repartition(num_partitions) on their own. We don't have to do this 
for them.
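
A minimal sketch of what "on their own" could look like in sparklyr, assuming a working local Spark installation. The data frame built with sdf_len() is only a stand-in for the result of one of the proposed sedona_read_*_df() functions, and the partition count is a hypothetical value.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Stand-in for an SDF returned by a proposed sedona_read_*_df() function.
sdf <- sdf_len(sc, 1000)

num_partitions <- 8  # hypothetical value chosen by the user

sdf_ready <- sdf %>%
  sdf_repartition(partitions = num_partitions) %>%  # explicit repartitioning
  sdf_persist(storage.level = "MEMORY_AND_DISK")    # explicit caching
```

Keeping these two steps outside the read functions leaves each reader with a single job: turn an input location into an SDF.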

> Align R read/write functions with the Sparklyr framework
> --------------------------------------------------------
>
>                 Key: SEDONA-244
>                 URL: https://issues.apache.org/jira/browse/SEDONA-244
>             Project: Apache Sedona
>          Issue Type: Improvement
>            Reporter: Gregoire Leleu
>            Priority: Major
>
> Apache Sedona in R works as an extension of Sparklyr. Read/write functions 
> for Sedona should follow the same overall format as the rest of the 
> framework, e.g.:
>  * Type of return value (I believe a tbl)
>  * Standard arguments: name, path, memory, repartition...
>  * Standard behavior: overwrite, default names etc.
> Currently some functions in R sedona return RDDs that need to be registered 
> as sdf.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
