[ 
https://issues.apache.org/jira/browse/SEDONA-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605540#comment-17605540
 ] 

Jia Yu commented on SEDONA-166:
-------------------------------

[~dougdennis] Awesome! It will be a great feature. Please go ahead and make a 
PR. Your contribution will be highly appreciated!

> Provide DataFrame Style API
> ---------------------------
>
>                 Key: SEDONA-166
>                 URL: https://issues.apache.org/jira/browse/SEDONA-166
>             Project: Apache Sedona
>          Issue Type: New Feature
>            Reporter: Doug Dennis
>            Priority: Major
>
> Spark provides an API for operating on Column objects. Especially in Python, 
> this API is by far the most common pattern I have seen when developing Spark 
> applications. Currently, Sedona only provides the SQL API, which requires 
> either creating a temporary view and using the sql method, using the expr 
> function, or using the selectExpr method. There is no performance loss, but 
> it is disruptive when writing applications with Sedona and makes certain 
> tasks tricky to accomplish.
> As an example, consider using a Sedona function inside a transform call to 
> generate geometry from a list of coordinates. Assume the variable spark is a 
> SparkSession. Here is how it can be accomplished today (I omit the version 
> with expr since it is nearly identical to selectExpr):
> {code:python}
> df = spark.sql("SELECT array(array(0.0,0.0),array(1.1,2.2)) AS points_list")
> # generate a temp view and use the sql method
> df.createTempView("tbl")
> spark.sql("SELECT transform(points_list, p -> ST_Point(p[0], p[1])) AS points_list FROM tbl")
> # selectExpr
> df.selectExpr("transform(points_list, p -> ST_Point(p[0], p[1])) AS points_list")
> {code}
> I propose implementing an API style similar to Spark's that works with 
> Columns. This would allow for something like this:
> {code:python}
> import pyspark.sql.functions as f
> import sedona.sql.st_functions as stf
> df.select(f.transform(f.col("points_list"), lambda x: stf.st_point(x[0], x[1])))
> {code}
> I believe that the way that Spark implements this functionality can be 
> mirrored to accomplish this task.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
