[I] Are Apache Sedona geometry functions compatible with Spark Connect? [sedona]

via GitHub Fri, 17 Jan 2025 16:40:45 -0800


barrieca opened a new issue, #1764:
URL: https://github.com/apache/sedona/issues/1764


   Hello.
   We are trying to use geometry data with Apache Spark Connect and Apache
   Sedona. We can convert binary geometry data to Sedona geometry types using
   `ST_GeomFromWKB` on a local Apache Sedona instance, but when we attempt the
   same through a remote Spark Connect server, the `ST_GeomFromWKB` function
   cannot be resolved (see the error below). Are Sedona operations compatible
   with a Spark Connect server?
   
   ```
   pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_ROUTINE] Cannot resolve function `ST_GeomFromWKB` on search path [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].; line 1 pos 0
   ```
   
   ## Actual behavior
   
   ```Python
   from pyspark.sql import SparkSession
   import pyspark.sql.functions as f
   from sedona.spark import *


   spark = SparkSession.builder.remote("sc://<spark_connect_address>:<port>").getOrCreate()
   url = "jdbc:postgresql://<database_address>"

   sedona = SedonaContext.create(spark)
   df = (
       sedona.read.format("jdbc")
       .option("url", url)
       .option("dbtable", "nyc_neighborhoods")
       .load()
       .withColumn("geom", f.expr("ST_GeomFromWKB(geom)"))
   )

   df.show()
   ```
   
   Running this code produces the above error at `df.show()`. When we use 
Sedona Spark in conjunction with our Spark Connect server without geospatial 
data (i.e., we don't use `.withColumn("geom", 
f.expr("ST_GeomFromWKB(geom)"))`), there is no error; the data is loaded and 
made available with the `geom` column in the original binary form.
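
One way to narrow this down, independent of the JDBC read, is to ask the session's catalog whether it knows the function at all. The following is a minimal sketch, not a confirmed diagnostic: the helper name `unresolved_functions` is hypothetical, and it assumes that `spark.catalog.functionExists` reflects functions registered in the server-side session.

```python
def unresolved_functions(spark, names):
    """Return the names that the session's catalog does not report as existing.

    `spark` is any object exposing `catalog.functionExists(name)`, such as a
    PySpark session. Assumption: the catalog's answer reflects functions
    registered in the server-side session.
    """
    return [name for name in names if not spark.catalog.functionExists(name)]


# Hypothetical usage against the remote session from the snippet above:
# print(unresolved_functions(spark, ["ST_GeomFromWKB"]))
```

If the helper returns `ST_GeomFromWKB` for the remote session but an empty list for a local one, that would suggest the function was never registered on the server side.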
   
   Note: We are using the PostGIS demo database found 
[here](https://postgis.net/workshops/postgis-intro/).
   
   ## Steps to reproduce the problem
   
   1. Start the Spark Connect server:
   ```
   ./sbin/start-connect-server.sh \
     --packages org.apache.spark:spark-connect_2.12:3.5.0,org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.7.0,org.datasyslab:geotools-wrapper:1.7.0-28.5,org.postgresql:postgresql:42.7.4 \
     --repositories https://artifacts.unidata.ucar.edu/repository/unidata-all \
     --executor-memory 28G
   ```
   2. Run the Python code above.
   
   ## Settings
   
   Sedona version = 1.7.0
   
   Apache Spark version = 3.5.0
   
   Scala version = 2.12
   
   Python version = 3.8
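
For reference, the `--packages` value passed to the Spark Connect server in step 1 lines up with the versions listed above. The sketch below only illustrates that relationship: the helper name `connect_server_packages` is hypothetical, and the GeoTools suffix (`28.5`) and PostgreSQL driver version (`42.7.4`) are copied from the command itself, since they are not part of the settings.

```python
def connect_server_packages(spark_version, scala_version, sedona_version,
                            geotools_version, postgres_driver_version):
    """Build the comma-separated Maven coordinates used with --packages."""
    # The Sedona shaded artifact is keyed on the Spark major.minor version (e.g. "3.5").
    spark_minor = ".".join(spark_version.split(".")[:2])
    coordinates = [
        f"org.apache.spark:spark-connect_{scala_version}:{spark_version}",
        f"org.apache.sedona:sedona-spark-shaded-{spark_minor}_{scala_version}:{sedona_version}",
        f"org.datasyslab:geotools-wrapper:{sedona_version}-{geotools_version}",
        f"org.postgresql:postgresql:{postgres_driver_version}",
    ]
    return ",".join(coordinates)


# Versions from the Settings section, plus the two values taken from the command.
packages = connect_server_packages("3.5.0", "2.12", "1.7.0", "28.5", "42.7.4")
print(packages)
```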
   


