ivanthewebber opened a new issue, #1581:
URL: https://github.com/apache/sedona/issues/1581

   ## Expected behavior
   Following the instructions in the [docs](https://sedona.apache.org/1.6.1/setup/install-python/) with the latest versions should succeed without errors. I have been unable to initialize Sedona-Spark for the Python API, so I think either the docs need to be updated or there are errors in the most recent releases.
   
   I installed Sedona and PySpark (with Hadoop) from PyPI, and I have a Java 11 JDK and Scala 2.12 on my computer. I also tried installing Spark directly from a download, and I have tried manually downloading the jars as well.
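   
   Roughly, the manual jar attempt looked like the sketch below (illustrative only; the jar names are what I would expect for Spark 3.5 / Scala 2.12 / Sedona 1.6.1 with GeoTools wrapper 28.2, and the target directory is the jars folder of the pip-installed PySpark):
   
   ```python
   # Illustrative: locate the jars directory of a pip-installed PySpark, which is
   # where I copied the manually downloaded jars, e.g.
   #   sedona-spark-shaded-3.5_2.12-1.6.1.jar
   #   geotools-wrapper-1.6.1-28.2.jar
   import os
   import pyspark
   
   jars_dir = os.path.join(os.path.dirname(pyspark.__file__), 'jars')
   print(jars_dir)
   ```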
   
   I want to initialize the session as follows:
   ```python
   import pyspark
   import pyspark.version
   import pyspark.sql
   import sedona
   import sedona.spark
   
   def get_sedona_spark(spark_version=pyspark.version.__version__,
                        scala_version='2.12',
                        sedona_version=sedona.version,
                        geotools_version='28.2') -> sedona.spark.SedonaContext:
       """
       Get the Sedona Spark context.
   
       We use the newest version, so Sedona's methods will expect lon-lat order.
       """
   
       if spark_version.count('.') > 1:
           spark_version = '.'.join(spark_version.split('.')[:2])
   
       builder: pyspark.sql.SparkSession.Builder = sedona.spark.SedonaContext.builder()
       spark = builder\
           .config(
               'spark.jars.packages',
               f'org.apache.sedona:sedona-spark-{spark_version}_{scala_version}:{sedona_version},' +
               f'org.datasyslab:geotools-wrapper:{sedona_version}-{geotools_version}'
           ).config(
               'spark.jars.repositories',
               'https://artifacts.unidata.ucar.edu/repository/unidata-all'
           ).getOrCreate()
   
       return sedona.spark.SedonaContext.create(spark)
   
   if __name__ == "__main__":
       get_sedona_spark()
   ```
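   
   For comparison, this is roughly the minimal form from the install docs that I expect to work, with the versions written out literally (treat the exact coordinates as my reading of the docs for Spark 3.5 / Scala 2.12 / Sedona 1.6.1, not something I have verified):
   
   ```python
   from sedona.spark import SedonaContext
   
   # Same configuration as above, but with the versions hard-coded.
   config = (
       SedonaContext.builder()
       .config(
           'spark.jars.packages',
           'org.apache.sedona:sedona-spark-3.5_2.12:1.6.1,'
           'org.datasyslab:geotools-wrapper:1.6.1-28.2',
       )
       .config(
           'spark.jars.repositories',
           'https://artifacts.unidata.ucar.edu/repository/unidata-all',
       )
       .getOrCreate()
   )
   sedona = SedonaContext.create(config)
   ```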
   
   Ideally there would be simple quickstart steps, like the Spark/Flink word-count quickstarts, that get a small program running end to end.
   
   ## Actual behavior
   Various errors. I've tried many variations and recommended fixes from Stack Overflow but haven't made much progress.
   
   I get errors like the following: `PySparkRuntimeError: [JAVA_GATEWAY_EXITED] 
Java gateway process exited before sending its port number.`
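   
   While debugging I used a quick sanity check like the one below to see what PySpark can find (purely illustrative; as far as I understand, `JAVA_GATEWAY_EXITED` usually means the JVM could not be started, but I haven't been able to pin down why):
   
   ```python
   # Illustrative sanity check of the environment the Java gateway depends on.
   import os
   import shutil
   
   import pyspark
   
   print('pyspark version:', pyspark.__version__)
   print('JAVA_HOME      :', os.environ.get('JAVA_HOME'))
   print('java on PATH   :', shutil.which('java'))
   print('SPARK_HOME     :', os.environ.get('SPARK_HOME'))
   ```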
   
   ## Steps to reproduce the problem
   ```
   # create new env
   python -m venv env
   ./env/scripts/Activate.ps1
   python -m pip install --upgrade pip
   pip install --upgrade apache-sedona[spark] pyspark
   
   # I tried setting SPARK_HOME to a few different values, but if I'm reading
   # the docs right, I shouldn't need to when installing from PyPI:
   # $Env:SPARK_HOME = "venv/.../pyspark"
   
   # attempt to initialize session (see above)
   python test.py
   ```
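   
   When I did experiment with SPARK_HOME, it was along these lines (an illustrative Python equivalent of the commented-out line above, assuming the pip-installed PySpark layout):
   
   ```python
   # Illustrative: point SPARK_HOME at the pip-installed PySpark package directory
   # before creating the session (one of the variations I tried).
   import os
   import pyspark
   
   os.environ['SPARK_HOME'] = os.path.dirname(pyspark.__file__)
   print(os.environ['SPARK_HOME'])
   ```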
   
   ## Settings
   
   Sedona version = 1.6.1, 1.5
   
   Apache Spark version = 3.5.2, 3.5.1, 3.4
   
   API type = Python
   
   Scala version = 2.12
   
   JRE version = 11
   
   Python version = 3.12
   
   Environment = Local
   

