ruanqizhen opened a new issue, #1743:
URL: https://github.com/apache/sedona/issues/1743

   ## Actual behavior
   
   I'm attempting to apply DBSCAN to my data, but I'm encountering an error. To troubleshoot, I tested it with the example data provided in the Sedona documentation and received the same error. Can anyone suggest what might be causing the issue and how I can resolve it?
   
   The job was running on AWS Glue. The error message:
   
   > An error occurred while calling z:org.apache.sedona.stats.clustering.DBSCAN.dbscan. Checkpoint directory has not been set in the SparkContext.
   
   The code:
   
   ```python
   from sedona.spark import *
   from sedona.stats.clustering.dbscan import dbscan
   
   config = (
       SedonaContext.builder()
       .config(
           "spark.jars.packages",
           "org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.7.0,"
           "org.datasyslab:geotools-wrapper:1.7.0-28.2",
       )
       .getOrCreate()
   )
   spark = SedonaContext.create(config)
   
   import pyspark.sql.functions as F
   import pyspark.sql.types as T
   from pyspark.sql import Row
   
   data = [
       Row(wkt="POINT (2.5 4)", id=3),
       Row(wkt="POINT (3 4)", id=2),
       Row(wkt="POINT (3 5)", id=5),
       Row(wkt="POINT (1 3)", id=9),
       Row(wkt="POINT (2.5 4.5)", id=7),
       Row(wkt="POINT (1 2)", id=1),
       Row(wkt="POINT (1.5 2.5)", id=4),
       Row(wkt="POINT (1.2 2.5)", id=8),
       Row(wkt="POINT (1 2.5)", id=11),
       Row(wkt="POINT (1 5)", id=10),
       Row(wkt="POINT (5 6)", id=12),
       Row(wkt="POINT (12.8 4.5)", id=6),
       Row(wkt="POINT (4 3)", id=13),
   ]
   df = spark.createDataFrame(data)
   
   df = df.withColumn(
       "geometry", F.expr("ST_GeomFromWKT(wkt)")
   )
   
   dbscan(df, 0.15, 1).write.mode(
       "overwrite"
   ).parquet(
       "s3://omf-internal-usw2/test/"
   )
   ```
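
The error message points at Spark's checkpoint directory, so one thing that may be worth trying (untested in this environment, and not confirmed as the fix) is setting it on the SparkContext before calling `dbscan`. A minimal sketch, assuming `spark` is the session created above; the helper name `ensure_checkpoint_dir` and the S3 path in the usage comment are hypothetical placeholders:

```python
def ensure_checkpoint_dir(spark, path):
    """Set the Spark checkpoint directory on the session's SparkContext.

    `ensure_checkpoint_dir` is a hypothetical helper name; the only Spark
    call used here is SparkContext.setCheckpointDir.
    """
    # setCheckpointDir tells Spark where to persist checkpointed data.
    # On a cluster this should be a shared location (for example HDFS or S3).
    spark.sparkContext.setCheckpointDir(path)
    return path


# Assumed usage, before calling dbscan (the path is a placeholder):
# ensure_checkpoint_dir(spark, "s3://<your-bucket>/<checkpoint-prefix>/")
# dbscan(df, 0.15, 1).show()
```

Whether the Glue job role can write to the chosen location is a separate assumption that would need checking.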
   
   ## Settings
   
   Sedona version = 1.7.0
   
   Apache Spark version = 3.3
   
   Apache Flink version = ?
   
   API type = Python
   
   Scala version = 2.12
   
   JRE version = ?
   
   Python version = 3
   
   Environment = AWS Glue 4.0
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.