Kristin Cowalcijk created SEDONA-560:
----------------------------------------

             Summary: Spatial join involving dataframe containing 0 partition throws exception
                 Key: SEDONA-560
                 URL: https://issues.apache.org/jira/browse/SEDONA-560
             Project: Apache Sedona
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Kristin Cowalcijk


Sedona does not correctly handle DataFrames that have 0 partitions when 
performing a spatial join. For example:
{code:python}
# Assumes `spark` is an existing Sedona-enabled SparkSession.
from pyspark.sql.functions import broadcast, expr
from pyspark.sql.types import IntegerType, StructField, StructType

schema = StructType([
    StructField("id", IntegerType(), True)
])

# Create an empty RDD (it has 0 partitions)
empty_rdd = spark.sparkContext.emptyRDD()

# Create an empty DataFrame backed by the 0-partition RDD
empty_df = spark.createDataFrame(empty_rdd, schema)

df_point = spark.range(0, 10).toDF("id").withColumn("geom", expr("ST_Point(id, id)"))
df_poly = empty_df.withColumn("poly", expr("ST_Buffer(ST_Point(id, id), 2)")).drop("geom")

spark.conf.set("sedona.join.autoBroadcastJoinThreshold", "-1")
df_point.join(broadcast(df_poly), expr("ST_Intersects(poly, geom)")).count()
{code}
This fails with the following error message:
{code:java}
Py4JJavaError: An error occurred while calling o107.showString.
: java.util.NoSuchElementException: next on empty iterator
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
    at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
    at scala.collection.IterableLike.head(IterableLike.scala:109)
    at scala.collection.IterableLike.head$(IterableLike.scala:108)
    at scala.collection.AbstractIterable.head(Iterable.scala:56)
    at org.apache.spark.sql.sedona_sql.strategy.join.SpatialIndexExec.doExecuteBroadcast(SpatialIndexExec.scala:63)
{code}
This is not limited to broadcast joins; range joins fail as well:
{code:python}
df_point.join(df_poly, expr("ST_Intersects(poly, geom)")).count()

24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Join dominant side partition number 8 is larger than 1/2 of the dominant side count 10
24/05/27 10:50:13.989 WARN RangeJoinExec: [SedonaSQL] Try to use follower side partition number 0
Number of partitions must be >= 0
{code}
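
Until this is fixed, a caller-side guard might avoid both failures. The following is only a sketch and not part of Sedona: the helper name ensure_at_least_one_partition is hypothetical, and it assumes that repartition(1) on a 0-partition DataFrame yields a DataFrame with exactly one (empty) partition. It has not been verified against the affected Sedona version.

```python
def ensure_at_least_one_partition(df):
    """Return a DataFrame that has at least one partition.

    Hypothetical workaround helper, not a Sedona API. If `df` reports
    0 partitions, force a shuffle to a single (empty) partition;
    otherwise return `df` unchanged.
    """
    if df.rdd.getNumPartitions() == 0:
        # Assumption: repartition(1) produces one empty partition here.
        return df.repartition(1)
    return df
```

With the DataFrames from the reproduction above, the guard would be applied to the empty side before joining, for example df_point.join(ensure_at_least_one_partition(df_poly), expr("ST_Intersects(poly, geom)")).count().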



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
