[I] try 1-N-N performance tuning with LATERAL subquery [sedona]

via GitHub Wed, 20 Mar 2024 19:29:13 -0700


MyqueWooMiddo opened a new issue, #1280:
URL: https://github.com/apache/sedona/issues/1280


   ## Expected behavior
   
   References:
   https://postgis.net/workshops/postgis-intro/knn.html
   https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-lateral-subquery.html
   
   I upgraded Spark to 3.5.1 and tried a LATERAL subquery to calculate 1-N-N (1-Nearest-Neighbour).
   
   I want to find each point's 1-N-N within the same table, data_points(id, longitude, latitude), using Sedona.
   
   ## Actual behavior
   
   Spark does not support this kind of LATERAL subquery.
   
   ## Steps to reproduce the problem
   
   with t_data as (
        select id, st_point(longitude, latitude) as point from data_points order by 1 limit 1000
   )
   select * from t_data t1, lateral (
        select t2.id, ST_DistanceSpheroid(t1.point, t2.point) as distance from t_data t2
        where t1.id != t2.id order by 2 limit 1
   )
   
   Spark throws:
   "org.apache.spark.sql.catalyst.ExtendedAnalysisException: [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.ACCESSING_OUTER_QUERY_COLUMN_IS_NOT_ALLOWED] Unsupported subquery expression: Accessing outer query column is not allowed in this locationProject"
   
   I would like to know how to optimize 1-N-N on a large dataset, rather than filtering on row_number(order by distance) = 1.
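   
   For reference, the row_number approach mentioned above could look roughly like the sketch below. It is only an illustration and has not been run here: it assumes the same data_points(id, longitude, latitude) table and Sedona's ST_Point and ST_DistanceSpheroid functions, and the names pairs, ranked, neighbour_id and rn are made up for the example. It avoids the correlated LATERAL subquery, but it still compares every pair of points, which is the cost I would like to avoid.
   
   ```sql
   -- Sketch of the row_number baseline (assumed names: pairs, ranked, neighbour_id, rn)
   with t_data as (
        select id, ST_Point(longitude, latitude) as point
        from data_points
        order by 1 limit 1000
   ),
   -- every ordered pair of distinct points with its spheroid distance
   pairs as (
        select t1.id as id,
               t2.id as neighbour_id,
               ST_DistanceSpheroid(t1.point, t2.point) as distance
        from t_data t1
        join t_data t2 on t1.id != t2.id
   ),
   -- rank the candidates of each point by distance, closest first
   ranked as (
        select *,
               row_number() over (partition by id order by distance) as rn
        from pairs
   )
   -- keep only the closest candidate per point
   select id, neighbour_id, distance
   from ranked
   where rn = 1
   ```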
   
   ## Settings
   
   Sedona version = 1.5.1
   
   Apache Spark version = 3.5.1
   
   API type = Scala
   
   Scala version = 2.12
   
   JRE version = 1.8
   
   Environment = Standalone
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sedona.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
