paleolimbot commented on code in PR #508:
URL: https://github.com/apache/sedona-db/pull/508#discussion_r2684720352


##########
rust/sedona-spatial-join/src/exec.rs:
##########
@@ -1173,4 +1202,157 @@ mod tests {
         })?;
         Ok(spatial_join_execs)
     }
+
+    fn extract_geoms_and_ids(partitions: &[Vec<RecordBatch>]) -> Vec<(i32, 
geo::Geometry<f64>)> {
+        let mut result = Vec::new();
+        for partition in partitions {
+            for batch in partition {
+                let id_idx = batch.schema().index_of("id").expect("Id column 
not found");
+                let ids = batch
+                    .column(id_idx)
+                    .as_any()
+                    .downcast_ref::<arrow_array::Int32Array>()
+                    .expect("Column 'id' should be Int32");
+
+                let geom_idx = batch
+                    .schema()
+                    .index_of("geometry")
+                    .expect("Geometry column not found");
+                let geoms_col = batch.column(geom_idx);
+                let geoms_binary = geoms_col
+                    .as_any()
+                    .downcast_ref::<arrow_array::BinaryArray>();
+                let geoms_binary_view = geoms_col
+                    .as_any()
+                    .downcast_ref::<arrow_array::BinaryViewArray>();
+
+                if geoms_binary.is_none() && geoms_binary_view.is_none() {
+                    panic!(
+                        "Column 'geometry' should be Binary or BinaryView. 
Schema: {:?}",
+                        batch.schema()
+                    );
+                }

Review Comment:
   I wonder if you could simplify this part with an executor:
   
   ```rust
   let executor = GeoTypesExecutor::new(...);
   let idx = geom_idx.iter();
   executor.execute_wkb_void(|maybe_geom| { 
result.push((idx.next().unwrap().unwrap(), maybe_geom.unwrap())); })
   ```



##########
rust/sedona-spatial-join/src/stream.rs:
##########
@@ -244,10 +244,10 @@ impl SpatialJoinStream {
         // Extract the necessary data first to avoid borrowing conflicts
         let (batch_opt, is_complete) = match &mut self.state {
             SpatialJoinStreamState::ProcessProbeBatch(iterator) => {
-                // For KNN joins, we swapped build/probe sides, so build_side 
should be Right
-                // For regular joins, build_side is Left
+                // For KNN joins, we may have swapped build/probe sides, so 
build_side might be Right;
+                // For regular joins, build_side is always Left.
                 let build_side = match &self.spatial_predicate {
-                    SpatialPredicate::KNearestNeighbors(_) => JoinSide::Right,
+                    SpatialPredicate::KNearestNeighbors(knn) => 
knn.probe_side.negate(),

Review Comment:
   When debugging my partition out of range issue with `sd_random_geometry()`, 
copilot sent me to this line and asked me to debug print the output of 
build_side (I should have listened!)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to