Copilot commented on code in PR #558:
URL: https://github.com/apache/sedona-db/pull/558#discussion_r2742337148


##########
python/sedonadb/src/dataframe.rs:
##########
@@ -149,6 +152,34 @@ impl InternalDataFrame {
         ))
     }
 
+    fn to_batches<'py>(
+        &self,
+        py: Python<'py>,
+        requested_schema: Option<Bound<'py, PyAny>>,
+    ) -> Result<Batches, PySedonaError> {
+        check_py_requested_schema(requested_schema, self.inner.schema().as_arrow())?;
+
+        let df = self.inner.clone();
+        let batches = wait_for_future(py, &self.runtime, async move {
+            let mut stream = df.execute_stream().await?;
+            let schema = stream.schema();
+            let mut count = 0usize;
+            let mut batches = Vec::new();
+            while let Some(batch) = stream.try_next().await? {
+                count += batch.num_rows();
+                batches.push(batch);
+            }

Review Comment:
   The variable `count` is computed during batch collection but could be 
derived from the batches themselves. Consider calculating it from 
`batches.iter().map(|b| b.num_rows()).sum()` after the loop to avoid 
maintaining a separate counter and reduce the chance of inconsistency.
   ```suggestion
               let mut batches = Vec::new();
               while let Some(batch) = stream.try_next().await? {
                   batches.push(batch);
               }
               let count: usize = batches.iter().map(|b| b.num_rows()).sum();
   ```



##########
python/sedonadb/tests/test_udf.py:
##########
@@ -122,6 +123,19 @@ def shapely_udf(geom, distance):
         pd.DataFrame({"col": [3857]}, dtype=np.uint32),
     )
 
+    # Ensure we can collect with >1 batch without hanging
+    con.funcs.table.sd_random_geometry("Point", 20000).to_view("pts", overwrite=True)

Review Comment:
   The test creates a view with 20,000 points that is used in two subsequent 
checks. Consider creating this view once in a pytest fixture so the setup is 
not duplicated and the points are generated only once.
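
A minimal sketch of what such a fixture might look like, under stated assumptions: `create_points_view` and `pts_view` are hypothetical names that do not appear in the PR, and `con` is assumed to be an existing connection fixture supplied elsewhere (for example in `conftest.py`). Only the `sd_random_geometry(...).to_view(...)` call chain is taken from the diff above.

```python
import pytest

# Number of points used by the test in this diff.
POINT_COUNT = 20000


def create_points_view(con, name="pts", n=POINT_COUNT):
    """Register a view of n random points on con (hypothetical helper)."""
    # Same call chain as in the test under review.
    con.funcs.table.sd_random_geometry("Point", n).to_view(name, overwrite=True)
    return name


@pytest.fixture
def pts_view(con):
    # `con` is assumed to be an existing connection fixture; it is not
    # defined in this sketch.
    return create_points_view(con)
```

A test would then request `pts_view` as an argument and query the returned view name. The fixture here uses pytest's default function scope; whether a wider scope is possible depends on the scope of `con`, since a fixture cannot request another fixture with a narrower scope.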



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
