Robinlovelace commented on issue #477:
URL: https://github.com/apache/sedona-db/issues/477#issuecomment-3705201292

   Update: I have identified the root cause. It is indeed the **metadata** 
attached to the Arrow schema, specifically the **pandas metadata** (the 
`pandas` schema-metadata entry, which records the index range).
   
   When `sedona.db` converts a GeoPandas/pandas DataFrame to an Arrow table 
(via PyArrow), it attaches this metadata, including the index range (e.g. 
`start: 0, stop: 100000` for the points vs. `start: 0, stop: 100` for the 
polygons). The DataFusion optimizer's `join_selection` rule appears to compare 
the two schemas strictly, including their metadata, and fails when they differ.
   
   **Workaround:**
   Stripping the table-level metadata from the Arrow Table before creating the 
Sedona DataFrame resolves the issue.
   
   ```python
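   # Workaround: strip table-level (schema) metadata before registering the view.
   # Assumes `gdf` is the GeoDataFrame being registered.
   import sedona.db

   sd = sedona.db.connect()

   temp_df = sd.create_data_frame(gdf)
   table = temp_df.to_arrow_table()
   # Drops the pandas entry (including the index start/stop) from the schema.
   clean_table = table.replace_schema_metadata(None)
   final_df = sd.create_data_frame(clean_table)
   final_df.to_view("my_view")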
   ```
   
   With this workaround, the intersection query succeeds.
