Re: [I] Python API to cast binary columns to WKB columns [sedona-db]

via GitHub Tue, 20 Jan 2026 08:43:52 -0800


paleolimbot commented on issue #530:
URL: https://github.com/apache/sedona-db/issues/530#issuecomment-3773889868


   Thank you for opening! I agree that the SQL to make this happen is pretty 
ugly. I didn't know about the `* EXCLUDE (geo_bin)` trick which is nice.
   
   I'm a tiny bit worried about adding `DataFrame` methods because it's so 
central to SedonaDB and we haven't spent a lot of time thinking about what a 
Pythonic interface would be. I'm also hesitant to add things to the Python API 
that rely on string SQL processing (but also this is what I did to implement 
`sd_random_geometry()`, and we could always replace the usage when we have 
Python tools to help with this).
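
A string-SQL-based helper of the kind described above might look roughly like the sketch below. The function name `cast_wkb_sql` and the table name are hypothetical and not part of SedonaDB; the sketch only builds the query text, combining `* EXCLUDE` with `ST_GeomFromWKB()`.

```python
def cast_wkb_sql(table: str, column: str) -> str:
    """Build SQL that swaps a binary column for a geometry column.

    Hypothetical helper for illustration only; not a SedonaDB API.
    """

    def quote(name: str) -> str:
        # Double any embedded quotes so the identifier stays intact.
        return '"' + name.replace('"', '""') + '"'

    col = quote(column)
    # Drop the original binary column and re-add it, parsed as WKB,
    # under the same name.
    return (
        f"SELECT * EXCLUDE ({col}), ST_GeomFromWKB({col}) AS {col} "
        f"FROM {quote(table)}"
    )


query = cast_wkb_sql("my_table", "geo_bin")
```

The resulting string would then be handed to the SQL engine, which is exactly the string-processing dependency the paragraph above is hesitant about.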
   
   For example, if we added a `mutate()` method similar to Ibis' `mutate()` one 
could just do:
   
   ```python
   sd.read_parquet(...).mutate(geo_bin=st.geomfromwkb(_.geo_bin))
   ```
   
   We could also add the ability to override the GeoParquet metadata in the 
reader and put some Python on top of that so you could do:
   
   ```python
   sd.read_parquet(..., geometry_columns={"geo_bin": {"encoding": "WKB"}})
   ```
   
   The nice part about that approach is that it would work in SQL if 
implemented as options (`CREATE EXTERNAL TABLE ... OPTIONS ('geometry_columns' 
'{"geo_bin": {"encoding": "WKB"}}')`).
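
As a rough sketch of how the Python keyword argument and the SQL option could share one representation, the dictionary can be serialized to JSON and embedded as the option value. The `geometry_columns` option is only the proposal from this comment, not an existing reader option, and the variable names are illustrative.

```python
import json

# Proposed (not yet implemented) override of the GeoParquet metadata.
geometry_columns = {"geo_bin": {"encoding": "WKB"}}

# Serialize once; the same JSON text could back both the Python
# keyword argument and the SQL option.
options_value = json.dumps(geometry_columns)

# Single quotes inside a SQL string literal are escaped by doubling.
escaped = options_value.replace("'", "''")
options_clause = f"OPTIONS ('geometry_columns' '{escaped}')"
```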
   
   I'm also open to geo-specific methods on the `DataFrame` given that geo is 
what we do...happy to hear other thoughts!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
