2010YOUY01 commented on issue #530:
URL: https://github.com/apache/sedona-db/issues/530#issuecomment-3776547330
> Thank you for opening! I agree that the SQL to make this happen is pretty
ugly. I didn't know about the `* EXCLUDE (geo_bin)` trick which is nice.
>
> I'm a tiny bit worried about adding `DataFrame` methods because it's so
central to SedonaDB and we haven't spent a lot of time thinking about what a
Pythonic interface would be. I'm also hesitant to add things to the Python API
that rely on string SQL processing (but also this is what I did to implement
`sd_random_geometry()`, and we could always replace the usage when we have
Python tools to help with this).
>
> For example, if we added a `mutate()` method similar to Ibis' `mutate()`
one could just do:
>
> sd.read_parquet(...).mutate(geo_bin=st.geomfromwkb(_.geo_bin))
>
> We could also add the ability to override the GeoParquet metadata in the
reader and put some Python on top of that so you could do:
>
> sd.read_parquet(..., geometry_columns={"geo_bin": {"encoding": "WKB"}})
>
> The nice part about that approach is that it would work in SQL if
implemented as options (`CREATE EXTERNAL TABLE ... OPTIONS ('geometry_columns'
'{"geo_bin": {"encoding": "WKB"}}')`).
>
> I'm also open to geo-specific methods on the `DataFrame` given that geo is
what we do...happy to hear other thoughts!
This makes sense! The `DataFrame` API should stay general (like `mutate`).
For more specific APIs, we need to be more cautious: they should either
significantly improve UX or cover very common use cases. On second
thought, I don’t think this case meets that bar.
I believe both of those APIs are needed:
- something like `mutate()` makes geo type casting easier to do on
`DataFrame`s, and it's also more flexible for WKB binary columns generated
elsewhere
- adding a `read_parquet()` option is a better fit for this issue's specific
requirement.
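
To make the first point concrete, here is a rough sketch of the SQL a `mutate()`-style call could expand to when it overrides an existing column, using the `* EXCLUDE (...)` trick mentioned above. `mutate_sql` and the table name `tbl` are hypothetical and only for illustration; this is not an existing SedonaDB API, and it interpolates its arguments as-is, so it assumes trusted input.

```python
def mutate_sql(table: str, **exprs: str) -> str:
    """Build a SELECT that overrides existing columns with new expressions.

    Hypothetical helper, not part of SedonaDB. Every keyword must name a
    column that already exists in `table`, because it is listed in EXCLUDE.
    Arguments are interpolated verbatim, so they must be trusted SQL.
    """
    if not exprs:
        return f"SELECT * FROM {table}"
    # Drop the columns being redefined, then re-add them from the expressions.
    excluded = ", ".join(exprs)
    projected = ", ".join(f"{expr} AS {name}" for name, expr in exprs.items())
    return f"SELECT * EXCLUDE ({excluded}), {projected} FROM {table}"


# Cast a WKB binary column named geo_bin to geometry, keeping other columns.
query = mutate_sql("tbl", geo_bin="ST_GeomFromWKB(geo_bin)")
print(query)
```

A real `mutate()` would presumably build expressions rather than strings, which is the concern about string SQL processing raised above.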