MrPowers opened a new issue, #159:
URL: https://github.com/apache/sedona-db/issues/159
SedonaDB currently has SQL and Python interfaces.
Here's how you can get the number of rows in a SedonaDB DataFrame with
Python:
```python
df.count()
```
Here's how you can get the number of rows with SQL:
```python
df.to_view("earth_cities")
sd.sql("select count(*) from earth_cities").show()
```
Maybe this isn't the best example because `df.count()` returns an integer
and `sd.sql()` returns a DataFrame, but you get the point.
Other functionality can only be expressed via the SQL API. Here is an
example:
```python
nyc_bbox_wkt = (
    "POLYGON((-74.2591 40.4774, -74.2591 40.9176, -73.7004 40.9176, "
    "-73.7004 40.4774, -74.2591 40.4774))"
)
sd.sql(f"""
SELECT
    id,
    height,
    num_floors,
    roof_shape,
    ST_Centroid(geometry) AS centroid
FROM
    buildings
WHERE
    is_underground = FALSE
    AND height IS NOT NULL
    AND height > 20
    AND ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nyc_bbox_wkt}'), 4326))
LIMIT 5;
""").show()
```
Some libraries, like Spark, expose a Python DataFrame API in addition to SQL.
Here's how this logic would be expressed in PySpark (with Apache Sedona's
spatial functions):
```python
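from pyspark.sql.functions import col, lit
# Sedona's Python function modules; import paths may differ between versions.
from sedona.sql.st_constructors import ST_GeomFromText
from sedona.sql.st_functions import ST_Centroid, ST_SetSRID
from sedona.sql.st_predicates import ST_Intersects

result = (
    buildings
    .where(col("is_underground") == False)
    .where(col("height").isNotNull())
    .where(col("height") > 20)
    .where(ST_Intersects(
        col("geometry"),
        ST_SetSRID(ST_GeomFromText(lit(nyc_bbox_wkt)), lit(4326))
    ))
    .select(
        "id",
        "height",
        "num_floors",
        "roof_shape",
        ST_Centroid(col("geometry")).alias("centroid")
    )
    .limit(5)
)
# show() returns None, so call it separately to keep `result` a DataFrame.
result.show()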
```
Polars syntax would be something like this (I think; feel free to chime in
here, Polars experts):
```python
# NOTE: the pl.st_* functions are hypothetical; Polars has no built-in
# spatial functions.
result = (
    buildings
    .filter(
        (pl.col("is_underground") == False) &
        (pl.col("height").is_not_null()) &
        (pl.col("height") > 20) &
        pl.st_intersects(
            pl.col("geometry"),
            pl.st_set_srid(pl.st_geom_from_text(pl.lit(nyc_bbox_wkt)), pl.lit(4326))
        )
    )
    .select([
        pl.col("id"),
        pl.col("height"),
        pl.col("num_floors"),
        pl.col("roof_shape"),
        pl.st_centroid(pl.col("geometry")).alias("centroid")
    ])
    .limit(5)
)
```
Should we expose a Python interface like this? If so, what syntax would
users like the best?
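To make the question more concrete, here is a rough sketch of one way a chained Python interface could sit on top of the existing SQL API. Everything in it is hypothetical: `LazyQuery` and its `where`, `select`, `limit`, and `to_sql` methods are names made up for illustration and are not part of SedonaDB. Predicates are plain SQL strings to keep the sketch short; a real design would more likely use expression objects, as PySpark and Polars do.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical builder: records DataFrame-style calls and renders SQL."""

    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    row_limit: Optional[int] = None

    def where(self, predicate: str) -> "LazyQuery":
        # Each call adds one condition; conditions are ANDed together.
        return replace(self, predicates=self.predicates + (predicate,))

    def select(self, *columns: str) -> "LazyQuery":
        # Replaces the projection with the given column expressions.
        return replace(self, columns=columns)

    def limit(self, n: int) -> "LazyQuery":
        return replace(self, row_limit=n)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


query = (
    LazyQuery("buildings")
    .where("is_underground = FALSE")
    .where("height > 20")
    .select("id", "height", "ST_Centroid(geometry) AS centroid")
    .limit(5)
)
print(query.to_sql())
```

The generated string could then be handed to `sd.sql(...)`, so an interface like this would be a thin layer over the SQL engine rather than a second execution path.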