MrPowers opened a new issue, #159:
URL: https://github.com/apache/sedona-db/issues/159

   SedonaDB currently has SQL and Python interfaces.
   
   Here's how you can get the number of rows in a SedonaDB DataFrame with Python:
   
   ```python
   df.count()
   ```
   
   Here's how you can get the number of rows with SQL:
   
   ```python
   df.to_view("earth_cities")
   sd.sql("select count(*) from earth_cities").show()
   ```
   
   Maybe this isn't the best example because `df.count()` returns an integer and `sd.sql()` returns a DataFrame, but you get the point.
   
   Other functionality can only be expressed via the SQL API. Here is an example:
   
   ```python
   nyc_bbox_wkt = (
       "POLYGON((-74.2591 40.4774, -74.2591 40.9176, -73.7004 40.9176, -73.7004 40.4774, -74.2591 40.4774))"
   )
   
   sd.sql(f"""
   SELECT
       id,
       height,
       num_floors,
       roof_shape,
       ST_Centroid(geometry) as centroid
   FROM
       buildings
   WHERE
       is_underground = FALSE
       AND height IS NOT NULL
       AND height > 20
       AND ST_Intersects(geometry, ST_SetSRID(ST_GeomFromText('{nyc_bbox_wkt}'), 4326))
   LIMIT 5;
   """).show()
   ```
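The WKT string above is simply the NYC bounding box written as a closed polygon ring: four corners, with the first corner repeated at the end to close the ring. A small helper like the one below (hypothetical, not part of SedonaDB) builds the same string from min/max coordinates, which avoids hand-typing the ring:

```python
def bbox_to_wkt(xmin: float, ymin: float, xmax: float, ymax: float) -> str:
    """Return a WKT POLYGON for an axis-aligned bounding box.

    Corners go (xmin ymin) -> (xmin ymax) -> (xmax ymax) -> (xmax ymin),
    and the ring is closed by repeating the first corner, as WKT requires.
    """
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"


# Same bounding box as the hand-written nyc_bbox_wkt string
nyc_bbox_wkt = bbox_to_wkt(-74.2591, 40.4774, -73.7004, 40.9176)
print(nyc_bbox_wkt)
```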
   
   Some libraries, like Spark, have both SQL and Python interfaces. Here's how this logic would be expressed in PySpark:
   
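   ```python
   from pyspark.sql.functions import col, lit
   from sedona.sql.st_constructors import ST_GeomFromText
   from sedona.sql.st_functions import ST_Centroid, ST_SetSRID
   from sedona.sql.st_predicates import ST_Intersects

   result = (
       buildings
       .where(col("is_underground") == False)
       .where(col("height").isNotNull())
       .where(col("height") > 20)
       .where(ST_Intersects(
           col("geometry"),
           ST_SetSRID(ST_GeomFromText(lit(nyc_bbox_wkt)), lit(4326))
       ))
       .select(
           "id",
           "height",
           "num_floors",
           "roof_shape",
           ST_Centroid(col("geometry")).alias("centroid")
       )
       .limit(5)
   )
   # show() prints and returns None, so call it on result rather than inside the chain
   result.show()
   ```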
   
   Polars syntax would be something like this (I think; Polars experts, feel free to chime in):
   
   ```python
   import polars as pl

   # The pl.st_* functions below are hypothetical: Polars has no built-in spatial functions.
   result = (
       buildings
       .filter(
           (pl.col("is_underground") == False) &
           (pl.col("height").is_not_null()) &
           (pl.col("height") > 20) &
           pl.st_intersects(
               pl.col("geometry"),
               pl.st_set_srid(pl.st_geom_from_text(pl.lit(nyc_bbox_wkt)), pl.lit(4326))
           )
       )
       .select([
           pl.col("id"),
           pl.col("height"),
           pl.col("num_floors"),
           pl.col("roof_shape"),
           pl.st_centroid(pl.col("geometry")).alias("centroid")
       ])
       .limit(5)
   )
   ```
   
   Should we expose a Python interface like this? If so, what syntax would users like the best?
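One possible shape for such an interface, sketched purely as a hypothetical design: a thin Python layer that turns method chains into SQL text, so the existing SQL engine still does all the work. None of `Expr`, `col`, `func`, `render`, or `Query` exist in SedonaDB; they are invented here for illustration and cover only the operators used in the buildings query.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


class Expr:
    """A SQL fragment; Python operators combine fragments into larger ones (hypothetical)."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def _binary(self, op: str, other: object) -> Expr:
        return Expr(f"({self.sql} {op} {render(other)})")

    def __eq__(self, other: object) -> Expr:  # type: ignore[override]
        return self._binary("=", other)

    def __gt__(self, other: object) -> Expr:
        return self._binary(">", other)

    def __and__(self, other: object) -> Expr:
        return self._binary("AND", other)

    def is_not_null(self) -> Expr:
        return Expr(f"({self.sql} IS NOT NULL)")

    def alias(self, name: str) -> Expr:
        return Expr(f"{self.sql} AS {name}")


def render(value: object) -> str:
    """Render an Expr or a Python literal as SQL text."""
    if isinstance(value, Expr):
        return value.sql
    if isinstance(value, bool):  # check bool before int, since bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):  # SQL string literal, escaping ' as ''
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal: {value!r}")


def col(name: str) -> Expr:
    return Expr(name)


def func(name: str, *args: object) -> Expr:
    """Generic SQL function call, e.g. func("ST_Centroid", col("geometry"))."""
    return Expr(f"{name}({', '.join(render(a) for a in args)})")


@dataclass(frozen=True)
class Query:
    """Immutable builder: every method returns a new Query (hypothetical)."""

    table: str
    filters: tuple = ()
    columns: tuple = ("*",)
    limit_n: int | None = None

    def where(self, predicate: Expr) -> Query:
        return replace(self, filters=self.filters + (predicate,))

    def select(self, *cols: str | Expr) -> Query:
        return replace(self, columns=tuple(c.sql if isinstance(c, Expr) else c for c in cols))

    def limit(self, n: int) -> Query:
        return replace(self, limit_n=n)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f.sql for f in self.filters)
        if self.limit_n is not None:
            sql += f" LIMIT {self.limit_n}"
        return sql


nyc_bbox_wkt = "POLYGON((-74.2591 40.4774, -74.2591 40.9176, -73.7004 40.9176, -73.7004 40.4774, -74.2591 40.4774))"

q = (
    Query("buildings")
    .where(col("is_underground") == False)  # builds SQL text, not a Python comparison
    .where(col("height").is_not_null())
    .where(col("height") > 20)
    .where(func("ST_Intersects", col("geometry"),
                func("ST_SetSRID", func("ST_GeomFromText", nyc_bbox_wkt), 4326)))
    .select("id", "height", "num_floors", "roof_shape",
            func("ST_Centroid", col("geometry")).alias("centroid"))
    .limit(5)
)
print(q.to_sql())
# With SedonaDB, the generated string could then be run via sd.sql(q.to_sql()).show()
```

Generating SQL text keeps the sketch short and reuses the SQL path that already works; an alternative design would build query plans directly instead of strings.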

