james-willis commented on code in PR #24:
URL:
https://github.com/apache/sedona-spatialbench/pull/24#discussion_r2365277246
##########
print_queries.py:
##########
@@ -404,23 +405,15 @@ def q5() -> str:
return """
-- Q5 (SedonaDB): In SedonaDB ST_Collect is an aggregate function so no need to use ARRAY_AGG first.
-- ST_Collect does not accept an array as input so we cannot use the query with ARRAY_AGG.
-WITH per AS (
- SELECT
- c.c_custkey,
- c.c_name AS customer_name,
- DATE_TRUNC('month', t.t_pickuptime) AS pickup_month,
- COUNT(t.t_tripkey) AS n_trips,
- ST_Area(ST_ConvexHull(
- ST_Collect(ST_GeomFromWKB(t.t_dropoffloc))
- )) AS monthly_travel_hull_area
- FROM trip t
- JOIN customer c ON t.t_custkey = c.c_custkey
- GROUP BY c.c_custkey, c.c_name, pickup_month
-)
-SELECT *
-FROM per
-WHERE n_trips > 5
-ORDER BY n_trips DESC, c_custkey ASC;
+SELECT
+ c.c_custkey, c.c_name AS customer_name,
+ DATE_TRUNC('month', t.t_pickuptime) AS pickup_month,
+ ST_Area(ST_ConvexHull(ST_Collect(ST_GeomFromWKB(t.t_dropoffloc)))) AS monthly_travel_hull_area,
Review Comment:
I tried running the Sedona Spark benchmarks this afternoon. The ORDER BY clause
of q5 was invalid Spark SQL.
Once I updated that query, I felt it was important to update all
implementations of q5 to match as closely as possible.
I believe it is essential for the queries to match as closely as possible
across engines, so that the benchmark is both perceived as fair and actually fair.
Do you think we should make a different change in order to get a working
Sedona Spark q5 implementation?