petern48 commented on code in PR #363:
URL: https://github.com/apache/sedona-db/pull/363#discussion_r2560511508
##########
rust/sedona-functions/benches/native-functions.rs:
##########
@@ -147,6 +147,9 @@ fn criterion_benchmark(c: &mut Criterion) {
),
);
+ benchmark::scalar(c, &f, "native", "st_reverse", Polygon(10));
+ benchmark::scalar(c, &f, "native", "st_reverse", MultiPoint(10));
Review Comment:
I noticed your benchmark result in your description doesn't actually compare
itself to the old `geos` implementation. Here's what it usually looks like for
a performance boost (notice the `-` values).
<img width="558" height="254" alt="Image"
src="https://github.com/user-attachments/assets/9d95d322-f269-4d96-b109-174106cd2707"
/>
I ran into this issue before
[here](https://github.com/apache/sedona-db/pull/233#issuecomment-3420056856),
and pushed off getting back to benchmarking it.
Looking at the [source
code](https://github.com/apache/sedona-db/blob/240898d4f48b18821c4190abbeaa1d7c71a4253a/rust/sedona-testing/src/benchmark_util.rs#L88)
in benchmark_utils.rs, I think using the lib name is causing the bench util to
not compare them. As a quick workaround, I think changing the lib names to be
the same would enable the bench tool to group them together properly
```
cd c/sedona-geos/`
cargo bench -- st_reverse
# Manually modify the `"native"` string there to say `"geos"`
cd ../../rust/sedona-functions
cargo bench -- st_reverse
```
Don't actually push that string change. Maybe we can decide how to modify
the bench helpers later.
##########
c/sedona-geos/src/lib.rs:
##########
@@ -37,7 +37,6 @@ mod st_minimumclearance_line;
mod st_perimeter;
mod st_polygonize;
mod st_polygonize_agg;
-mod st_reverse;
Review Comment:
I actually think it's fine to leave the `geos` implementation instead of
deleting it entirely. We should register things in order so that
`sedona-functions` takes precedence. Maybe it would be nice to keep things
around for benchmarking purposes? Let's see what Dewey prefers. I can imagine
adding a feature one day to allow us to pick which implementation to use
easily, which could be nice for benchmarking purposes or maybe even dynamically
choosing which implementation to use if it's not as simple as one always being
faster. For this case, I do think `geo-traits` is always faster than `geos`.
Taking a look at the code here,
https://github.com/apache/sedona-db/blob/240898d4f48b18821c4190abbeaa1d7c71a4253a/rust/sedona/src/context.rs#L130-L135
It looks like `sedona-functions` is registered before `geos`, making `geos`
implementations take precedence. Shouldn't we move the `sedona-functions` to
the bottom, since they are generally the fastest, @paleolimbot?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]