2010YOUY01 opened a new pull request, #90: URL: https://github.com/apache/sedona-db/pull/90
Hi, I'm new to the project and still learning my way around. `sedona-db` looks great, and I'd really appreciate any feedback.

## Rationale

Previously, `st_geometrytype()` handled each row by first parsing the `WKB` binary into a `WKB` object and then extracting the base type from that object. That approach parses fields in the `WKB` binary that are never used, since only the geometry type is needed. This PR lets it iterate over the raw `WKB` bytes and parse the geometry type directly from them.

## Implementation

1. Extend `GenericExecutor` with a new API, `execute_wkb_bytes_void()`, to iterate over raw `WKB` bytes.
2. Implement a util that parses the type from the `WKB` binary according to the spec.
3. Update `st_geometrytype()` to use 1 and 2.

I think it would be better to move `2` to the `wkb` crate, but it doesn't have such a public interface yet.

## Benchmark

### Command

```
pytest --benchmark-group-by=param:table --benchmark-columns=median,mean,stddev test_functions.py::TestBenchFunctions::test_st_geometrytype
```

### Result

5x faster for complex collections, 30% faster for simple collections:

```sh
-------------------------------- benchmark 'table=collections_complex': 3 tests -------------------------------
Name (time in ms)                                         Median                Mean             StdDev
---------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_complex-SedonaDB]        2.3656 (1.0)        2.4929 (1.0)       0.3857 (1.0)
test_st_geometrytype[collections_complex-DuckDB]         34.2037 (14.46)     34.3980 (13.80)     0.8402 (2.18)
test_st_geometrytype[collections_complex-PostGIS]       304.6275 (128.77)   306.7333 (123.04)    5.8908 (15.27)
---------------------------------------------------------------------------------------------------------------

------------------------------ benchmark 'table=collections_simple': 3 tests -------------------------------
Name (time in ms)                                        Median               Mean            StdDev
------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_simple-SedonaDB]        1.3585 (1.0)       1.7419 (1.0)      1.2142 (9.41)
test_st_geometrytype[collections_simple-DuckDB]          5.1103 (3.76)      5.1443 (2.95)     0.1291 (1.0)
test_st_geometrytype[collections_simple-PostGIS]        46.8870 (34.51)    46.9021 (26.93)    0.3712 (2.88)
------------------------------------------------------------------------------------------------------------
```

```sh
-------------------------------------- benchmark 'table=collections_complex': 3 tests -------------------------------------
Name (time in us)                                              Median                    Mean                StdDev
---------------------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_complex-SedonaDB]           419.2500 (1.0)          450.9272 (1.0)        124.1193 (1.0)
test_st_geometrytype[collections_complex-DuckDB]          32,422.7921 (77.34)     32,917.7395 (73.00)    2,088.4215 (16.83)
test_st_geometrytype[collections_complex-PostGIS]        295,752.0001 (705.43)   294,866.8750 (653.91)   3,872.8562 (31.20)
---------------------------------------------------------------------------------------------------------------------------

------------------------------------ benchmark 'table=collections_simple': 3 tests -------------------------------------
Name (time in us)                                            Median                  Mean                StdDev
------------------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_simple-SedonaDB]          613.2090 (1.0)       1,144.3652 (1.0)      1,073.4389 (3.42)
test_st_geometrytype[collections_simple-DuckDB]          5,502.5411 (8.97)      5,556.3829 (4.86)       314.2311 (1.0)
test_st_geometrytype[collections_simple-PostGIS]        36,191.1250 (59.02)    36,322.7638 (31.74)      730.0613 (2.32)
------------------------------------------------------------------------------------------------------------------------
```
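For reference, step 2 under Implementation (reading the geometry type straight from the WKB header) can be sketched roughly as below. This is an illustration only, not the code in this PR: `GeometryType` and `peek_geometry_type` are hypothetical names, and the sketch assumes the usual header layout of one byte-order byte followed by a 4-byte type code, with ISO dimension offsets (+1000/+2000/+3000) and EWKB flag bits both reduced to the base type.

```rust
/// Base geometry types with WKB codes 1 through 7.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GeometryType {
    Point,
    LineString,
    Polygon,
    MultiPoint,
    MultiLineString,
    MultiPolygon,
    GeometryCollection,
}

/// Hypothetical helper: inspect only the 5-byte WKB header and return the
/// base geometry type. Returns `None` if the buffer is too short, the
/// byte-order flag is invalid, or the type code is not one of 1..=7.
pub fn peek_geometry_type(wkb: &[u8]) -> Option<GeometryType> {
    if wkb.len() < 5 {
        return None;
    }
    let code = [wkb[1], wkb[2], wkb[3], wkb[4]];
    // Byte 0 is the byte-order flag: 0 = big endian, 1 = little endian.
    let raw = match wkb[0] {
        0 => u32::from_be_bytes(code),
        1 => u32::from_le_bytes(code),
        _ => return None,
    };
    // Assumption: EWKB stores Z/M/SRID flags in the top three bits, and
    // ISO WKB adds 1000/2000/3000 for Z/M/ZM. Strip both to get the base code.
    let base = (raw & 0x1FFF_FFFF) % 1000;
    match base {
        1 => Some(GeometryType::Point),
        2 => Some(GeometryType::LineString),
        3 => Some(GeometryType::Polygon),
        4 => Some(GeometryType::MultiPoint),
        5 => Some(GeometryType::MultiLineString),
        6 => Some(GeometryType::MultiPolygon),
        7 => Some(GeometryType::GeometryCollection),
        _ => None,
    }
}
```

The point of the sketch is that the cost is constant per row: only five bytes are touched, no matter how many coordinates or nested geometries follow the header.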