Kristin Cowalcijk created SEDONA-207:
----------------------------------------
Summary: Faster serialization/deserialization of geometry objects
Key: SEDONA-207
URL: https://issues.apache.org/jira/browse/SEDONA-207
Project: Apache Sedona
Issue Type: Improvement
Reporter: Kristin Cowalcijk
Attachments: image-2022-12-02-20-19-15-597.png,
image-2022-12-02-20-19-36-449.png
Recently we've looked into the performance of geometry serdes, since serialization
and deserialization greatly impact the performance of Spatial SQL. After benchmarking
and assessing the geometry serializers currently in Apache Sedona (ShapeSerde, the
WKB-based GeometrySerializer, etc.), we came up with a high-performance geometry serde
implementation that outperforms the existing serdes in both microbenchmarks and
Spatial SQL end-to-end tests. It speeds up simple range queries like the following by 2x:
{code:sql}
SELECT COUNT(1) FROM traj_points WHERE ST_Within(geom,
  ST_GeomFromText('POLYGON((120.40586018622339 31.429636201527515,
    120.84256672919214 31.429636201527515,
    120.84256672919214 31.089198624963103,
    120.40586018622339 31.089198624963103,
    120.40586018622339 31.429636201527515))'))
{code}
The benchmark code and results for the geometry serdes are available
[here|https://github.com/Kontinuation/play-with-geometry-serde]. The benchmark was
performed on an ECS instance with 4 CPUs (Intel(R) Xeon(R) Platinum 8269CY CPU @
2.50GHz), using OpenJDK 1.8.0_352.
!image-2022-12-02-20-19-15-597.png|width=481,height=311!
!image-2022-12-02-20-19-36-449.png|width=482,height=307!
I'll write a detailed description of the proposed geometry serde in the next few
days. There is still a lot of work to do to integrate it into Apache Sedona. We'll
implement a Python version of the proposed serde as a C extension, and also a pure
Python version using the {{struct}} module as a fallback.