[ 
https://issues.apache.org/jira/browse/SEDONA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646102#comment-17646102
 ] 

Kristin Cowalcijk commented on SEDONA-207:
------------------------------------------

The proposed geometry serialization format is described in [this 
document|https://docs.google.com/document/d/1SHLCcWpcU44P0LUMPzPolaX46LZHpw8Ndg53q-b_izk/edit?usp=sharing].
 We can review the design of the format on Google Docs.

I have also written a [blog 
post|https://kontinuation.one/posts/2022/12/improving-geometry-serde-performance-of-apache-sedona/]
 explaining the motivation and reasoning behind the design of the proposed 
serialization format.

> Faster serialization/deserialization of geometry objects
> --------------------------------------------------------
>
>                 Key: SEDONA-207
>                 URL: https://issues.apache.org/jira/browse/SEDONA-207
>             Project: Apache Sedona
>          Issue Type: Improvement
>            Reporter: Kristin Cowalcijk
>            Priority: High
>         Attachments: image-2022-12-02-20-19-15-597.png, 
> image-2022-12-02-20-19-36-449.png
>
>
> We recently looked into the performance of geometry serdes, since it 
> greatly affects the performance of Spatial SQL. After benchmarking and 
> assessing the geometry serializers currently in Apache Sedona (ShapeSerde, 
> WKB-based GeometrySerializer, etc.), we came up with a high-performance 
> geometry serde implementation that outperforms the existing serdes in both 
> benchmarks and Spatial SQL end-to-end tests. It speeds up simple range 
> queries like the following by 2x:
>  
> {code:sql}
> SELECT COUNT(1) FROM traj_points WHERE ST_Within(geom, 
> ST_GeomFromText('POLYGON((120.40586018622339 
> 31.429636201527515,120.84256672919214 31.429636201527515,120.84256672919214 
> 31.089198624963103,120.40586018622339 31.089198624963103,120.40586018622339 
> 31.429636201527515))'))
> {code}
> The benchmark code and results for the geometry serdes are available 
> [here|https://github.com/Kontinuation/play-with-geometry-serde]. The 
> benchmark was run on an ECS instance with 4 Intel(R) Xeon(R) Platinum 
> 8269CY CPU @ 2.50GHz CPUs, using OpenJDK 1.8.0_352.
> !image-2022-12-02-20-19-15-597.png|width=481,height=311! 
> !image-2022-12-02-20-19-36-449.png|width=482,height=307!
> Besides the performance improvements, the proposed serde also supports 
> SRID and 3D/4D geometries (Z/M dimensions). I'll write detailed 
> documentation for the proposed geometry serde in the next few days. There 
> is still a lot to do to integrate it into Apache Sedona. We'll implement a 
> Python version of the proposed serde as a C extension, and also implement 
> a pure Python version using the {{struct}} module as a fallback.
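
To illustrate what a pure Python fallback built on {{struct}} might look like, here is a minimal sketch. The byte layout, the function names {{serialize}} and {{deserialize}}, and the geometry type codes are all hypothetical and chosen only for illustration; they are NOT the format proposed in this issue, which is described in the linked design document. The sketch only shows that a fixed header carrying the SRID and Z/M flags, followed by packed doubles, can be round-tripped with the standard library alone.

```python
import struct

# Hypothetical layout for illustration only (NOT the SEDONA-207 format):
#   uint8   geometry type (assumed codes: 1 = point, 2 = linestring)
#   uint8   dimension flags (bit 0 = has Z, bit 1 = has M)
#   int32   SRID
#   uint32  number of coordinates
#   float64 ordinates, interleaved per coordinate
# "<" selects little-endian with no padding, so the header is 10 bytes.
HEADER = struct.Struct("<BBiI")


def serialize(geom_type, srid, coords, has_z=False, has_m=False):
    """Pack a coordinate sequence into bytes using the layout above."""
    dims = 2 + int(has_z) + int(has_m)
    flags = int(has_z) | (int(has_m) << 1)
    flat = [value for coord in coords for value in coord]
    if len(flat) != dims * len(coords):
        raise ValueError("each coordinate must have %d ordinates" % dims)
    header = HEADER.pack(geom_type, flags, srid, len(coords))
    return header + struct.pack("<%dd" % len(flat), *flat)


def deserialize(buf):
    """Unpack bytes produced by serialize()."""
    geom_type, flags, srid, num_coords = HEADER.unpack_from(buf, 0)
    has_z = bool(flags & 1)
    has_m = bool(flags & 2)
    dims = 2 + int(has_z) + int(has_m)
    flat = struct.unpack_from("<%dd" % (num_coords * dims), buf, HEADER.size)
    coords = [tuple(flat[i:i + dims]) for i in range(0, len(flat), dims)]
    return geom_type, srid, coords, has_z, has_m
```

For example, {{deserialize(serialize(2, 4326, [(120.4, 31.4, 5.0), (120.8, 31.0, 6.0)], has_z=True))}} returns the same type code, SRID, and coordinates that were passed in. A C extension would presumably read and write the same bytes, so the two implementations could be swapped without changing the stored data.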



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
