Kontinuation opened a new pull request, #739:
URL: https://github.com/apache/incubator-sedona/pull/739

   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor Rules](https://sedona.apache.org/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-207. The PR name follows the format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   
   ### New Geometry Serde
   
   The new geometry serde was implemented in `common/src/main/java/org/apache/sedona/common/geometrySerde/`. It replaces both the ShapeSerde used by the Kryo serializer and the WKB-based serde used by `GeometryUDT`. Please refer to [SEDONA-207](https://issues.apache.org/jira/browse/SEDONA-207) for a detailed explanation of the new geometry serde.
   
   ### GeoParquet
   
   GeoParquet stores geometry objects as WKB binary values. WKB happened to be the old serialization format of `GeometryUDT`, so GeoParquet support previously needed no special treatment. Because this PR changes the serialization format of `GeometryUDT`, geometry values in GeoParquet files now need to be explicitly parsed from WKB when reading and serialized to WKB when writing.
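For readers unfamiliar with WKB, here is a minimal sketch of what explicitly parsing and serializing a GeoParquet value involves, using a 2D Point as the simplest case. It only illustrates the OGC WKB layout (a byte-order flag, a uint32 geometry type code, then IEEE 754 doubles); it is not Sedona's GeoParquet reader or writer. The helpers `write_wkb_point`, `read_wkb_point` and `wkb_to_udt_bytes` are hypothetical, and real code would delegate to a full WKB implementation such as JTS `WKBReader`/`WKBWriter` or `shapely.wkb`.

```python
import struct
from typing import Callable, Tuple

WKB_POINT = 1  # OGC geometry type code for Point


def write_wkb_point(x: float, y: float, little_endian: bool = True) -> bytes:
    """Encode a 2D point as WKB: byte-order flag, uint32 type code, two doubles."""
    prefix = "<" if little_endian else ">"
    flag = 1 if little_endian else 0  # 1 = NDR (little-endian), 0 = XDR (big-endian)
    # "<"/">" use standard sizes with no padding: 1 + 4 + 8 + 8 = 21 bytes.
    return struct.pack(prefix + "BIdd", flag, WKB_POINT, x, y)


def read_wkb_point(buf: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point, honoring the byte-order flag in the first byte."""
    if len(buf) < 21:
        raise ValueError("buffer too short for a 2D WKB point")
    if buf[0] not in (0, 1):
        raise ValueError(f"invalid WKB byte-order flag: {buf[0]}")
    prefix = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(prefix + "I", buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    x, y = struct.unpack_from(prefix + "dd", buf, 5)
    return x, y


def wkb_to_udt_bytes(wkb: bytes, serialize: Callable[[Tuple[float, float]], bytes]) -> bytes:
    """Re-encode a GeoParquet WKB value with another serializer.

    `serialize` is a hypothetical stand-in for GeometryUDT's new serde.
    """
    return serialize(read_wkb_point(wkb))
```

For example, `read_wkb_point(write_wkb_point(1.5, -2.0))` round-trips to `(1.5, -2.0)`. The extra parse/serialize hop in `wkb_to_udt_bytes` is the cost that no longer comes for free once `GeometryUDT` stops using WKB internally.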
   
   ### GeometryUDT in Python
   
   We've implemented the new serialization format in pure Python. It is 2 to 3x slower than `shapely.wkb.loads`/`shapely.wkb.dumps`, which will impact the performance of `collect`, `toPandas` and Python UDFs in PySpark. We'll explore implementing it as a CPython extension to achieve better performance.
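One common reason a pure-Python serializer lags behind a C implementation is per-value interpreter overhead rather than the byte layout itself. The sketch below assumes, purely for illustration, that coordinates are written as consecutive little-endian doubles (see SEDONA-207 for the real format), and compares one `struct.pack` call per value against a single batched call. The function names are hypothetical and no timings are claimed; run it locally to see the gap on your own machine.

```python
import struct
import timeit


def pack_coords_per_value(coords):
    """Pack (x, y) pairs as little-endian doubles, one struct call per value."""
    out = bytearray()
    for x, y in coords:
        out += struct.pack("<d", x)
        out += struct.pack("<d", y)
    return bytes(out)


def pack_coords_batched(coords):
    """Pack all (x, y) pairs with a single struct call (the loop runs in C)."""
    flat = [v for xy in coords for v in xy]
    return struct.pack(f"<{len(flat)}d", *flat)


if __name__ == "__main__":
    coords = [(float(i), float(-i)) for i in range(10_000)]
    # Both strategies must produce byte-identical output.
    assert pack_coords_per_value(coords) == pack_coords_batched(coords)
    for fn in (pack_coords_per_value, pack_coords_batched):
        seconds = timeit.timeit(lambda: fn(coords), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

Moving the inner loop into C, whether through a CPython extension as proposed above or through bulk calls like `pack_coords_batched`, is the usual way to narrow this kind of gap.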
   
   ## How was this patch tested?
   
   Unit tests were added to test this patch. This patch was also manually 
tested on a Spark standalone cluster.
   
   The Python geometry serde code was manually tested with shapely 2.0. The Python unit tests still need to be updated to be compatible with shapely 2.0 in the future.
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API, so no documentation changes are needed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sedona.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
