douglasdennis opened a new pull request, #792:
URL: https://github.com/apache/sedona/pull/792

   
   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor 
Rules](https://sedona.apache.org/latest-snapshot/community/rule/) and 
[Contributor Development 
Guide](https://sedona.apache.org/latest-snapshot/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is 
https://issues.apache.org/jira/browse/SEDONA-231.
   
   ## What changes were proposed in this PR?
   
   To avoid redundant serde during Spark execution, I propose the following 
changes:
   
   - Create a `SerdeAware` trait that has a single abstract method called 
`evalWithoutSerialization`. This method acts the normal `eval` methods except 
that it will not attempt to serialize its results into a Spark internal type.
   - Apply this trait to the null safe expression classes as well as whichever 
other function classes that it will work with. Implement the 
`evalWithoutSerialization` for those classes such that they do not attempt to 
serialize any Geometry object they return (and returning the actual Geometry 
object instead).
   - When a sedona function goes to extract the values from its input, it first 
checks to see if the input expression is `SerdeAware`. If the input expression 
is SerdeAware then the `evalWithoutSerialization` method is called and an 
actual Geometry object is returned, skipping the serialization and 
deserialization that would normally happen.
   
   To facilitate this, some changes had to be made to a few functions. 
Specifically, any function that could return LinearRing objects had to be 
changed such that they guarantee not doing that. This allows for compliance 
with OGC to be maintained.
   
   I have three points of concern, which is why I marked this PR as draft:
   1. I'm not sure if my approach is the most succinct or performant. If there 
are better ways to eliminate the redundant serde then I am open to comment. I 
can either implement the suggestions myself or I can close this PR and would be 
happy to see another implementation come out.
   2. I don't know how to test this. At one point I had print statements in the 
serializer (which you can see in the issue) that helped show that I had in fact 
eliminated the redundant serde, but I'm not sure how to write a test to 
automate that. Is there a way to mock the geometry serializer? Or maybe add 
debug logging that runs for the test but is disabled in some kind of release 
mode? I'm not sure what the convention is here.
   3. By enforcing LineStrings for certain functions, I have subtly changed a 
public API in flink. I'm not sure if that needs to be documented and, if so, 
where. Spark should not be affected since the serializer already does not allow 
for LinearRings. 
   
   ## How was this patch tested?
   
   Unit tests pass. A few in flink had to be updated since certain functions no 
longer return LinearRing objects.
   
   As for ensuring the redundant serde is eliminated, please see point #2 above.
   
   ## Did this PR include necessary documentation updates?
   
   - No, see above.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to