[GitHub] [sedona] umartin opened a new pull request, #748: [SEDONA-233] Incorrect results for several joins in a single stage

GitBox Wed, 18 Jan 2023 01:09:23 -0800


umartin opened a new pull request, #748:
URL: https://github.com/apache/sedona/pull/748


   
   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor 
Rules](https://sedona.apache.org/community/rule/) and [Contributor Development 
Guide](https://sedona.apache.org/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is 
https://issues.apache.org/jira/browse/SEDONA-233. The PR name follows the 
format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   
   This patch changes how the deduplication gets its partition id. The previous 
method of reading it from `TaskContext` was unreliable and gave incorrect 
results when several joins ran in a single stage. Deduplication now uses 
`mapPartitionsWithIndex`, whose documentation states that it tracks the index 
of the _original_ partition: 
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html#mapPartitionsWithIndex[U](f:(Int,Iterator[T])=%3EIterator[U],preservesPartitioning:Boolean)(implicitevidence$9:scala.reflect.ClassTag[U]):org.apache.spark.rdd.RDD[U]
   
   Deduplication is refactored out of the join judgement into a separate 
DuplicatesFilter.
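
To illustrate why the filter needs the right partition index, here is a minimal, Spark-free sketch of a per-partition duplicates filter. It assumes the common reference-point rule (a candidate pair is kept only by the partition whose extent contains the lower-left corner of the intersection of the two envelopes); that rule, and the names `Rect`, `Pair`, and `filterPartition`, are illustrative assumptions rather than code from this patch. The index is passed in explicitly, in the same shape as the function given to `mapPartitionsWithIndex`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not the DuplicatesFilter from this PR.
class DuplicatesFilterSketch {

    /** Axis-aligned rectangle; stands in for a geometry envelope or a partition extent. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** Half-open containment: lower edges are inside, upper edges are not. */
        boolean containsHalfOpen(double x, double y) {
            return x >= minX && x < maxX && y >= minY && y < maxY;
        }
    }

    /** A candidate join result, reduced to the envelopes of the two matched geometries. */
    static final class Pair {
        final Rect left, right;

        Pair(Rect left, Rect right) {
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Keeps a pair only if its reference point lies in the extent of the given partition.
     * The (index, iterator) parameters mirror what mapPartitionsWithIndex supplies.
     */
    static List<Pair> filterPartition(int partitionIndex, Iterator<Pair> pairs, List<Rect> extents) {
        Rect extent = extents.get(partitionIndex);
        List<Pair> kept = new ArrayList<>();
        while (pairs.hasNext()) {
            Pair p = pairs.next();
            // Reference point: lower-left corner of the intersection of the two envelopes.
            double refX = Math.max(p.left.minX, p.right.minX);
            double refY = Math.max(p.left.minY, p.right.minY);
            if (extent.containsHalfOpen(refX, refY)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two side-by-side partitions and one pair that straddles their shared border,
        // so both partitions produce it as a candidate.
        List<Rect> extents = List.of(new Rect(0, 0, 10, 10), new Rect(10, 0, 20, 10));
        Pair straddling = new Pair(new Rect(8, 2, 12, 4), new Rect(9, 3, 13, 5));

        int keptIn0 = filterPartition(0, List.of(straddling).iterator(), extents).size();
        int keptIn1 = filterPartition(1, List.of(straddling).iterator(), extents).size();
        System.out.println("kept in partition 0: " + keptIn0 + ", partition 1: " + keptIn1);
    }
}
```

In this sketch the pair survives only in partition 0. If the filter were handed the wrong index, it would test the reference point against another partition's extent and could drop the pair everywhere or keep it more than once, which is the kind of incorrect result the ticket describes.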
   
   Deduplication code that is used in sedona-flink is moved to common.
   
   ## How was this patch tested?
   
   A unit test was added.
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API so no need to change the docs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
