paleolimbot commented on code in PR #1751:
URL: https://github.com/apache/sedona/pull/1751#discussion_r1931088637
##########
spark/common/src/main/scala/org/apache/sedona/sql/utils/Adapter.scala:
##########
@@ -235,6 +236,50 @@ object Adapter {
sparkSession.sqlContext.createDataFrame(rdd, schema)
}
+ /**
+ * Convert a spatial RDD to a DataFrame with a given schema, keeping spatial partitioning
+ *
+ * Note that spatial partitioning methods that introduce duplicates will result in an output
+ * data frame with duplicate features. This property is essential for implementing correct
+ * joins; however, it may introduce surprising results.
+ *
+ * @param spatialRDD
+ * Spatial RDD
+ * @param fieldNames
+ * Desired field names
+ * @param sparkSession
+ * Spark Session
+ * @tparam T
+ * Geometry
+ * @return
+ * DataFrame with the specified field names with spatial partitioning preserved
+ */
+ def toDfPartitioned[T <: Geometry](
+ spatialRDD: SpatialRDD[T],
+ fieldNames: Seq[String],
+ sparkSession: SparkSession): DataFrame = {
+ @transient lazy val log = LoggerFactory.getLogger(getClass.getName)
+ log.warn(
+ "toDfPartitioned() may introduce duplicates when used with non-specialized partitioning")
Review Comment:
Is this the correct way to go about this? Other classes use `with Logging` to get an instance-specific logger, but I wasn't sure how to rig that up here, since `Adapter` is an object rather than a class.
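For what it's worth, Scala objects can mix in traits exactly like classes can, so `object Adapter extends Logging` (or `... with Logging`) should work here too. A minimal self-contained sketch, using a hypothetical `Logging` trait standing in for the real one and `java.util.logging` to avoid external dependencies:

```scala
import java.util.logging.Logger

// Hypothetical stand-in for a project Logging trait. Because `getClass`
// resolves at runtime to the concrete singleton's class, each object that
// mixes this in gets its own logger, named after that object.
trait Logging {
  @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
}

// Objects can extend traits just like classes, so the `with Logging`
// pattern used elsewhere applies to `Adapter` unchanged.
object Adapter extends Logging {
  def loggerName: String = log.getName
}
```

The logger name would then include the object's class name (e.g. `...Adapter$`), rather than whatever `getClass.getName` resolves to inside a locally constructed logger.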
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]